With Amazon’s announcement that SSD is now available for EBS volumes, they have also declared this the recommended EBS volume type.
The good folks at Canonical are now building Ubuntu AMIs with EBS-SSD boot volumes. In my preliminary tests, running EBS-SSD boot AMIs instead of EBS magnetic boot AMIs speeds up the instance boot time by approximately… a lot.
Canonical now publishes a wide variety of Ubuntu AMIs including:
- 64-bit, 32-bit
- EBS-SSD, EBS-SSD pIOPS, EBS-magnetic, instance-store
- PV, HVM
- in every EC2 region
- for every active Ubuntu release
Matrix that out for reasonable combinations and you get 561 AMIs actively supported today.
On the Alestic.com blog, I provide a handy reference to the much smaller set of Ubuntu AMIs that match my generally recommended configurations for most popular uses, specifically:
I list AMIs for both PV and HVM, because different virtualization technologies are required for different EC2 instance types.
Where SSD is not available, I list the magnetic EBS boot AMI (e.g., Ubuntu 10.04 Lucid).
To access this list of recommended AMIs, select an EC2 region in the pulldown menu towards the top right of any page on Alestic.com.
If you like using the AWS console to launch instances, click on the orange launch button to the right of the AMI id.
The AMI ids are automatically updated using an API provided by Canonical, so you always get the freshest released images.
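The image lists behind that API are plain tab-separated text, so they are easy to script against. A minimal sketch of filtering such a list for a specific volume type and region (the sample row and the field positions here are hypothetical, not the real file format):

```shell
# Hypothetical sample row in a tab-separated image list (field positions assumed):
printf 'trusty\tserver\trelease\t20140607\tebs-ssd\tamd64\tus-east-1\tami-aaaaaaaa\n' |
  awk -F'\t' '$5 == "ebs-ssd" && $7 == "us-east-1" {print $8}'
# → ami-aaaaaaaa
```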
Original article: http://alestic.com/2014/06/ec2-ebs-ssd-ami
stress-ng currently contains the following methods to exercise the machine:
- CPU compute - just lots of sqrt() operations on pseudo-random values. One can also specify the % loading of the CPUs
- Cache thrashing, a naive cache read/write exerciser
- Drive stress by writing and removing many temporary files
- Process creation and termination, just lots of fork() + exit() calls
- I/O syncs, just forcing lots of sync() calls
- VM stress via mmap(), memory write and munmap()
- Pipe I/O, large pipe writes and reads that exercise pipe, copying and context switching
- Socket stressing, much like the pipe I/O test but using sockets
- Context switching between a pair of producer and consumer processes
The --metrics option dumps the number of operations performed by each stress method, aka "bogo ops", bogos because they are a rough and unscientific metric. One can specify how long to run a test either by test duration in seconds or by bogo ops.
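As a sketch of the two ways to bound a run (flags as documented in the stress-ng manual page; assumes stress-ng is installed):

```shell
# Run 4 CPU workers at 80% load for 60 seconds, then print a bogo-ops summary:
stress-ng --cpu 4 --cpu-load 80 --timeout 60s --metrics-brief
# ...or stop after a fixed number of bogo ops instead of a fixed duration:
stress-ng --cpu 4 --cpu-ops 1000000 --metrics-brief
```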
I've tried to make stress-ng compatible with the older stress tool, but note that it is not guaranteed to produce identical results as the common test methods between the two tools have been implemented differently.
stress-ng has been useful for helping me measure different power-consuming loads. It has also been useful for testing various thermald optimisation tweaks on one of my older machines.
For more information, consult the stress-ng manual page. Be warned, this tool can make your system get seriously busy and warm!
Today, I was doing some charm work as usual, and found myself with a problem. I wanted different environments for automated testing and manual code testing, but I only had one AWS account. I thought I needed an account in another cloud, or another AWS account, but after thinking about it for a while I decided it wasn’t worth it. Then I asked myself whether it was possible to just clone the information in my environments.yaml file and set up another environment with the same credentials. Indeed, it was.
The only thing I did here was:
- Open my environments.yaml file.
- Copy the exact same information I had for my old EC2 environment.
- Give a new name to the environment I was creating.
- Change the name of the storage bucket (as it has to be unique).
- Save the changes, close the file, and bootstrap the new environment.
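In environments.yaml terms, the result of those steps looks roughly like this (environment names, region, keys and bucket names are all placeholders):

```yaml
environments:
  ec2-testing:            # the original environment
    type: ec2
    region: us-east-1
    access-key: YOUR-ACCESS-KEY
    secret-key: YOUR-SECRET-KEY
    control-bucket: juju-testing-abc123
  ec2-manual:             # the clone: same credentials, new name and bucket
    type: ec2
    region: us-east-1
    access-key: YOUR-ACCESS-KEY
    secret-key: YOUR-SECRET-KEY
    control-bucket: juju-manual-def456
```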
Easy enough, right? That way you can just have multiple environments and execute different things on each one with just one account. I am not sure how this will work for other providers, but at least for AWS it works this way. This just adds more awesome-ness to Juju than it already has. Now, let’s play with these environments!
Ubuntu announced its 13.10 (Saucy Salamander) release almost 9 months ago, on October 17, 2013. This was the second release with our new 9 month support cycle and, as such, the support period is now nearing its end and Ubuntu 13.10 will reach end of life on Thursday, July 17th. At that time, Ubuntu Security Notices will no longer include information or updated packages for Ubuntu 13.10.
The supported upgrade path from Ubuntu 13.10 is via Ubuntu 14.04 LTS. Instructions and caveats for the upgrade may be found at:
Ubuntu 14.04 LTS continues to be actively supported with security updates and select high-impact bug fixes. Announcements of security updates for Ubuntu releases are sent to the ubuntu-security-announce mailing list, information about which may be found at:
Since its launch in October 2004 Ubuntu has become one of the most highly regarded Linux distributions with millions of users in homes, schools, businesses and governments around the world. Ubuntu is Open Source software, costs nothing to download, and users are free to customise or alter their software in order to meet their needs.
Originally posted to the ubuntu-announce mailing list on Fri Jun 20 05:00:13 UTC 2014 by Adam Conrad on behalf of the Ubuntu Release Team
A lot of people have been asking lately about the minimum number of nodes required to set up OpenStack, and there seems to be a lot of buzz around setting up OpenStack with Juju and MAAS. Some would speculate it has something to do with the amazing keynote presentation by Mark Shuttleworth; others would concede it’s just because charms are so damn cool. Whatever the reason, my answer is as follows:
You really want 12 nodes to do OpenStack right, even more for high availability, but at a bare minimum you only need two nodes.
So, naturally, as more people dive in to OpenStack and evaluate how they can use it in their organizations, they jump at the thought “Oh, I have two servers laying around!” and immediately want to know how to achieve such a feat with Juju and MAAS. So, I took an evening to do such a thing with my small cluster and share the process.
This post makes a few assumptions. First, that you have already set up MAAS, installed Juju, and configured Juju to speak to your MAAS environment. Secondly, that the two-machine allotment refers to nodes available after setting up MAAS, and that these two nodes are already enlisted in MAAS.

My setup
Before I dive much deeper, let me briefly show my setup.
I realize the photo is terrible, the Nexus 4 just doesn’t have a super stellar camera compared to other phones on the market. For the purposes of this demo I’m using my home MAAS cluster which consists of three Intel NUCs, a gigabit switch, a switched PDU, and an old Dell Optiplex with an extra NIC which acts as the MAAS region controller. All the NUCs have been enlisted in MAAS and commissioned already.

Diving in
Once MAAS and Juju are configured you can go ahead and run juju bootstrap. This will provision one of the MAAS nodes and use it as the orchestration node for your juju environment. This can take some time, especially if you don’t have fastpath installer selected, if you get a timeout during your first bootstrap don’t fret! You can increase the bootstrap timeout in the environments.yaml file with the following directive in your maas definition: bootstrap-timeout: 900. During the video I increase this timeout to 900 seconds in the hopes of eliminating this issue.
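For reference, that timeout tweak sits inside the MAAS stanza of environments.yaml (the environment name, server address and API key here are placeholders):

```yaml
environments:
  maas:
    type: maas
    maas-server: 'http://192.168.1.10/MAAS/'
    maas-oauth: '<MAAS-API-KEY>'
    bootstrap-timeout: 900
```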
After you’ve bootstrapped it’s time to get deploying! If you care to use the Juju GUI, now would be the time to deploy it. You can do so by running the following command:

juju deploy --to 0 juju-gui
To avoid having juju spin us up another machine we can tell Juju to simply place it on machine 0.
NOTE: the --to flag is crazy dangerous. Not all services can be safely co-located with each other. This is tantamount to “hulk smashing” services and will likely break things. Juju GUI is designed to coincide with the bootstrap node, so this one is safe. Running this elsewhere will likely result in bad things. You have been warned.
Now it’s time to get OpenStack going! Run the following commands:

juju deploy --to lxc:0 mysql
juju deploy --to lxc:0 keystone
juju deploy --to lxc:0 nova-cloud-controller
juju deploy --to lxc:0 glance
juju deploy --to lxc:0 rabbitmq-server
juju deploy --to lxc:0 openstack-dashboard
juju deploy --to lxc:0 cinder
To break this down: what you’re doing is deploying the minimum number of components required to support OpenStack, only you’re deploying them to machine 0 (the bootstrap node) in LXC containers. If you don’t know what LXC containers are, they are very lightweight Linux containers (virtual machines) that don’t produce a lot of overhead but allow you to safely compartmentalize these services. So, after a few minutes these machines will begin to pop online, but in the meantime we can press on because Juju waits for nothing!
The next step is to deploy the nova-compute node. This is the powerhouse behind OpenStack and is the hypervisor for launching instances. As such, we don’t really want to virtualize it, as KVM (or Xen, etc.) doesn’t work well inside of LXC containers.

juju deploy nova-compute
That’s it. MAAS will allocate the second node (and final node, if you only have two) to nova-compute. Now, while all these machines are popping up and becoming ready, let’s create relations. The magic of Juju, and what it can do, is in creating relations between services. It’s what turns a bunch of scripts into LEGOs for the cloud. You’ll need to run the following commands to create all the relations necessary for the OpenStack components to talk to each other:

juju add-relation mysql keystone
juju add-relation nova-cloud-controller mysql
juju add-relation nova-cloud-controller rabbitmq-server
juju add-relation nova-cloud-controller glance
juju add-relation nova-cloud-controller keystone
juju add-relation nova-compute nova-cloud-controller
juju add-relation nova-compute mysql
juju add-relation nova-compute rabbitmq-server:amqp
juju add-relation nova-compute glance
juju add-relation glance mysql
juju add-relation glance keystone
juju add-relation glance cinder
juju add-relation mysql cinder
juju add-relation cinder rabbitmq-server
juju add-relation cinder nova-cloud-controller
juju add-relation cinder keystone
juju add-relation openstack-dashboard keystone
Whew, I know that’s a lot to go through, but OpenStack isn’t a walk in the park. It’s a pretty intricate system with lots of dependencies. The good news is we’re nearly done! No doubt most of the nodes have turned green in the GUI or are marked as “started” in the output of juju status.
One of the last things is configuration for the cloud. Since this is all working against Trusty, we have the latest OpenStack being installed. All that’s left is to configure our admin password in keystone so we can log in to the dashboard:

juju set keystone admin-password="helloworld"
Set the password to whatever you’d like. Once complete, run juju status openstack-dashboard, find the public-address for that unit, load its address in your browser, and navigate to /horizon. (For example, if the public-address was 10.0.1.2 you would go to http://10.0.1.2/horizon). Log in with the username admin and the password you set on the command line. You should now be in the Horizon dashboard for OpenStack. Click on Admin -> System Panel -> Hypervisors and confirm you have a hypervisor listed.
Congratulations! You’ve created a condensed OpenStack installation.
On top of the incredible response from the team to complete the handout, I received a handful of volunteers for CD distribution throughout Colorado. The volunteers below will be available with install CDs in the following Colorado cities:
- Neal McBurnett: Boulder
- Chris Yoder: Longmont
- Ryan Nicholson: Fort Collins
- Emma Marshall: Denver & Aurora
Here's a close look at our 2-sided handout:
Thank you to the Colorado Ubuntu Team for helping spread Ubuntu! We are on an excellent path to a successful summer!
The Randa meetings provide an excellent opportunity for KDE developers to come together for a week-long hack session, fixing bugs in various KDE components while collaborating on new features.
This year we have some amazing things planned, with contributors working across the board on delivering an amazing KDE Frameworks 5 experience, a KDE frameworks SDK, a KDE frameworks book, the usual bug fixing and writing new features for the KDE Multimedia stack and much much more.
So please, go ahead and donate to our Randa fundraiser here, because when these contributors come together, amazing things happen :)
Packages for the release of KDE SC 4.13.2 are available for Kubuntu 12.04LTS, 13.10 and our development release. You can get them from the Kubuntu Backports PPA.
To update, use the Software Repository Guide to add the following repository to your software sources list:
With KDE Frameworks 5 and Plasma 5 not too far away, our awesome Blue Systems build crew has increased the cadence at which we publish new Neon 5 Live ISO images for you to test. I am very happy to announce that from now on there will be a new ISO waiting for you every Friday at:
Neon 5 provides daily-built packages of the latest and greatest KDE software based on KDE Frameworks 5, built on top of Kubuntu, so what you get every Friday is most certainly no older than 24 hours. This makes the ISOs a perfect way to test and report bugs in the upcoming release, as well as to track the overall progress of things.
If you’d like continuous (albeit, less trouble-free) updates for an existing Kubuntu 14.04 or Neon 5 installation you can of course use the PPAs directly. Beware! There be dragons ;)
If you would like to support future KDE Development please check out the current fundraiser for the KDE Randa Meetings 2014.
Live in the US? Did you know that we put Lead (Pb), a known neurotoxin, in:
- Garden hoses (that have been shown to leak Lead into the water)
- Power cords (including laptop cords)
- Carseats (mostly to the base, some other toxins have been found in the seat itself)
- Likely more; it’s apparently not uncommon for Lead to be added to plastic…
In the EU you aren’t allowed to put Lead in the above. I think it’s time we joined them!
- Sign the petition on the White House We the People site.
- Donate to this Indiegogo campaign to test carseats for toxic chemicals. (They are only asking for $10,000! ~ mostly to buy the carseats)
- Share this post / the above with friends, family, and any celebrities you happen to know on Twitter, etc. #NoSafeAmountOfLead.
- Bonus: Watch episode 7 of the new Cosmos which ends with Neil deGrasse Tyson saying there is no safe amount of lead.
Please let me know if you have trouble doing any of the above.
It’s all about purpose. This is the most important thing to keep in mind when you’re attempting to compare things or judge how useful something is.
What do you look to accomplish with a coupe sports car? Surely you don’t buy one and claim it sucks because you can’t fit your family of four. That isn’t the intended purpose.
There are a lot of people trying to replace their day-to-day machine with Chromebooks and expecting a cheap 1-to-1 replacement. Whelp, good luck, you may end up frustrated. Let’s take a second and scope your expectations properly, so you know what you’re getting into.
Here’s an example of something I’ve witnessed:
ChromeOS sucks, it doesn’t have $application.
This is one of the most prominent types of comments or articles on the internet. This argument is invalid, however, in a proper scoping of intended purpose of the device.
Frankly, if you’re trying to get one of these for very cheap to replace every experience you have on a Windows, Mac, or Linux computer, you’re going to have a bad time. You’re just not going to get that out of a Chromebook, but that’s okay. Or maybe it isn’t for you, depending on what you’re looking to do.
Are you looking for a cheap Facebook or Pinterest machine? Awesome. Some editing using Google Drive of documents and spreadsheets? Yep. Email? Check. Netflix? Aye, it can do that too. Google Hangouts? Like a champ. Remote access to a VDI environment with VMware Horizon View? Yes! More on that later…
There are a ton of addons and ‘webapps’ you can install from the Chrome Web Store, including alternatives to many desktop applications.
It is built to be a web browser; it is a web browser. That’s it. And if that’s what you want, it is perfect. If you require something else, then it isn’t. And that is okay.
written and posted from a Chromebook.
Today, Svetlana Belkin (belkinsa) did some work on the team wiki pages, mainly the home page. The home page now has a cleaner look and covers the basics, from an introduction to the team all the way to how to contact the team and how to join it. Svetlana also removed some of the excess “tabs” on the menu bar and added a “Site Map” tab, where users can see what other pages there are.
There is still work to be done on the home page, mainly with the menu, plus a lot of work on the other team wiki pages, as stated here and in the UOS session. Hopefully, the team’s wiki pages will be finished by the end of July 2014 to give newcomers a clearer understanding of the team.
Filed under: News Tagged: News, Svetlana Belkin, Team Wiki Pages, Ubuntu, Ubuntu Scientists, wiki
We’re back with Season Seven, Episode Twelve of the Ubuntu Podcast! Alan Pope, Mark Johnson, Tony Whitmore, and Laura Cowen are drinking tea and eating very rich chocolate cake (like this one, only more chocolatey) in Studio L.
In this week’s show:
- We discuss alternatives to Ubuntu One, which has recently shut down. Alan makes up the CRESCCO scale…
- We also discuss:
- We share some Command Line Lurve from @climagic:
Use it to order file-systems by percent usage and keep the header in place. Or order by file-system size with -k2n
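The one-liner itself isn’t quoted above, so here is a reconstruction of the trick described, not the exact tweet:

```shell
# Print df's header line untouched, then sort the remaining rows by the
# "Use%" column (field 5), highest usage first:
df -h | { IFS= read -r header; printf '%s\n' "$header"; sort -rn -k5; }
# With plain df output, swap "sort -rn -k5" for "sort -k2n" to order by size.
```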
- And we read your feedback – thanks for sending it in!
We’ll be back next week, so please send your comments and suggestions to: email@example.com
Join us on IRC in #uupc on Freenode
Leave a voicemail via phone: +44 (0) 203 298 1600, sip: firstname.lastname@example.org and skype: ubuntuukpodcast
Follow us on Twitter
Find our Facebook Fan Page
Follow us on Google+
The press picked up the recent press release about Debian LTS but mainly to mention the fact that it’s up and running. The call for help is almost never mentioned.
As usual what we lack is contributors doing the required work, but in this specific case, there’s a simple solution: pay people to do the required work. This extended support is mainly for the benefit of corporate users and if they see value in Debian LTS, it should not be too difficult to convince companies to support the project.
With some other Debian developers, we have gone out of our way to make it super easy for companies to support the Debian LTS project. We have created a service offer for Debian-using companies.
Freexian (my company) collects money from all contributing companies (by way of invoices) and then uses the money collected to pay Debian contributors who will prepare security updates. On top of this we added some concrete benefits for contributing companies such as the possibility to indicate which packages should have priority, or even the possibility to provide functional tests to ensure that a security update doesn’t introduce a regression in their production setup.
To do a good job of maintaining Debian Squeeze, our goal is to fund the equivalent of a full-time position. We’re currently far from that, with only 13 hours per month funded by 4 companies. That makes a current average of 3.25 hours/month funded by each contributing company, for a price of 276 EUR/month or 3315 EUR/year.
This is not much if you compare it with the price those companies would have to pay to upgrade all their Debian 6 machines now instead of keeping them for two supplementary years.
Assuming the average contribution level will stay the same, we only need the support of 50 other companies in the world. That’s really not much compared to the thousands of companies using Debian. Can you convince your own company? Grab the subscription form and have a chat with your company management.
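As a sanity check on those numbers (the 85 EUR/hour rate below is inferred from 276 EUR/month for 3.25 hours, not stated in the post, and the ~175 hours/month full-time figure is an assumption):

```python
funded_hours = 13       # hours/month currently funded
companies = 4           # contributing companies
rate = 85               # EUR/hour, inferred from the figures above

avg_hours = funded_hours / companies        # hours/month per company
monthly = avg_hours * rate                  # EUR/month per company
yearly = monthly * 12                       # EUR/year per company

# Extra companies needed to fund a ~175 hour/month full-time position
# at the same average contribution level:
extra = (175 - funded_hours) / avg_hours

print(avg_hours)        # 3.25
print(round(monthly))   # 276
print(round(yearly))    # 3315
print(round(extra))     # 50
```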
Help us reach that goal, share this article and the link to Freexian’s Debian LTS offer. Long Term Support is important if we want Debian to be a good choice for servers and big deployments. We need to make Squeeze LTS a success!
We’re now almost half way through the year and only a few days until summer officially starts here in the UK!
In the last few weeks we’ve worked on:
- Responsive ubuntu.com: we’ve finished publishing the series on making ubuntu.com responsive on the design blog
- Ubuntu.com: we’ve released a hub for our legal documents and information, and we’ve created homepage takeovers for Mobile Asia Expo
- Juju GUI: we’ve planned work for the next cycle, sketched scenarios based on the new personas, and launched the new inspector on the left
- Fenchurch: we’ve finished version 1 of our new asset server, and we’ve started work on the new Ubuntu partners site
- Ubuntu Insights: we’ve published the latest iteration of Ubuntu Insights, now with a dedicated press area
- Chinese website: we’ve released the Chinese version of ubuntu.com
And we’re currently working on:
- Responsive Day Out: I’m speaking at the Responsive Day Out conference in Brighton on the 27th on how we made ubuntu.com responsive
- Responsive ubuntu.com: we’re working on the final tweaks and improvements to our code and documentation so that we can release to the public in the next few weeks
- Juju GUI: we’re now starting to design based on the scenarios we’ve created
- Fenchurch: we’re now working on Juju charms for the Chinese site asset server and Ubuntu partners website
- Partners: we’re finishing the build of the new Ubuntu partners site
- Chinese website: we’ll be adding cloud and server sections to the site
- Cloud Installer: we’re working on the content for the upcoming Cloud Installer beta pages
If you’d like to join the web team, we are currently looking for a web designer and a front end developer to join the team!
Working on Juju personas and scenarios.
Have you got any questions or suggestions for us? Would you like to hear about any of these projects and tasks in more detail? Let us know your thoughts in the comments.
This shows me running the Unity 8 preview session. Simple Scan shows up as an option and can be launched and perform a scan.
This is only a first start, and there's still lots of work to be done. In particular:
- Applications need to set X-Ubuntu-Touch=true in their .desktop files to show in Unity 8.
- Application icons from the gnome theme do not show (bug).
- GTK+ applications don't go fullscreen (bug).
- No cursors changes (bug).
- We only support single window applications because we can't place/focus the subwindows yet (bug). We're currently faking menus and tooltips by drawing them onto the same surface.
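For the first point in the list above, a .desktop file that opts in to Unity 8 looks roughly like this (Simple Scan used as the example; the key that matters is X-Ubuntu-Touch):

```
[Desktop Entry]
Type=Application
Name=Simple Scan
Exec=simple-scan
Icon=scanner
X-Ubuntu-Touch=true
```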
If you are using Ubuntu 14.10 you can install the packages for this from a PPA:
$ sudo apt-add-repository ppa:ubuntu-desktop/gtk-mir
$ sudo apt-get update
$ sudo apt-get upgrade
The PPA contains a version of GTK+ with Mir support, fixes for libraries that assume you are running in X and a few select applications patched so they show in Unity 8.
The Mir backend currently lives on the wip/mir branch in the GTK+ git repository. We will keep developing it there until it is complete enough to propose into GTK+ master. We have updated jhbuild to support Mir so we can easily build and test this backend going forward.
This post is part of the series ‘Making ubuntu.com responsive‘.
There are several resources out there on how to create responsive websites, but they tend to go through the process in an ideal scenario, where the project starts with a blank slate, from scratch.
That’s why we thought it would be nice to share the steps we took in converting our main website and framework, ubuntu.com, into a fully responsive site, with all the constraints that come from working on legacy code, with very little time available, while making sure that the site was kept up-to-date and responding to the needs of the business.
Before we started this process, the idea of converting ubuntu.com seemed like a herculean task. It was only possible because we divided the work into several stages and smaller projects, tightened scope, and kept things simple.
We learned a lot throughout this past year or so, and there is a lot more we want to do. We’d love to hear about your experience of similar projects, suggestions on how we can improve, tools we should look into, books we should read, articles we should bookmark, and things we should try, so please do leave us your thoughts in the comments section.
Here is the list of all the posts in the series:
- Setting the rules
- Making the rules a reality
- Pilot projects
- Lessons learned
- Scoping the work
- Approach to content
- Making our grid responsive
- Adapting our navigation to small screens
- Dealing with responsive images
- Updating font sizes and increasing readability
- Our Sass architecture
- Ensuring performance
- Testing on multiple devices
Note: I will be speaking about making ubuntu.com responsive at the Responsive Day Out conference in Brighton, on the 27th June. See you there!
AMD made available 10 SeaMicro 15000 chassis in one of their test labs. Each chassis holds 64 servers, each with 4 cores and 2 threads per core (8 logical cores), 32GB of RAM, and 500G of storage attached via a storage fabric controller – creating the potential to scale an OpenStack deployment to a large number of compute nodes in a small rack footprint.
As you would expect, we chose the best tools for deploying OpenStack:
- MAAS – Metal-as-a-Service, providing commissioning and provisioning of servers.
- Juju – The service orchestration for Ubuntu, which we use to deploy OpenStack on Ubuntu using the OpenStack charms.
- OpenStack Icehouse on Ubuntu 14.04 LTS.
- CirrOS – a small footprint linux based Cloud OS
MAAS has native support for enlisting a full SeaMicro 15k chassis in a single command – all you have to do is provide it with the MAC address of the chassis controller and a username and password. A few minutes later, all servers in the chassis will be enlisted into MAAS ready for commissioning and deployment:

maas local node-group probe-and-enlist-hardware \
    nodegroup model=seamicro15k mac=00:21:53:13:0e:80 \
    username=admin password=password power_control=restapi2
Juju has been the Ubuntu Server team’s preferred method for deploying OpenStack on Ubuntu for as long as I can remember; Juju uses charms to encapsulate the knowledge of how to deploy each part of OpenStack (a service) and how each service relates to the others – an example would be how Glance relates to MySQL for database storage, Keystone for authentication and authorization, and (optionally) Ceph for actual image storage.
Using the charms and Juju, it’s possible to deploy complex OpenStack topologies using bundles, a yaml format for describing how to deploy a set of charms in a given configuration – take a look at the OpenStack bundle we used for this test to get a feel for how this works.
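A bundle is just YAML; a tiny hypothetical fragment in that style (the service names match the charms used in this deployment, but the structure and options shown are placeholders, not the actual bundle):

```yaml
openstack:
  services:
    mysql:
      charm: cs:trusty/mysql
      num_units: 1
    keystone:
      charm: cs:trusty/keystone
      num_units: 1
      options:
        admin-password: openstack
  relations:
    - [keystone, mysql]
```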
Starting out small(ish)
Not all ten chassis were available from the outset of testing, so we started off with two chassis of servers to test and validate that everything was working as designed. With 128 physical servers, we were able to put together a Neutron based OpenStack deployment with the following services:
- 1 Juju bootstrap node (used by Juju to control the environment), Ganglia Master server
- 1 Cloud Controller server
- 1 MySQL database server
- 1 RabbitMQ messaging server
- 1 Keystone server
- 1 Glance server
- 3 Ceph storage servers
- 1 Neutron Gateway network forwarding server
- 118 Compute servers
We described this deployment using a Juju bundle, and used the juju-deployer tool to bootstrap and deploy the bundle to the MAAS environment controlling the two chassis. Total deployment time for the two chassis to the point of a usable OpenStack cloud was around 35 minutes.
At this point we created 500 tenants in the cloud, each with its own private network (using Neutron), connected to the outside world via a shared public network. The immediate impact of doing this is that Neutron creates dnsmasq instances, Open vSwitch ports and associated network namespaces on the Neutron Gateway data forwarding server – seeing this many instances of dnsmasq on a single server is impressive – and the server dealt with the load just fine!
Next we started creating instances; we looked at using Rally for this test, but it does not currently support using Neutron for instance creation testing, so we went with a simple shell script that created batches of servers (we used a batch size of 100 instances) and then waited for them to reach the ACTIVE state. We used the CirrOS cloud image (developed and maintained by the Ubuntu Server team’s very own Scott Moser) with a custom Nova flavor with only 64 MB of RAM.
We immediately hit our first bottleneck – by default, the Nova daemons on the Cloud Controller server will spawn sub-processes equivalent to the number of cores that the server has. Neutron does not do this and we started seeing timeouts on the Nova Compute nodes waiting for VIF creation to complete. Fortunately Neutron in Icehouse has the ability to configure worker threads, so we updated the nova-cloud-controller charm to set this configuration to a sensible default, and provide users of the charm with a configuration option to tweak this setting. By default, Neutron is configured to match what Nova does, 1 process per core – using the charm configuration this can be scaled up using a simple multiplier – we went for 10 on the Cloud Controller node (80 neutron-server processes, 80 nova-api processes, 80 nova-conductor processes). This allowed us to resolve the VIF creation timeout issue we hit in Nova.
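In charm terms, that tweak is a single configuration change (the option name here is how the worker-multiplier setting is exposed in our updated charm; treat it as an assumption rather than a stable interface):

```shell
# 10x workers per core on the Cloud Controller: 80 neutron-server,
# 80 nova-api and 80 nova-conductor processes on this 8-core node.
juju set nova-cloud-controller worker-multiplier=10
```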
At around 170 instances per compute server, we hit our next bottleneck; the Neutron agent status on compute nodes started to flap, with agents being marked down as instances were being created. After some investigation, it turned out that the time required to parse and then update the iptables firewall rules at this instance density took longer than the default agent timeout – which is why agents kept dropping out from Neutron’s perspective. This resulted in virtual interface (VIF) creation timing out, and we started to see instance activation failures when trying to create more than a few instances in parallel. Without an immediate fix for this issue (see bug 1314189), we took the decision to turn Neutron security groups off in the deployment and run without any VIF level iptables security. This was applied using the nova-compute charm we were using, but is obviously not something that will make it back into the official charm in the Juju charm store.
With the workaround in place on the Compute servers, we were able to create 27,000 instances on the 118 compute nodes. The API call times to create instances from the testing endpoint remained pretty stable during this test, however as the Nova Compute servers got heavily loaded, the amount of time taken for all instances to reach the ACTIVE state did increase:
At this point AMD had another two chassis racked and ready for use so we tore down the existing two chassis, updated the bundle to target compute services at the two new chassis and re-deployed the environment. With a total of 256 servers being provisioned in parallel, the servers were up and running within about 60 minutes, however we hit our first bottleneck in Juju.
The OpenStack charm bundle we use has a) quite a few services and b) a lot of relations between services. Juju was able to deploy the initial services just fine; however, when the relations were added, the load on the Juju bootstrap node went very high, and the Juju state service on this node started to throw a large number of errors and became unresponsive – this has been reported back to the Juju core development team (see bug 1318366).
We worked around this bottleneck by bringing up the original two chassis in full, and then adding each new chassis in series to avoid overloading the Juju state server in the same way. This obviously takes longer (about 35 minutes per chassis) but did allow us to deploy a larger cloud with an extra 128 compute nodes, bringing the total number of compute nodes to 246 (118+128).
And then we hit our next bottleneck…
By default, the RabbitMQ packaging in Ubuntu does not explicitly set a file descriptor ulimit so it picks up the Ubuntu defaults – which are 1024 (soft) and 4096 (hard). With 256 servers in the deployment, RabbitMQ hits this limit on concurrent connections and stops accepting new ones. Fortunately it’s possible to raise this limit in /etc/default/rabbitmq-server – and as we were deployed using the rabbitmq-server charm, we were able to update the charm to raise this limit to something sensible (64k) and push that change into the running environment. RabbitMQ restarted, problem solved.
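The change amounts to a one-line addition to the defaults file, along the lines of (the limit value is illustrative):

```shell
# /etc/default/rabbitmq-server
# Raise the file descriptor limit before the broker starts; 65536 comfortably
# covers a connection per service process across 256+ servers.
ulimit -n 65536
```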
With the 4 chassis in place, we were able to scale up to 55,000 instances.
Ganglia was letting us know that load on the Nova Cloud Controller during instance setup was extremely high (15-20), so we decided at this point to add another unit to this service:

juju add-unit nova-cloud-controller
and within 15 minutes we had another Cloud Controller server up and running, automatically configured for load balancing of API requests with the existing server and sharing the load for RPC calls via RabbitMQ. Load dropped, instance setup time decreased, instance creation throughput increased, problem solved.
Whilst we were working through these issues and performing the instance creation, AMD had another two chassis (6 & 7) racked, so we brought them into the deployment adding another 128 compute nodes to the cloud bringing the total to 374.
And then things exploded…
The number of instances that can be created in parallel is driven by two factors – 1) the number of compute nodes and 2) the number of workers across the Nova Cloud Controller servers. However, with six chassis in place, we were not able to increase the parallel instance creation rate as much as we wanted to without getting connection resets between Neutron (on the Cloud Controllers) and the RabbitMQ broker.
The lesson from this is that Neutron+Nova makes for an extremely noisy OpenStack deployment from a messaging perspective, and a single RabbitMQ server appeared unable to deal with this load. This resulted in a large number of instance creation failures, so we stopped testing and had a re-think.
A change in direction
After the failure we saw in the existing deployment design, and with more chassis still being racked by our friends at AMD, we still wanted to see how far we could push things; however with Neutron in the design, we could not realistically get past 5-6 chassis of servers, so we took the decision to remove Neutron from the cloud design and run with just Nova networking.
Fortunately this is a simple change to make when deploying OpenStack using charms, as the nova-cloud-controller charm has a single configuration option that selects between Neutron and Nova networking. After tearing down and re-provisioning the 6 chassis:

juju destroy-environment maas
juju-deployer --bootstrap -c seamicro.yaml -d trusty-icehouse
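By way of illustration, that selection is a single charm setting; the option name below is taken from the nova-cloud-controller charm, but treat the exact name and value as an assumption to check against the charm's config.yaml:

```shell
# Select Nova networking (FlatDHCPManager) instead of Neutron,
# either in the deployer bundle or on a live deployment:
juju set nova-cloud-controller network-manager=FlatDHCPManager
```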
with the revised configuration, we were able to create instances in batches of 100 at a respectable initial throughput of 4.5/sec – although this did degrade as load on the compute servers rose. This allowed us to hit 75,000 running instances (with no failures) in 6hrs 33mins, pushing through to 100,000 instances in 10hrs 49mins – again with no failures.
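Those headline numbers imply the average sustained rate fell over the run – a quick back-of-the-envelope check:

```shell
# Average instance creation rate, in hundredths of an instance per second,
# from the figures above (shell integer arithmetic).
first=$(( 75000 * 100 / (6 * 3600 + 33 * 60) ))    # 75,000 instances in 6h33m
second=$(( 100000 * 100 / (10 * 3600 + 49 * 60) )) # 100,000 instances in 10h49m
echo "$first $second"   # prints "318 256" – i.e. ~3.2/sec falling to ~2.6/sec
```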
As we saw in the smaller test, the API invocation time was fairly constant throughout the test, with the total provisioning time through to ACTIVE state increasing due to loading on the compute nodes:
OK – so we are now running an OpenStack Cloud on Ubuntu 14.04 across 6 SeaMicro chassis (1, 2, 3, 5, 6, 7 – 4 comes later) – a total of 384 servers (give or take one or two which would not provision). The cumulative load across the cloud at this point was pretty impressive – Ganglia does a pretty good job of charting this:
AMD had two more chassis (8 & 9) in the racks which we had enlisted and commissioned, so we pulled them into the deployment as well. This did take some time – Juju was grinding pretty badly at this point, and just running 'juju add-unit -n 63 nova-compute-b6' was taking 30 minutes to complete (reported upstream – see bug 1317909).
After a couple of hours we had another ~128 servers in the deployment, so we pushed on and created some more instances through to the 150,000 mark. As the instances were landing on the servers of the 2 new chassis, the load on the individual servers increased more rapidly, so instance creation throughput slowed down faster – but the cloud managed the load.
Prior to starting testing at any scale, we had some issues with one of the chassis (4) which AMD had resolved during testing, so we shoved that back into the cloud as well; after ensuring that the 64 extra servers were reporting correctly to Nova, we started creating instances again.
However, the instances kept scheduling onto the servers in the previous two chassis we added (8 & 9), with the new nodes not getting any instances. It turned out that the servers in chassis 8 & 9 were AMD-based servers with twice the memory capacity; by default, Nova does not look at VCPU usage when making scheduling decisions, so as these 128 servers had more remaining memory capacity than the 64 new servers in chassis 4, they were still being targeted for instances.
Unfortunately I’d hopped onto the plane from Austin to Atlanta for a few hours, so I did not notice this – and we hit our first 9 instance failures. The 128 servers in chassis 8 and 9 ended up with nearly 400 instances each, severely over-committing their CPU resources.
A few tweaks to the scheduler configuration – specifically enabling the CoreFilter and setting the CPU overcommit ratio to 32 – applied to the Cloud Controller nodes using the Juju charm, and instances started to land on the servers in chassis 4. This seems like a sane thing to do by default, so we will add it to the nova-cloud-controller charm with a configuration knob to allow the overcommit ratio to be altered.
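For reference, the equivalent nova.conf settings look something like this (Icehouse-era option names; the exact filter list shown is illustrative):

```ini
# /etc/nova/nova.conf on the Cloud Controller nodes
[DEFAULT]
# Add CoreFilter so VCPU capacity is considered alongside RAM when scheduling
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
# Allow each physical core to be oversubscribed 32x
cpu_allocation_ratio = 32.0
```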
At the end of the day we had 168,000 instances running on the cloud – this may have got some coverage during the OpenStack summit….
The last word
Having access to this many real servers allowed us to exercise OpenStack, Juju, MAAS and our reference charm configurations in a way that we have not been able to undertake before. Exercising infrastructure management tools and configurations at this scale really helps shake out the scale pinch points – in this test we specifically addressed:
- Worker thread configuration in the nova-cloud-controller charm
- Bumping open file descriptor ulimits in the rabbitmq-server charm to enable more concurrent connections
- Tweaking the maximum number of mysql connections via charm configuration
- Ensuring that the CoreFilter is enabled to avoid potential extreme overcommit on nova-compute nodes.
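The MySQL tweak in the list above is likewise a single charm setting; the option name below is taken from the mysql charm and the value is illustrative – check the charm's config.yaml for the current name and sensible limits:

```shell
juju set mysql max-connections=2000
```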
There were a few things we could not address during the testing, for which we had to find workarounds:
- Scaling a Neutron-based cloud past 256 physical servers
- High instance density on nova-compute nodes with Neutron security groups enabled.
- High relation creation concurrency in the Juju state server causing failures and poor performance from the juju command line tool.
We have some changes in the pipeline to the nova-cloud-controller and nova-compute charms to make it easier to split Neutron services onto different underlying messaging and database services. This will allow the messaging load to be spread across different message brokers, which should allow us to scale a Neutron-based OpenStack cloud to a much higher level than we achieved during this testing. We did find a number of other smaller niggles related to scalability – check out the full list of reported bugs.
And finally some thanks:
- Blake Rouse for doing the enablement work for the SeaMicro chassis and getting us up and running at the start of the test.
- Ryan Harper for kicking off the initial bundle configuration development and testing approach (whilst I was taking a break – thanks!) and shaking out the initial kinks.
- Scott Moser for his enviable scripting skills which made managing so many servers a whole lot easier – MAAS has a great CLI – and for writing CirrOS.
- Michael Partridge and his team at AMD for getting so many servers racked and stacked in such a short period of time.
- All of the developers who contribute to OpenStack, MAAS and Juju!
.. you are all completely awesome!