Tuesday, August 19, 2014

Availability Zones in Juju


You would be forgiven for thinking that I'd fallen off the face of the earth, considering how long it has been since I last wrote. I've been busy with my day job, moving into a new house, and life in general. Work on llgo has been progressing, mostly due to Peter Collingbourne. I'll have more to say about llgo's progress in future posts.

This post is about some of the work I've done on Juju recently. Well, semi-recently; this post has been sitting in my drafts for a little while, waiting for the new 1.20.5 release to be announced.





One of the major focuses of the Juju 1.20 release has been high availability (HA). There are two sides to this: high availability of Juju itself, and high availability of your deployed services. We’ll leave the “Juju itself” side for another day, and talk about HA charms/services.

Until now, if you deployed a service via a charm with Juju, your cloud instance containing the service unit would be allocated wherever the cloud provider decided best. Most cloud providers split their compute services up into geographic regions (“us-east-1” in Amazon EC2, “US West” in Microsoft Azure, etc.). Some providers also break those regions down into “availability zones” (though the actual term may vary between providers, we use the term availability zone to describe the concept). An availability zone is essentially an isolated subset of a region.

If you’re developing an application that demands high availability, then you probably want to make sure your application is spread across availability zones. Some providers, such as Microsoft Azure, back this with a service level agreement (SLA). Provided that you allocate at least two VMs to a “Cloud Service” on Azure, you’re guaranteed 99.95% uptime under the SLA, and you get reimbursed if the guarantee isn’t met.
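To put that figure in perspective, here is a back-of-the-envelope calculation of how much downtime a 99.95% guarantee leaves room for. This is only a sketch: the 30-day month and the function name are my own assumptions, and the actual SLA defines its own measurement period.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted over `days` at the given uptime percentage.

    The 30-day default is an assumption for illustration, not an SLA term.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# Roughly 21.6 minutes over a 30-day month at 99.95% uptime.
print(round(allowed_downtime_minutes(99.95), 1))
```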

In Juju 1.20, there are two options for distributing your service units across availability zones: explicit (akin to machine placement) and automatic. So far we have enabled explicit availability zone placement in the Amazon EC2 and OpenStack (Havana onwards) providers, with support for the MAAS provider on the horizon. To add a new machine to a specific availability zone, use the “zone=” placement directive as below:
juju add-machine zone=us-east-1b

As well as support for explicit zone placement, we’ve implemented automatic spreading of service units across availability zones for Amazon EC2, OpenStack and Microsoft Azure. When cloud instances are provisioned, each one is allocated to an availability zone according to how densely that zone is already populated with related instances, favouring the zones with the fewest. Two cloud instances are considered related if they both contain units of a common service, or if they are both Juju state servers.
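The selection rule described above can be sketched in a few lines of Python. This is not Juju's actual code (Juju is written in Go). The Instance type, the function names and the tie-breaking by zone order are all assumptions made for the sake of illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical model of a cloud instance, for illustration only."""
    services: frozenset          # names of services with units on this instance
    zone: str = ""               # empty until the instance has been placed
    state_server: bool = False   # True if this is a Juju state server


def is_related(a, b):
    """Related if they share a service, or if both are state servers."""
    return bool(a.services & b.services) or (a.state_server and b.state_server)


def pick_zone(zones, existing, new):
    """Return the zone holding the fewest instances related to `new`.

    Ties go to the zone listed first, an arbitrary choice for this sketch.
    """
    counts = {zone: 0 for zone in zones}
    for inst in existing:
        if inst.zone in counts and is_related(inst, new):
            counts[inst.zone] += 1
    return min(zones, key=lambda zone: counts[zone])
```

For example, with mongodb units already in us-east-1a and us-east-1b and only an unrelated service in us-east-1d, pick_zone would place a new mongodb instance in us-east-1d, because the unrelated instance does not count towards that zone's density.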

To illustrate automatic spread, consider the mongodb charm. You’re going to use MongoDB as the datastore for your application, and you want to make sure the datastore is highly available; to do that, you’ll want to create a MongoDB replica set. It’s trivial to do this with the mongodb Juju charm:
juju deploy -n 3 mongodb

Wait a little while and you’ll have a 3-node MongoDB replica set. If a node happens to disappear, then the replica set will rejig itself so that there is a master again (if the master was the node that was lost) and everything should continue to work. If all the nodes go away, then you’re in trouble. This is where you want to go a step further and ensure your nodes are distributed across availability zones for greater resilience to failure. As of Juju 1.20, that “juju deploy” you just did handles all of that for you: your 3 nodes will be uniformly spread across availability zones in the environment. If you add units to the service, they will also be spread across the zones according to how many other units of the service are in the zones. Let’s see what Juju did…
$ juju status mongodb | grep instance-id
instance-id: i-7a6d2b50
instance-id: i-ff1562d4
instance-id: i-627f0a30
$ ec2-describe-availability-zones
AVAILABILITYZONE us-east-1a available us-east-1
AVAILABILITYZONE us-east-1b available us-east-1
AVAILABILITYZONE us-east-1d available us-east-1
$ ec2-describe-instance-status i-7a6d2b50 i-ff1562d4 i-627f0a30 | grep i-
INSTANCE i-627f0a30 us-east-1d running 16 ok ok active
INSTANCE i-ff1562d4 us-east-1a running 16 ok ok active
INSTANCE i-7a6d2b50 us-east-1b running 16 ok ok active
(Note: the ec2-* commands are available in the ec2-api-tools package.)

Juju has distributed the mongodb units so that there is one in each zone, so if one zone is impaired, the units in the other zones will be unaffected. If we add a unit, it will go into one of the zones with the fewest mongodb units.
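If you would rather not eyeball the output, a short script can tally the instances per zone and list the zones a new unit could land in. This is a rough sketch that assumes the column layout seen in the sample output above (zone name in the second column of the AVAILABILITYZONE lines and the third column of the INSTANCE lines). The function names are made up for this example.

```python
ZONES_OUTPUT = """\
AVAILABILITYZONE us-east-1a available us-east-1
AVAILABILITYZONE us-east-1b available us-east-1
AVAILABILITYZONE us-east-1d available us-east-1
"""

STATUS_OUTPUT = """\
INSTANCE i-627f0a30 us-east-1d running 16 ok ok active
INSTANCE i-ff1562d4 us-east-1a running 16 ok ok active
INSTANCE i-7a6d2b50 us-east-1b running 16 ok ok active
"""


def zone_population(zones_output, status_output):
    """Count instances per zone, including zones that hold no instances."""
    population = {}
    for line in zones_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "AVAILABILITYZONE":
            population[fields[1]] = 0
    for line in status_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "INSTANCE":
            # Assumed layout: INSTANCE <instance-id> <zone> <state> ...
            population[fields[2]] = population.get(fields[2], 0) + 1
    return population


def least_populated(population):
    """Zones tied for the fewest instances, i.e. candidates for the next unit."""
    fewest = min(population.values())
    return sorted(zone for zone, count in population.items() if count == fewest)


population = zone_population(ZONES_OUTPUT, STATUS_OUTPUT)
print(population)
print(least_populated(population))
```

With the three units above, every zone holds exactly one instance, so all three zones are candidates for a fourth unit.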

Explicit placement is currently only supported by Juju’s Amazon EC2 and OpenStack providers, but automatic spread is also supported by the Microsoft Azure provider. Due to the way that Microsoft Azure ties together availability zones and load balancing, it is currently necessary to forgo “density” (i.e. explicit machine placement) in order to support automatic spread. If you are upgrading an existing Azure environment to 1.20, then automatic spread will not be enabled. Newly created Azure environments enable spread (and disable placement) by default; to turn this off, set availability-sets-enabled=false in environments.yaml.

Enjoy.