Tuesday, August 19, 2014

Availability Zones in Juju


You would be forgiven for thinking that I'd fallen off the face of the earth, considering how long it has been since I last wrote. I've been busy with my day job, moving into a new house, and life in general. Work on llgo has been progressing, mostly due to Peter Collingbourne. I'll have more to say about llgo's progress in future posts.

This post is about some of the work I've done on Juju recently. Well, semi-recently; this post has been sitting in my drafts for a little while, waiting for the new 1.20.5 release to be announced.





One of the major focuses of the Juju 1.20 release has been high availability (HA). There are two sides to this: high availability of Juju itself, and high availability of your deployed services. We’ll leave the “Juju itself” side for another day, and talk about HA charms/services.

Until now, if you deployed a service via a charm with Juju, the cloud instance containing the service unit would be allocated wherever the cloud provider decided best. Most cloud providers split their compute services up into geographic regions (“us-east-1” in Amazon EC2, “US West” in Microsoft Azure, etc.). Some providers also break those regions down into “availability zones” (the actual term varies between providers, but we use availability zone to describe the concept). An availability zone is essentially an isolated subset of a region.

If you’re developing an application that demands high availability, then you probably want to make sure your application is spread across availability zones. Some providers attach a service level agreement (SLA) to doing this. On Microsoft Azure, for example, provided that you allocate at least two VMs to a “Cloud Service”, you’re guaranteed 99.95% uptime under the SLA, and you’re reimbursed if the guarantee isn’t met.

In Juju 1.20, there are two options for distributing your service units across availability zones: explicit (akin to machine placement) and automatic. So far we have enabled explicit availability zone placement in the Amazon EC2 and OpenStack (Havana onwards) providers, with support for the MAAS provider on the horizon. To add a new machine to a specific availability zone, use the “zone=” placement directive as below:
juju add-machine zone=us-east-1b
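To run a unit of a service on such a machine, follow up with juju deploy's --to option. The machine number below is illustrative; add-machine reports the number it actually created:
juju add-machine zone=us-east-1b    # suppose this creates machine 1
juju deploy --to 1 mongodb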

As well as support for explicit zone placement, we’ve implemented automatic spreading of service units across availability zones for Amazon EC2, OpenStack and Microsoft Azure. When cloud instances are provisioned, each is allocated to an availability zone based on how many related instances the zones already contain. Two cloud instances are considered related if they both contain units of a common service, or if they are both Juju state servers.
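In other words, the provisioner picks the zone that currently holds the fewest related instances. A rough sketch of that policy in Go, illustrative only and not Juju's actual code:

package zonesketch

// pickZone returns the availability zone containing the fewest related
// instances, so that newly provisioned instances even out the spread.
// It assumes at least one zone is available.
func pickZone(zones []string, related map[string]int) string {
    best := zones[0]
    for _, zone := range zones[1:] {
        if related[zone] < related[best] {
            best = zone
        }
    }
    return best
}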

To illustrate automatic spread, consider the mongodb charm. You’re going to use MongoDB as the datastore for your application, and you want to make sure the datastore is highly available; to do that, you’ll want to create a MongoDB replica set. It’s trivial to do this with the mongodb Juju charm:
juju deploy -n 3 mongodb

Wait a little while and you’ll have a 3-node MongoDB replica set. If a node happens to disappear, the replica set will rejig itself so that there is a master (if the master was in fact lost) and everything should continue to work. If all the nodes go away, though, you’re in trouble. This is where you want to go a step further and ensure your nodes are distributed across availability zones for greater resilience to failure. As of Juju 1.20, that “juju deploy” you just did handles all of that for you: your 3 nodes will be uniformly spread across availability zones in the environment. If you add units to the service, they too will be spread across the zones, according to how many units of the service each zone already contains. Let’s see what Juju did…
$ juju status mongodb | grep instance-id
instance-id: i-7a6d2b50
instance-id: i-ff1562d4
instance-id: i-627f0a30
$ ec2-describe-availability-zones
AVAILABILITYZONE us-east-1a available us-east-1
AVAILABILITYZONE us-east-1b available us-east-1
AVAILABILITYZONE us-east-1d available us-east-1
$ ec2-describe-instance-status i-7a6d2b50 i-ff1562d4 i-627f0a30 | grep i-
INSTANCE i-627f0a30 us-east-1d running 16 ok ok active
INSTANCE i-ff1562d4 us-east-1a running 16 ok ok active
INSTANCE i-7a6d2b50 us-east-1b running 16 ok ok active
(Note: the ec2-* commands are available in the ec2-api-tools package.)

Juju has distributed the mongodb units so that there is one in each zone, so if one zone is impaired the others will be unaffected. If we add a unit, it will go into one of the zones with the fewest mongodb units.
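For example, adding two more units should land them in two different zones, keeping the spread as even as possible (exactly which zones depends on your environment):
juju add-unit -n 2 mongodb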

Explicit placement is currently only supported by Juju’s Amazon EC2 and OpenStack providers, but automatic spread is also supported by the Microsoft Azure provider. Due to the way that Microsoft Azure ties together availability zones and load balancing, it is currently necessary to forgo “density” (i.e. explicit machine placement) in order to support automatic spread. If you are upgrading an existing environment to 1.20, automatic spread will not be enabled. Newly created environments enable spread (and disable placement) by default, with an option to disable it (availability-sets-enabled=false in environments.yaml).
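For a newly bootstrapped Azure environment where you would rather keep explicit placement, the option can be switched off in environments.yaml. A minimal sketch, with the environment name and remaining settings as placeholders:
environments:
  my-azure:
    type: azure
    availability-sets-enabled: false
    # location, credentials and other settings as usual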

Enjoy.

Monday, January 6, 2014

llgo on ssa

Hello there!

I've been busy hacking on llgo again. In case you're new here: llgo is a Go frontend for LLVM that I've been working on for the past ~2 years on and off. It's been quite a while since I last wrote; there has been a bunch of new work since, so I have some things to talk about at last.

A few months ago, I started working on rewriting swathes of llgo's internals to base it on go.tools/ssa. LLVM uses an SSA representation, which made the process fairly straightforward. Basing llgo on go.tools/ssa gives me much higher confidence in the quality of the output; it also presented a good opportunity to clean up llgo's source itself, which I have begun, but certainly not finished. llgo is now able to compile all packages in the standard library, except those that require cgo (net, os/user, runtime/cgo).

llgo now works something like this (a code sketch of the first three steps follows the list):
  1. Go source is parsed by go/parser (with go/scanner doing the scanning), producing a go/ast syntax tree;
  2. The AST is fed into go/types for type-checking;
  3. The output of go/types is passed on to go.tools/ssa, which generates the SSA form;
  4. llgo translates the go.tools/ssa SSA form into an LLVM module;
  5. llgo-build links the LLVM modules for a program together and translates the result into an executable.
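For the curious, here is a minimal, runnable sketch of steps 1 to 3. It is shown with the golang.org/x/tools import paths that go.tools later moved to, and uses the ssautil.BuildPackage convenience helper; treat the precise paths and helper as assumptions, since this API has been a moving target:

package main

import (
    "go/ast"
    "go/importer"
    "go/parser"
    "go/token"
    "go/types"
    "os"

    "golang.org/x/tools/go/ssa"
    "golang.org/x/tools/go/ssa/ssautil"
)

func main() {
    const src = `package hello
func main() { println("hello, ssa") }`

    // Step 1: parse the source into an AST.
    fset := token.NewFileSet()
    f, err := parser.ParseFile(fset, "hello.go", src, 0)
    if err != nil {
        panic(err)
    }

    // Steps 2 and 3: type-check the AST, then build the SSA form for the
    // package (BuildPackage runs go/types and the SSA builder in one call).
    pkg := types.NewPackage("hello", "")
    conf := &types.Config{Importer: importer.Default()}
    ssaPkg, _, err := ssautil.BuildPackage(conf, fset, pkg, []*ast.File{f}, ssa.SanityCheckFunctions)
    if err != nil {
        panic(err)
    }

    // Dump the SSA form of main; llgo's job (step 4) is to translate each
    // of these instructions into LLVM IR instead.
    ssaPkg.Func("main").WriteTo(os.Stdout)
}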

go.tools/ssa supports translating a whole program to SSA form, but llgo works in the traditional way: packages are translated one at a time. Whole program optimisation is enabled by linking the LLVM modules together, prior to any translation to machine code.
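One way to picture that linking step with the stock LLVM tools (this illustrates the concept rather than llgo-build's exact invocation; the file names are made up, and flags vary between LLVM versions):
llvm-link main.bc fmt.bc runtime.bc -o prog.bc   # merge the per-package modules
opt -O2 prog.bc -o prog.opt.bc                   # optimise across package boundaries
llc prog.opt.bc -o prog.s                        # only now translate to machine code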

There were a few bits that I stumbled on when rewriting; Alan Donovan, the author of go.tools/ssa, was kind enough to give me some assistance along the way. The main issues I had were:

  • Translating Phi nodes requires a bit of finessing, to ensure that processing of the Phi and its edges is not order-sensitive. This was dealt with by generating placeholder values for instructions that haven't yet been visited, and then replacing them later. (A small example of code that gives rise to Phi nodes follows this list.)
  • ssa.Index is emitted for indexing into arrays. If an array is in a register, then indexing it means extracting a value; in LLVM, an array element extraction requires a constant index. This is currently kludged by storing to a temporary alloca, and using the getelementptr LLVM instruction. Hopefully I'm missing something and this is easily fixed.
  • The Recover block is not dominated by the entry block, so it may not be valid for it to refer to the Alloc instructions for parameters and results. To deal with this, I generate a prologue block that contains the param/result Allocs; the prologue block conditionally jumps to either the recover or entry block, depending on panic/recover control flow. Alan has agreed to do something along these lines in go.tools/ssa.
  • The ssa.Next instruction required some assumptions to be made about block ordering and instruction placement, in order to be able to translate string-range using Phi nodes. Recent changes to go.tools/ssa exposed the dominator tree, making it possible to do away with the assumption now.
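For context on the first point: a Phi node is how SSA form merges a value that differs according to which predecessor block control arrived from. Any loop produces them once go.tools/ssa lifts local variables into registers; a tiny illustration:

package example

// After register lifting, the SSA form of sum contains Phi nodes for s
// (and for the loop index): their values depend on whether control came
// from the loop entry or from the back edge.
func sum(xs []int) int {
    s := 0
    for _, x := range xs {
        s += x
    }
    return s
}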


Various significant changes have been made during the course of the migration to go.tools/ssa:
  • Interfaces are now represented as in gc: empty interfaces with the runtime type and data, non-empty interfaces with an "itab" and data; a sketch of this layout follows the list. Russ Cox wrote an article about the interface representation back in 2009.
  • Panic/recover (and defer, by consequence) are now using setjmp/longjmp. I had been using exceptions, but it was rightly pointed out to me that this wouldn't work unless there were a way of doing non-call exceptions in LLVM (which has not been implemented). The setjmp/longjmp approach incurs a cost for every function that may defer or recover, but it works without modifications to LLVM. Perhaps this will be revisited in the future.
  • go/types/typemap is now used for mapping types.Types to runtime type descriptors and LLVM types. Runtime type descriptors are now generated more completely, and more correctly. Identical type descriptors will now be merged at link-time.
  • llgo no longer generates conditional branching for calls to non-global functions, when comparing structs, or in map iteration. Apart from producing better code, this makes it much simpler to work with go.tools/ssa, which has its own idea about how the SSA basic blocks relate to one another.
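To make the first point concrete, here is the two-word interface layout sketched as Go structs. The names follow gc's runtime (eface/iface); they are illustrative, not llgo's actual definitions:

package sketch

import "unsafe"

// An empty interface value: a pointer to the runtime type descriptor of
// the dynamic type, plus a data word.
type eface struct {
    rtype unsafe.Pointer
    data  unsafe.Pointer
}

// A non-empty interface value: a pointer to an itab, which pairs the
// interface type with the dynamic type and its method table, plus the
// data word.
type iface struct {
    itab unsafe.Pointer
    data unsafe.Pointer
}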

There have also been miscellaneous bug fixes and improvements not directly related to the move to go.tools/ssa. Some highlights:
  • A custom importer/exporter, thanks to Fredrik Ehnbom. The importer side is disabled at the moment, due to an apparent bug in go.tools/ssa.
  • Debugger support, thanks again to Fredrik Ehnbom. I haven't reenabled it since the move to go.tools/ssa. I'll get onto that real soon now, because debugging without it can be tiresome.
  • llgo-build can now take a "-test" flag that causes llgo-build to compile the test Go files, yet again thanks to Fredrik Ehnbom. This is currently reliant on the binary importer being enabled, so it won't work out of the box until that bug is fixed.
  • Shifts now generate correct values for shifts greater than the width of the lhs operand.
  • Signed integer conversions now sign-extend correctly. (A tiny demonstration of this and the previous fix follows the list.)
  • bytes.Compare now works as it should (-1, 0, 1, not <0, 0, >0). "llgo-build -test bytes": PASS
  • llgo-build can now take a "-run" flag that causes llgo-build to execute and then dispose of the resulting binary.
  • Type strings are propagated to LLVM types, making the IR more legible, thanks to Travis Cline.
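As a concrete check of the shift and sign-extension fixes, the following now prints the values the Go spec requires (the expected results shown in the comments are Go's defined semantics):

package main

func main() {
    var x uint8 = 0xFF
    s := uint(9)
    println(x << s)   // 0: shifting past the operand's width yields zero
    var y int8 = -1
    println(int32(y)) // -1: the conversion sign-extends
}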

I think that's everything. I have various things I'd like to tackle now, but not enough time to do it all at once. If you're interested in helping out then there's plenty to do, including:

  • Move to using libgo. Ideally the gc runtime would be rewritten in Go already, but that's not going to happen just yet. The compiler and linker are due to be rewritten in Go soon, which is a lot of work as it is.
  • Finish off runtime type descriptor generation (notably, type algorithms).
  • Get PNaCl support working again. This should be pretty close, but requires the binary importer to be enabled.
  • Implement cgo support.
  • Implement bounds checking, nil pointer checks, etc.
  • Get garbage collection working. There's Pull Request #108, but this is perpetuating the problem that is llgo's custom runtime. Since GC is fairly invasive, I don't want to go tying llgo to that runtime any more than it is currently. I expect this will have to wait until libgo is integrated.
  • Escape analysis. This is a must-have, but not immediately necessary. The implementation should be based on go.tools/ssa, interfacing with the exporter/importer to record/consume information about external functions.
  • Make use of the Switches function in go.tools/ssa/ssautil. This is an optimisation, so again, not immediately necessary.

If you want to have a play around, then grab LLVM and Go, and then:

  • go get github.com/axw/llgo/cmd/llgo-dist && llgo-dist
  • llgo-build <some/package> or llgo-build file1.go file2.go ...
Let me know how you get on with that.


Here's hoping 2014 can be a productive year for llgo. Happy new year.


Cheers,
Andrew