It's been a while. I've implemented bits and pieces of the Go language: constants (though not properly handling arbitrary precision numbers at this stage), structs, functions (with and without receivers, as declarations and as literals), and goroutines. Much of the implementation is simplistic, not covering all bases. I want to get something vaguely useful working before I go down the path to completeness.
I intended to wait until I had something interesting to show off, but unfortunately I've hit a snag with LLVM.
Debug information might sound luxurious, but it's how I'm intending to encode information about exported symbols, so I can implement imports. The gc compiler creates a header in archives that lists all of this information. LLVM provides a standard set of metadata for describing debug information, using DWARF descriptors.
So I figured I could build on the LLVM metadata, extending it to describe Go-specific information where necessary. The benefit here, aside from reusing an existing well-defined format, is that I'd get DWARF debugging information in my output binaries, something I'd eventually want anyway. Unfortunately it appears that there's no C API for generating named metadata.
I'll have a look at extending the LLVM C API now. Down the rabbit hole...
Update - 7 January 2012
I submitted a patch to LLVM to add support for adding named metadata to a module, which has been accepted. This will, presumably, be part of LLVM 3.1.
Monday, November 7, 2011
Writing a Compiler, or Learning a Language the Hard Way (Part Deux)
A couple of weeks ago I wrote that I would be attempting to write a Go compiler using the LLVM compiler infrastructure. The blog post got a bit of attention after being linked on Lamer News (and a bit more attention than intended!), so I guess there's at least a couple of people out there interested in hearing whether anything comes of this effort.
I've been a mixture of sick and busy (maybe a little bit slack?), but this weekend I finally found some time to put into my shiny new Go compiler. Yesterday I buffed the corners of what I had written so far, and pushed it to github. Behold, llgo! llgo does not currently compile much of the language, and it only compiles completely standalone programs (imports aren't handled). I'll be adding in features as I get time, but this is largely an educational endeavour for me, so I'm not really in a rush.
Let's look briefly at how it's put together. llgo is written in Go, and uses the standard Go parser/ast packages. For code generation, the gollvm (Go bindings for LLVM) package is used. So it's just a matter of calling the parser to generate an AST, then walking the AST and generating LLVM instructions. The latter part is what I'm most new to, and I'm just starting to get to grips with LLVM and SSA (Static Single Assignment), and their nuances. It's mostly pretty straightforward though.
Forward, ho!
There's plenty of things to implement, but there's a bunch of big ticket items that I'm particularly interested in solving. They are, in no particular order:
- Imports.
- Interfaces.
- Goroutines and channels.
- Deferred function calls.
- Closures.
- cgo
- Garbage Collection
Imports
If you were to attempt to compile a program with llgo - "Hello, World!", say - then I'm sure you'd find one giant gaping hole: the lack of support for imports. You wouldn't be able to import fmt and call fmt.Println. (Actually I have implemented the println built-in, but that's beside the point.) The module system is pretty fundamental, so I'll have to address this soon.
The picture I have in my head is that each package will compile to (ideally) machine-independent LLVM bitcode libraries, which will go somewhere in the $GOROOT/pkg hierarchy. Just as Go examines archives to determine what they export, so llgo will load and examine the modules defined by the bitcode.
Somewhat related to imports is the runtime. I dare say that most of the imports people will ever do will be of standard libraries, which will at some time or another use the Go runtime (e.g. for reflection, string comparisons, system calls). So I'll have to start thinking seriously about which parts of the runtime I'll be able to reuse, and which parts I'll rewrite.
Interfaces
In my previous blog post I talked about the idea of pseudo duck-typing in a statically compiled language ("implicit interfaces"), so this feature has a special place in my heart. I have some ideas of how to implement them, but I'll have to implement user-defined types first.
Goroutines and Channels
I'm not going for efficiency at this stage; I'm just going for functionality. So I intend, initially, to implement goroutines 1-1 with respect to threads. The gc compiler/runtime does M-N with a preemptive application-level scheduler; I think gccgo still does 1-1. I also do not intend, initially at least, to implement split stacks. These things can well be considered functionality, especially since the language intends to make creating many goroutines inexpensive and efficient. I have to prioritise, though, so I'll tackle efficiency and scalability later.
I've implemented channel-like data structures in C++ before, so I don't expect that to be too difficult. I'll just start out with a simple mutex/condition-based data structure, with or without an underlying FIFO array/queue depending on whether or not the channel is buffered.
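To make that concrete, here's a minimal sketch of the buffered case - in Python rather than in the compiler's runtime, purely to illustrate the data structure I have in mind:

import collections
import threading

class Channel:
    def __init__(self, capacity):
        self._capacity = capacity  # buffered channel of fixed capacity
        self._queue = collections.deque()
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def send(self, value):
        with self._lock:
            # Block while the buffer is full.
            while len(self._queue) >= self._capacity:
                self._not_full.wait()
            self._queue.append(value)
            self._not_empty.notify()

    def recv(self):
        with self._lock:
            # Block while the buffer is empty.
            while not self._queue:
                self._not_empty.wait()
            value = self._queue.popleft()
            self._not_full.notify()
            return value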
Deferred Function Calls
As a general rule, I'm trying to do this... not clean room, but not just by copying what's done in gc/gccgo either. After all, I'm trying to teach myself something here, and I find doing things a great way of learning. Sometimes things go awry, but then I know what not to do next time. It also serves as good background when reading about how others have solved the problem.
Russ Cox wrote an interesting article on how deferred function calls are implemented in Go. Actually the article was nominally about how breaking abstractions can be a good thing, and I tend to agree. LLVM adds another level of abstraction, which means some of these functions don't end up being quite as efficient as when they're implemented directly in assembler or machine code.
Closures
If you're not familiar with this term, it's essentially a function that has captured some variables in its defining environment. In Go you can define function literals, which are anonymous functions. If a function literal refers to a variable defined in the outer scope, then the function will be defined as a closure.
I had been wondering about how I would go about implementing closures in llgo. I was thinking, broadly, that I would need to store the variables in memory alongside the function code. How would I do this in LLVM? I could treat closures differently, representing them as a structure containing the captured variables, and a pointer to the function, which would need to have additional arguments defined to accept the captured variables. Then the caller of the closure would have to treat it differently to a simple function. This seems a bit ugly, so I wondered, how does gc do it?
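Before getting to gc's answer, here's the struct-plus-function representation sketched in Python - purely illustrative, standing in for what the compiler would generate in LLVM IR:

# The function literal is rewritten to take its captured variables as an
# explicit extra argument...
def adder_impl(env, x):
    return env["n"] + x

# ...and a closure value is just a pair: (function, captured environment).
def make_adder(n):
    return (adder_impl, {"n": n})

fn, env = make_adder(10)
print(fn(env, 5))  # 15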
The gc implementation is very elegant: it allocates memory on the heap, with PROT_EXEC enabled, and stores dynamically generated machine code in it. At the end of the code, the captured variables are stored. The machine code loads the variables from memory onto the stack for use in the closure. Elegant, but how can we do that in LLVM?
LLVM abstracts away the details of the underlying target system, which means you deal with an intermediate representation of instructions rather than machine-specific instructions. We could dynamically generate code using LLVM, but that would mean requiring LLVM in every program, which seems a bit heavyweight. Or we could just reuse the code from gc, since it's pretty well contained, but that means adding in platform-specifics where there were none before. I think that's a better solution, but I might have to see what other LLVM-based compilers have done. I guess the Glasgow Haskell Compiler might be a good first place to look.
cgo
This one has the potential to be quite interesting. There's already a mature LLVM-based C compiler: clang. So llgo could potentially leverage it to implement a cgo equivalent. Both clang and llgo will emit bitcode; llgo (or the cgo equivalent) will analyse the bitcode emitted from clang, and determine the available functions and types.
Garbage Collection
I know conceptually how mark & sweep algorithms work, but I have never implemented one, nor even analysed an implementation. LLVM provides some support for garbage collection, which will presumably make things easier.
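For the record, the concept in a nutshell - a toy mark-and-sweep in Python, nothing like a real collector:

class Obj:
    def __init__(self):
        self.refs = []  # outgoing references to other objects
        self.marked = False

def mark(roots):
    # Mark everything reachable from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

def sweep(heap):
    # Keep the marked objects (unmarking them for the next cycle);
    # everything else is reclaimed.
    live = [obj for obj in heap if obj.marked]
    for obj in live:
        obj.marked = False
    return live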
Over and Out
Rather than continuing to talk about doing stuff, I'm going to go and do stuff. If you're interested in following my progress, I suggest that you watch llgo on github.
Without further ado, adieu.
Sunday, October 23, 2011
Writing a Compiler, or Learning a Language the Hard Way
For a while now I've been lamenting the lack of progress in systems programming languages. I work primarily in C++ (professionally, anyway), a language which leaves a lot to be desired. The end result, assuming you know what you're doing, is likely to be fast code. But getting there can be a tedious and precarious journey. Don't get me wrong, I'm not a C++ hater. I just think we can do better.
So for the past few years I've been storing away ideas for writing my own language. (Groan, another one.) The vast majority of languages these days seem to be JVM based, or based on some other VM such as the CLR. This is fine for many applications, but for a lot of systems programming you typically want to be much closer to the hardware. What I want is something with the ease and rapidity of development of, say, Python, with the power of a lower-level language such as C/C++.
The Easy: Ideas
So what are those ideas I'd stored away then? Well to be honest, many of them boil down to syntactic sugar (syntactic caramel?), so here are the more functional ones:
- Implicit Interfaces. The idea is that classes should be able to implement an interface without explicitly saying so, just by implementing the methods defined in the interface. Poor-man's duck-typing if you will.
- Link Time Optimisation (LTO) should be the norm. Say you write a function in a library which does everything and the kitchen sink. If the calling program only uses a function from your library in a certain way, then the function should be optimised for that usage.
- Pure Functions. This kind of fits under LTO. I want simpler meta-programming: it should be possible to mark functions as being "pure", which means (in my terminology) that the function either has no side-effects, or affects only temporary variables. Calls to these functions could then be evaluated at compile time, e.g. to calculate complex mathematical expressions. I guess this just comes under "fancy optimiser"?
- Pattern Matching. I haven't really fleshed this one out, but I think it would be handy to be able to match functions not just based on signature, but based on the values of the arguments.
- No preprocessor, no headers. When code is distributed, the modules should be self describing, such as with Java's compiled .class files. This would eliminate problems such as compiling against an old version of an interface, and linking against a new, incompatible implementation.
- For the love of God, no garbage collection. What can I say? I like my RAII.
- I'm not really sure what title to give this one, but I wanted to eliminate the problems you get with linking objects together that have been compiled with different options in C++ (e.g. with/out RTTI, exceptions, or perhaps against an entirely different STL/standard library.)
The Hard: Writing a Compiler
Anyone can come up with ideas. Even coming up with practical ideas isn't too hard, but implementing them is something else altogether. Especially so for someone like me who has had no formal education in compilers, or language design/implementation. Hell, my university course didn't even cover formal grammars or parsing. I've picked some things up by myself, but it's a very academic field.
Back when I was in uni, I toyed around with C--, which at the time looked like quite a promising "portable assembly language". This language is (or at least was) used as the intermediate language for the Glasgow Haskell Compiler. My self-taught knowledge of language design/implementation and code generation was not really up to scratch, so I didn't get much further than writing toy languages.
Fast forward a few years, and I've got the itch to implement a compiler again. I've recently been playing around with LLVM. It boasts a relatively easy to use API, which takes out some of the drudgery of writing the "bitcode" (an intermediate code that can be translated to machine code). I've got a bit more knowledge and programming skill under my belt now, so let's get to work on that language, right!?
I read something recently which can be summarised as: learn intimately that which you wish to replace. Now I don't have any illusions of replacing any existing language, but I think the message is still relevant. I've never implemented a language OR a compiler before, and now I'm going to do both at once? How about I make this a bit easier on myself, and write a compiler for an existing language? If I still want to implement my own language later, I can use the experience I gained.
I've been looking for an excuse for a while now to learn the Go Programming Language. It has some of the same design goals as my hypothetical language, so it seemed like a good fit.
The Fun: Learning Go and LLVM
I've been learning Go in my precious spare time for the last couple of weeks. There's a nifty web-app which provides an interactive tutorial in Go, called A Tour of Go. It's a bit of fun, and I recommend it to anyone wanting to learn the language.
So anyway, I'm now playing around with writing a Go compiler using LLVM. Why?
- To learn LLVM, and more generally how to write a compiler and generate code.
- To learn Go.
- Potentially to fill a gap in JIT compilation of Go programs.
- Why not?
Writing the compiler will be made much easier by the fact that the Go runtime includes a Go parser, and someone has already implemented Go bindings for LLVM. I haven't made a great deal of progress yet, but it seems achievable. When I've got something vaguely useful, I'll chuck it over on GitHub.
If you know anything about Go, you'll probably have noticed that at least one of the ideas that I presented above is present in Go: implicit interfaces. I really can't say whether this concept is new or not - it's probably not - but I did at least come up with it independently. Just sayin'!
I'll write a follow-up post when I get some time, describing a bit more about the Go compiler, the challenges I've come across, and my thoughts on how to solve them.
Friday, October 7, 2011
C++ Source-to-Source Translation
I've been on annual leave this week, so I've taken the opportunity to do some work on cmonster. I've added preliminary support for source-to-source translation by introducing a wrapper for Clang's "Rewriter" API. My fingers have been moving furiously so it's all a bit rough, but it does work.
The API flow is:
- Parse translation unit, returning an Abstract Syntax Tree (AST).
- Walk AST to find bits of interest.
- Insert/replace/erase text in the original source, using the location stored in each declaration/statement/token.
Motivating Example
Logging is a necessary evil in complex software, especially when said software is running on a customer's system, inaccessible to you. To make problem determination easier, we want a decent amount of information: file names, line numbers, function names, time of day, thread ID, ... but all of this comes at a cost. I'm not talking just cost in terms of CPU usage, though that is a major concern. I'm talking cost in terms of source code quality and maintainability.
We'll start off with a trivial C program:
int main(int argc, const char *argv[])
{
    if (argc % 2 == 0)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}
Let's say our needs are fairly humble: we just want to log the entry and exit of this function. Logging entry is easy: add a blob of code at the top of the function. We can get the function name and line number using __func__ (C99, C++11) and __LINE__. But what about __func__ in C89 or C++98? There are various alternatives, but some compilers have nothing. And that makes writing a cross-platform logging library a big old PITA. The information is there in the source code - if only we could get at it! In the end, we're more likely to forgo DRY and just reproduce the function name as a string literal.
Getting the function name and line number isn't a huge problem, but how about adding function exit logging? Now we're going to have to insert a little bit of code before our return statements. So we'll have something like:
#include <stdio.h>

int main(int argc, const char *argv[])
{
    const char *function = "main";
    printf("Entering %s:%s:%d\n", function, __FILE__, __LINE__);
    if (argc % 2 == 0)
    {
        printf("Leaving %s:%s:%d\n", function, __FILE__, __LINE__);
        return 1;
    }
    else
    {
        printf("Leaving %s:%s:%d\n", function, __FILE__, __LINE__);
        return 0;
    }
    return 0;
}
Ugh. And that's just the start. It gets much nastier when we need to turn logging on/off at runtime, filter by function name, etc. We could make it much nicer with a variadic macro. Something like LOG(format...), which calls a varargs function with the 'function' variable, __FILE__, __LINE__ and the format and arguments you specify. Unfortunately variadic macros are not supported by some older compilers. The first version of Visual Studio to support them was Microsoft Visual Studio 2005. So there goes that idea...
Hmmm, what to do, what to do? Wouldn't it be nice if we could just tag a function as requiring entry/exit logging, and have our compiler toolchain do the work? Entry/exit logging is the sort of thing you want to be consistent, so it should suffice to define one set of rules that covers all functions. Let's take a little peek at what we could do with cmonster.
First, we'll parse the source to get an AST. We'll locate all functions defined in the main file, and insert an "Entry" logging statement at the beginning of the body, and an "Exit" logging statement before each return statement in the body. At the end we dump the rewritten source to stdout, and we have a program, with logging, ready to be compiled.
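The script itself didn't survive in this archive, so here's a rough sketch of its shape, reconstructed from the description; the API names (parse, Rewriter, insert_text, and so on) are illustrative, not cmonster's actual interface:

import sys
import cmonster

# Parse the translation unit, returning an AST.
ast = cmonster.parse("example.c")
rewriter = cmonster.Rewriter(ast)

for function in ast.functions:  # functions defined in the main file
    body = function.body
    # Log entry at the top of the function body...
    rewriter.insert_text(
        body.start,
        '\nprintf("Entering %s at line %d\\n");'
        % (function.name, body.start.line))
    # ...and exit before each return statement.
    for stmt in body.walk():
        if stmt.is_return:
            rewriter.insert_text(
                stmt.start,
                'printf("Returning from %s at line %d\\n");\n'
                % (function.name, stmt.start.line))

sys.stdout.write(rewriter.rewritten_source())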
Tada! Running this, we're given:
#include <stdio.h>
int main(int argc, const char *argv[])
{
    printf("Entering main at line 2\n");
    if (argc % 2 == 0)
    {
        printf("Returning from main at line 6\n");
        return 1;
    }
    else
    {
        printf("Returning from main at line 10\n");
        return 0;
    }
}
Future Work
What we can't do yet is insert, replace, erase or modify declarations or statements directly in the AST, and have that reflected as a text insertion/replacement/erasure. For example, maybe I want to rename a function? Why can't I just go "function_declaration.name = 'new_name'"? At the moment we'd need to replace the text identified by a source range... a bit clunky and manual. So I may add a more direct API at a later stage. It should be doable, but may be a lot of work.
Also, the Visitor class defined in the above example could be called minimal at best. If there were any statements in the code that weren't handled by our visitor, the translation program would barf. I'll eventually build a complete Visitor class into cmonster to be reused. This should make writing translation tools a breeze; in our example, we would just override "visit_ReturnStatement" in the visitor.
Now, I think it's about time I learnt Go.
Sunday, September 25, 2011
cmonster update
I've been quietly beavering away at cmonster, and thought I should share an update.
Note: the changes I describe below are not yet part of a cmonster release. I'll release something once I've stabilised the API and tested it more thoroughly. In the meantime you can pull the source from github.
In the last month or so I have been adding C/C++ parsing capabilities to cmonster, by exposing the underlying Clang parser API. I've been wanting a simple, scriptable framework for analysing and manipulating C++ source for a few years now. The reason for wanting such a thing is so that I, and others, can more rapidly develop tools for writing better C++, and eliminate some of the drudgery. I've only just made a start, but cmonster now provides an API for parsing a C/C++ source file, returning a Python object to inspect the result.
So what's cmonster looking like now? We now have a preprocessor and a parser interface, the former now being incorporated into the latter. The parser interface will parse a single source file, and return its Abstract Syntax Tree (AST). As is typical with parsers, there are many classes involved to describe each declaration, statement, type, etc. So I've added Cython into the mix to speed up the process of defining Python equivalents for each of these classes.
Unfortunately Cython does not yet support PEP 384 (Py_LIMITED_API), so at the moment cmonster is back to requiring the full Python API, and thus must be rebuilt for each new release. I've had a tinker with Cython to get its output to compile with Py_LIMITED_API, and hope to provide a patch in the near future.
What's next? Once I get the AST classes mapped out, I intend to introduce a source-to-source translation layer. I'm not entirely sure how this'll work yet, but I think ideally you'd just modify the AST and call some function to rewrite the main source file. Elements of the AST outside of the main source file would be immutable. That's the hope, but it may end up being something a little more crude, using Clang's "Rewriter" interface directly to replace source ranges with some arbitrary string. I expect this will be a ways off yet, though.
Monday, September 5, 2011
Hello, Mr. Hooker.
I've been procrastinating on cmonster. I have some nasty architectural decisions to make, and I keep putting it off. In the meantime I've been working on a new little tool called "Mr. Hooker" (or just mrhooker).
Introducing Mr. Hooker
The idea behind mrhooker is very simple: I wanted to be able to write LD_PRELOAD hooks in Python. If you're not familiar with LD_PRELOAD, it's a mechanism employed by various UNIX and UNIX-like operating systems for "preloading" some specified code in a shared library. You can use this to provide your own version of native functions, including those in standard libraries such as libc.
Anyway, I occasionally find the need for an LD_PRELOAD library to change the behaviour of a program that I can't easily recompile. Often these libraries are throw-away, so writing the LD_PRELOAD library in C can end up taking just as long as the problem it's meant to solve. So I wrote mrhooker to simplify this.
It turns out there's very little to do, since Cython (and friends) do most of the hard work. Cython is a programming language that extends Python to simplify building Python extensions. It also has an interface for building these extensions on-the-fly. So mrhooker doesn't need to do much: it takes a .pyx (Pyrex/Cython source) file, compiles it to a shared library using Cython, and loads that, along with some common code, into a child process using LD_PRELOAD.
Example - Hooking BSD Sockets
Let's look at an example of how to use mrhooker. Hooks are defined as external functions in a Cython script. Say we want to hook the BSD sockets "send" function. First we'd find the signature of send (man 2 send), which is:
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
Given this, we can produce a wrapper in Cython, like so:
cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:
    ...
There are a couple of important things to note here. First, the parameter type for "buf" drops const, since Cython doesn't know about const-ness. Second, and crucially, the function must be defined "with gil". This ensures that the function acquires the Python Global Interpreter Lock before calling any Python functions. Okay, with that out of the way, let's go on...
We'll want to do something vaguely useful with this wrapper. Let's make it print out the argument values, and then continue on with calling the original "send" function. To do that we'll use dlsym/RTLD_NEXT to find the next function called "send".
cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:
    print "====> send(%r, %r, %r, %r)" % (sockfd, buf[:len], len, flags)
    real_send = dlsym(RTLD_NEXT, "send")
    if real_send:
        with nogil:
            res = (<ssize_t(*)(int, void*, size_t, int) nogil>real_send)(
                sockfd, buf, len, flags)
        return res
    else:
        return -1
We'll also need to declare dlsym and RTLD_NEXT. Let's do that.
# Import stuff from <dlfcn.h>
cdef extern from "dlfcn.h":
    void* dlsym(void*, char*)
    void* RTLD_NEXT
Now you just run:
mrhooker <script.pyx> <command>
And there we go. This is trivial - it would also be fairly trivial to write a C program to do this. But if we wanted to do anything more complex, or if we were frequently changing the wrapper function, I'd much rather write it in Python - or Cython, as it were.
Enjoy!
Edit: I just noticed that it's broken if you don't have a certain config file. I always had one while testing... until I got to work.
You'll get an error "ConfigParser.NoSectionError: No section: 'default'". I'll fix the code at home, but in the meantime you can do this:
$ mkdir ~/.mrhooker
$ echo [default] > ~/.mrhooker/mrhooker.config
P.S. if you add "build_dir = <path>" in that section, or a per-module section, mrhooker/Cython will store the shared library that it builds there. Then, if you don't change the source, it'll be reused without rebuilding.
Thursday, September 1, 2011
Google App Engine Agent
A couple of months ago I wrote about my foray into the world of Google App Engine. More recently, I'd gotten the itch again, and had some ideas of how to fix the problems I found when attempting to get Pushy to work in Google App Engine.
The root of most of the problems is that Google App Engine is stateless in nature. Server instances can be spun up or spun down without notice, and so we can't store complex state, which Pushy really requires. So a couple of weeks ago I set about investigating a server-initiated RPC mechanism that is asynchronous and (mostly) stateless.
How would it work? Well, earlier this year I read that ProtoRPC was released, which brought RPC services to Google App Engine. In our case, Google App Engine is the client, and is calling the agent - but we can at least reuse the API to minimise dependencies and hopefully simplify the mechanism. Okay, so we have a ProtoRPC service running on a remote machine, consumed by our Google App Engine application. How do they talk?
One thing I wanted to avoid was the need for polling, as that's both slow and expensive. Slow in that there will necessarily be delays between polls, and expensive in that unnecessary polls will burn CPU cycles in Google App Engine, which aren't free. Long-polling isn't possible, either, since HTTP requests are limited to 30 seconds of processing time. If you read my last post, you probably already know what I'm going to say: we'll use XMPP.
What's XMPP? That's the Extensible Messaging and Presence Protocol, which is the protocol underlying Jabber. It is also the primary protocol that Google Talk is built on. It's an XML-based, client-server protocol, so peers do not talk directly to each other. It's also asynchronous. So let's look at the picture so far...
- The client (agent) and server (GAE application) talk to each other via XMPP.
- The agent serves a ProtoRPC service, and the GAE application will consume it.
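For a flavour of the agent side, here's what a trivial ProtoRPC service looks like; the service and message names below are made up for illustration, not taken from the actual project:

from protorpc import messages, remote

class EchoRequest(messages.Message):
    text = messages.StringField(1)

class EchoResponse(messages.Message):
    text = messages.StringField(1)

class AgentService(remote.Service):
    # A single remote method the GAE application could invoke on the agent.
    @remote.method(EchoRequest, EchoResponse)
    def echo(self, request):
        return EchoResponse(text=request.text)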
Because our RPC mechanism will be server-initiated, we'll need something else: agent availability discovery. Google App Engine provides XMPP handlers for agent availability (and unavailability) notification. When an agent starts up it will register its presence with the application. When an agent is discovered, the application will request the agent's service descriptor. The agent will respond, and the application will store it away in Memcache.
We (ab)use Memcache for sharing data between instances of the application. When you make enough requests to the application, Google App Engine may dynamically spin up a new instance to handle requests. By storing the service descriptor in Memcache, it can be accessed by any instance. I said abuse because Memcache is not guaranteed to keep the data you put in it - it may be expelled when memory is constrained. Really we should use Datastore, but I was too lazy to deal with cleaning it up. "Left as an exercise for the reader." One thing I did make a point of using was the new Python Memcache CAS API, which allows for safe concurrent updates to Memcache.
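The CAS dance looks roughly like this; the key and values are illustrative:

from google.appengine.api import memcache

client = memcache.Client()
agent_jid = "agent@example.com"      # illustrative values
descriptor = {"methods": ["echo"]}

while True:
    descriptors = client.gets("agent-descriptors")
    if descriptors is None:
        # First writer creates the entry; add() is a no-op if we lose the race.
        client.add("agent-descriptors", {})
        continue
    descriptors[agent_jid] = descriptor
    if client.cas("agent-descriptors", descriptors):
        break  # no concurrent update beat us to it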
Orrite. So now we have an agent and application which talk to each other via XMPP, using ProtoRPC. The application discovers the agent, and, upon request, the agent describes its service to the application. How can we use it? Well the answer is really "however you like", but I have created a toy web UI for invoking the remote service methods.
Wot 'ave we 'ere then? The drop-down selection has all of the available agent JIDs (XMPP IDs). The textbox has some Python code, which will be executed by the Google App Engine application. Yes, security alert! This is just a demonstration of how we can use the RPC mechanism - not a best practice. When you hit "Go!", the code will be run by the application. But before doing so, the application will set a local variable "agent", which is an instance of the ProtoRPC service stub bound to the agent selected in the drop-down.
ProtoRPC is intended to be synchronous (from the looks of the comments in the code, anyway), but there is an asynchronous API for clients. Given that a request handler only has up to 30 seconds to service a request, though, our application can't actively wait for a response. What to do? Instead, we need to complete the request asynchronously when the client responds, and convey some context to the response handler so it knows what to do with it.
In the demo, I've done something fairly straightforward with regards to response handling. When the UI is rendered, we create an asynchronous channel using the Channel API. We use this to send the response back to the user. So when the code is executed, the service stub is invoked, and the channel ID is passed as context to the client. When the client responds, it includes the context. Once again, security alert. We could fix security concerns by encrypting the context to ensure the client doesn't tamper with it. Let's just assume the client is friendly though, okay? Just this once!
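In GAE terms that boils down to something like the following; the identifiers are illustrative:

from google.appengine.api import channel

client_id = "some-unique-page-view-id"     # illustrative
token = channel.create_channel(client_id)  # handed to the browser at render time

# Later, when the agent's XMPP response arrives, the channel ID comes back
# as the context, and we push the formatted response to the browser:
channel.send_message(client_id, "result: 42")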
So we finally have an application flow that goes something like this:
- Agent registers service.
- Server detects agent's availability, and requests client's service descriptor.
- Client sends service descriptor, server receives and stores it in Memcache.
and then...
- User hits web UI, which server renders with a new channel.
- User selects an agent and clicks "Go!".
- Server instantiates a service stub, and invokes it with the channel ID as context. The invocation sends an XMPP message to the agent.
- Agent receives XMPP message, decodes and executes the request. The response is sent back to the server as an XMPP message, including the context set by the server.
- The server receives the response, and extracts the response and channel ID (context). The response is formatted and sent to the channel.
- The web UI's channel Javascript callback is invoked and the response is rendered.
Fin
I've put my code up on GitHub, here: http://github.com/axw/gaea. Feel free to fork and/or have a play. I hope this can be of use to someone. If nothing else, I've learnt a few new tricks!
Thursday, August 4, 2011
It's Alive!
I've been quietly hacking away on cmonster (née csnake). I like this name even more: I think it describes my creation better. If you thought preprocessors were horrible before, well...
What is cmonster?
cmonster is a C preprocessor with a few novel features on top of the standard fare:
- Allows users to define function macros in Python, inline.
- Allows users to define a callback for locating #include files, when the file can not be found in the specified include directories.
- Provides a Python API for iterating over tokens in the output stream.
cmonster is built on top of Clang, a modern C language family compiler, which contains a reusable, programmable preprocessor. At present, cmonster requires the Clang 3.0 APIs; Clang 3.0 has not yet been released, so I have been working off Clang's subversion trunk.
I have just uploaded a binary distribution of the first alpha version (0.1) of cmonster to pypi. I have only built/tested it on Linux 32-bit, Python 3.2, and I don't expect it will work on anything else yet. If you want to play around with it, you can install cmonster using "easy_install cmonster" or by grabbing it off pypi and installing it manually.
Demonstration
Seeing is believing - how does this thing work? We'll ignore everything except for inline Python macros for now, because that's the most stable part of the API.
It is possible to define macros inline in cmonster, using the builtin "py_def" macro. For example:
py_def(factorial(n))
    import math
    return str(math.factorial(int(str(n))))
py_end
When cmonster sees this, it will grab everything up to "py_end", and define a Python function. It will also create a preprocessor macro with the function's name (as given in the py_def heading), and this macro will be directed to call the Python function. The Python function will be passed the argument tokens that the macro was invoked with. It can return either a sequence of tokens, or a string that will subsequently be tokenised.
Addressing Shortcomings
In my previous post about csnake I mentioned a few things that needed to be done. I have addressed some of these things in cmonster:
"A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn't worth much. It needs to act like or delegate to a real preprocessor, such as GCC."
I have added support for emulating GCC. This is done by consulting GCC for its predefined macros (using "gcc -dM"), and by using the new "include locator" callback feature. By setting an include locator callback on a cmonster preprocessor, you provide cmonster with a second-chance attempt at locating an include file when the file cannot be found in the specified include directories. This method can be used to determine GCC's system include directories: whenever we can't find an include file, we consult GCC and parse the output to determine the location of the file on disk. I intend to add support for more (common) compilers/preprocessors in the future. Namely, MSVC.
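That GCC consultation can be sketched like so; the locator below is illustrative, and the set_include_locator registration at the end is an assumed name, not cmonster's documented API:

import re
import subprocess

def locate_include(filename):
    # Preprocess a one-line file with GCC and read the header's on-disk
    # location back out of the resulting linemarkers.
    gcc = subprocess.Popen(["gcc", "-E", "-x", "c", "-"],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = gcc.communicate(("#include <%s>\n" % filename).encode())
    # Linemarkers look like: # 1 "/usr/include/stdio.h" 1
    for path in re.findall(r'^# \d+ "([^"<]+)"', out.decode(), re.M):
        if path.endswith(filename):
            return path
    return None

# Registration, sketched; the real cmonster hook may differ:
# preprocessor.set_include_locator(locate_include)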
I had another crazy idea for (ab)using include locators: handling specially formatted #include directives to feed off-disk files into the preprocessor. Buh? I mean, say, the ability to pull down headers from a hosted repository such as github (e.g. #include <github.yourproject/header.h>), and feeding them into the preprocessor as in-memory files. Or generating headers on the fly (e.g. #include <myapi.proto>, to automatically generate and include Protobuf headers).
"A #pragma to define Python macros in source, or perhaps if I'm feeling adventurous, something like #pydefine."
Support for inline Python macros has been implemented: see the "Demonstration" section above. It's unlikely I'll attempt to create a #pydefine, as it would be more work than it's worth.
"A simple command line interface with the look and feel of a standard C preprocessor."
The distribution now contains a "cmonster" script, which invokes the preprocessor on a file specified on the command line. This will need a lot of work: presently you can't add user include directories or (un)define macros. Neither of these things is difficult to add; they just haven't been top priorities.
Future Work
Once I've tested, documented and stabilised the API, I'll look at (in no definite order):
- Improve the command line interface: add the standard "-I", "-D" and "-U" parameters.
- Portability. Probably Windows first, since it's common and I have access to it.
- Emulation of additional preprocessors.
- Passing in-memory files ("file like" Python objects) to #include.
Sunday, July 17, 2011
Controlling Remote Agents from Google App Engine
A month or so ago I was brainstorming ideas related to Google App Engine (GAE), as I had been wanting a reason to play with it for a while. One idea that stuck was connecting a remote Python process to GAE via Pushy, so we could either control GAE or GAE could control the remote process. I'm still working on the C/Python Preprocessor thingy, but I took a break from that this weekend to look into the possibility of a GAE Pushy transport.
So yesterday morning I signed up for an account, and started tinkering. I had been reading the docs already, and I figured there were a few possible approaches:
- The obvious: use HTTP. This has one major drawback in that it is inherently synchronous and wholly driven by the client. Moreover, GAE only allows request handlers around 30s to complete, so no kind of fancy long-polling is possible here.
- Channel API. This sounds like the right kind of thing to use, but it's aimed at interacting with Javascript in a webpage.
- XMPP. Huh? Isn't that for instant messaging? Exactly. The client (remote Python process) and server (GAE) are peers in XMPP, and either one can initiate sending messages to the other. Let's look into this a bit more...
So I did a quick search for Python XMPP libraries, and a few came up. I settled on xmpppy, but to be honest I didn't find any of them particularly compelling. The APIs are a bit clunky. Anyway, the approach I took was to have an XMPP handler in my GAE application create a persistent Pushy connection object associated with a pair of read/write files that wrapped the XMPP API. When an XMPP message came in, the application would extract the Pushy request from the base-64 encoded body of the message, and return the result in a similar manner.
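The GAE side of that is sketched below; handle_pushy_request stands in for the Pushy plumbing, which is elided here:

import base64
from google.appengine.api import xmpp
from google.appengine.ext import webapp

def handle_pushy_request(request):
    # Stand-in for the real plumbing: feed the request to the persistent
    # Pushy connection and collect its response.
    return request

class PushyXMPPHandler(webapp.RequestHandler):
    def post(self):
        message = xmpp.Message(self.request.POST)
        # The Pushy request travels base-64 encoded in the message body.
        request = base64.b64decode(message.body)
        response = handle_pushy_request(request)
        message.reply(base64.b64encode(response))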
And it worked, but only just. I had to make a few hacks to Pushy to get all of this to work. There were some oddities I had to work around, such as the "eval" built-in in GAE's Python not taking any keyword arguments. Unfortunately, I don't think this particular transport is very useful in the flaky state it's in at the moment. Also, it's not terribly valuable to build a transport to control a GAE application, since other APIs exist for that purpose (ProtoRPC, remote_api). More useful would be to have the GAE application control the remote process without the need for polling. I'll be looking into this further.
Tuesday, June 21, 2011
Le sigh
I've been coming across more problems with Boost Wave. The current ones blocking me are:
- A lack of an efficient way to conditionally disable a macro. The "context policy" provides hooks for handling macro expansions, and its return value is meant to control whether the expansion takes place. It doesn't work. I'll write up a bug report when I get some time.
- Wave isn't very forgiving about integer under/overflow. For example, the GNU C library's header "/usr/include/bits/wchar.h" has the following tidbit to determine the sign of wide characters, which Boost Wave barfs on:
#elif L'\0' - 1 > 0
I think the latter "problem" might actually be reasonable - I believe the standards say that handling of overflow is undefined, and preprocessor/compiler-specific. That doesn't help me much though. I could fix this by writing code to parse the expressions, which seems silly, or by passing off to the target preprocessor (e.g. GCC), which seems like overkill.
I'm going to have a look at how hard it would be to use LLVM/Clang's preprocessor instead. If that's a better bet, I may go that way. Otherwise, it might be time to get approval to send patches to the Wave project.
Saturday, June 18, 2011
Inline Python macros in C
I had a bit of time today to do some more work on that which is soon to be called something other than csnake. I've added a couple of features:
- You can now define custom pragmas, providing a Python handler function. Unfortunately Boost Wave, which csnake uses for preprocessing, only provides callbacks for pragmas that start with "#pragma wave".
- Built-in pragmas and macros to support defining Python macros inline in C/C++ code.
- A __main__ program in the csnake package. So you can now do "python3.2 -m csnake <file>", which will print out the preprocessed source.
So for example, you can do something like the following, entirely in one C++ file:
#include <iostream>

// factorial macro.
py_def(factorial(n))
    import math
    f = math.factorial(int(str(n)))
    return [Token(T_INTLIT, f)]
py_end

int main()
{
    std::cout << factorial(3) << std::endl;
    return 0;
}
This works as follows: py_def and py_end are macros, which in turn use the _Pragma operator with built-in pragmas. Those pragmas are handled by csnake, and signal to collect the tokens in between. When the py_end macro is reached, the tokens are concatenated and a Python function macro is born.
I'm intending to do additional "Python blocks", including at least a py_for block, which will replicate the tokens within the block for each iteration of a loop.
There's one big problem with the py_def support at the moment, which is that the tokens go through the normal macro replacement procedure. I think I'll have a fix for that soon.
Wednesday, June 15, 2011
Name clash
Rats, looks like I didn't do enough homework on the name "csnake". Turns out there's another Python-based project called CSnake: https://github.com/csnake-org/CSnake/. Incidentally - and not that it matters much - I had the name picked out before that project was released. Now I need to think up another puntastic name.
Saturday, June 11, 2011
C-Preprocessor Macros in Python
TL;DR: I've started a new project, csnake, which allows you to write your C preprocessor macros in Python.
Long version ahead...
You want to do what now?
I had this silly idea a couple of years ago, to create a C preprocessor in which macros can be defined in Python. This was borne out of me getting sick of hacking build scripts to generate code from data, but pursued more for fun.
I started playing around with Boost Wave, which is a "Standards conformant, and highly configurable implementation of the mandated C99/C++ preprocessor functionality packed behind an easy to use iterator interface". With a little skulduggery, I managed to define macros as C++ callable objects taking and returning tokens. Then it was a simple matter of adding a Python API.
The Result
What we end up with is a Python API that looks something like this:
import sys
from _preprocessor import *

def factorial(n):
    import math
    return [Token(T_INTLIT, math.factorial(int(str(n))))]

p = Preprocessor("test.cpp")
p.define(factorial)
for t in p.preprocess():
    sys.stdout.write(str(t))
sys.stdout.write("\n")
Which will take...
int main(){return factorial(3);}
And give you...
int main(){return 6;}
If it's not immediately clear, it will translate "factorial(3)" into the integer literal 6 - the factorial of the input token. This isn't a very interesting example, so if you can imagine a useful application, let me know ;)
The above script will work with the current code, using Python 3.2, compiling with GCC. If you'd like to play with it, grab the code from the csnake github repository. Once you've got it, run "python3.2 setup.py build". Currently there is just an extension module ("csnake._preprocessor"), so set your PYTHONPATH to the build directory and play with that directly.
I have chosen to make csnake Python 3.2+ only, for a couple of major reasons:
- All the cool kids are doing it: it's the way of the future. But seriously, Python 3.x needs more projects to become more mainstream.
- Python 3.2 implements PEP 384, which allows extension modules to be used across Python versions. Finally. I always hated that I had to recompile for each version.
... and one very selfish (but minor) reason: I wanted to modernise my Python knowledge. I've been ignoring Python 3.x for far too long.
The Road Ahead
What I've done so far is very far from complete, and not immediately useful. It may never be very useful. But if it is to be, it would require at least:
- A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn't worth much. It needs to act like or delegate to a real preprocessor, such as GCC.
- A #pragma to define Python macros in source, or perhaps if I'm feeling adventurous, something like #pydefine.
- A simple, documented Python API.
- A simple command line interface with the look and feel of a standard C preprocessor.
- Some unit tests.
I hope to add these in the near future. I've had code working for the first two points, and the remaining points are relatively simple. I will post again when I have made some significant progress.
Monday, May 23, 2011
Pushy 0.5 Released
After just a few short months since the 0.4 release, I am pleased to announce Pushy 0.5. As usual, this release is mostly bug fixes, with a couple of features to make life easier. Instructions on downloading and installing can be found here:
http://awilkins.id.au/pushy/install.php
On top of standard bug fixes and features, I have made an effort to beautify some of the code, and speed everything up in general, based on code profiling. This will be evident if your Pushy program performs a lot of proxying: 0.5 performs up to ~2x faster than 0.4 in my tests.
New Features
There are two new features in Pushy 0.5: with-statement support, and a simple "execute" interface.
With-statement support
Pushy connections now support the "with" statement. This is functionally equivalent to wrapping a Pushy connection with "contextlib.closing": it will close the connection when the with-statement exits. For example:

with pushy.connect("local:") as con:
    ...
Compile/execute interface
Previously if you wanted to execute a statement in the remote interpreter, you would have to first obtain the remote "compile" built-in function, invoke it to compile a remote code object, and then evaluate that with the "connection.eval" method. For example, to execute "print 'Hello, World!'":

code_obj = con.eval("compile")("print 'Hello, World!'", "<filename>", "exec")
con.eval(code_obj, locals=..., globals=...)

This is a bit unfriendly, so I thought it would be a good idea to add a simpler interface. There are two new methods: "connection.compile" and "connection.execute". The former will compile a remote code object, and the latter executes either a string (statement) or a function. Continuing our very simple example, we get the much simpler:
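(The original snippet isn't preserved in this archive; going by the description above, it would be along these lines:)

con.execute("print 'Hello, World!'")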
Under the hood, this will do exactly what we would previously have had to do manually: remotely compile the statement and then evaluate the resultant code object. Now that suffices for a very simple use case like we have discussed above, but what if we want to execute a series of statements? Wouldn't it be nice if we could remotely execute a locally defined function? Well, now you can, using "connection.compile".
The "connection.compile" method will take a locally defined function and define it in the remote interpreter. It will then return a reference to the remote function, so that we can invoke it as if it were a local function. This allows the user to define complex and/or expensive logic that will be executed entirely remotely, only communicating back to the peer when proxy objects are involved, or to return the result.
Bug Fixes
The most noteworthy bugs fixed in 0.5 are:
#738216 Proxied old-style classes can't be instantiated
Old-style/classic classes (i.e. those that don't inherit from "object") previously could not be instantiated via a Pushy connection. This has been rectified by defining an additional proxy type for old-style classes.
#733026 auto-importer doesn't support modules not imported by their parent
Previously, importing a remote submodule via "connection.modules" would only work if the parent of the submodule imported it. For example, "connection.modules.os.path" would work since os imports os.path. If the parent does not import the submodule, Pushy would fail to import the module. This has been fixed in 0.5; remotely imported modules will provide the same sort of automagical importing interface as "connection.modules".
#734311 Proxies are strongly referenced and never deleted; __del__ isn't transmitted to delete proxied object
This is the most significant change in Pushy 0.5. Since inception, Pushy has never released proxy objects, meaning eventual memory bloating for objects that would otherwise by garbage collected. As of 0.5, Pushy will now garbage collect proxies by default. Every so often (by default, 5 seconds), Pushy will send a message to the peer to let it know which proxies have been garbage collected. This allows the peer to release the proxied objects, and reclaim resources.
#773811 Daemon/socket transport is slow
Due to the many small messages sent back and forth by Pushy, the "daemon" transport was found to be quite slow in stress testing. This is due to Nagle's Algorithm, which delays sending network packets so that they can be combined. This has a considerable impact on the latency of Pushy's operations, and so it is now disabled by Pushy.
Enjoy!
http://awilkins.id.au/pushy/install.php
On top of standard bug fixes and features, I have made an effort to beautify some of the code, and speed everything up in general, based on code profiling. This will be evident if your Pushy program performs a lot of proxying: 0.5 performs up to ~2x faster than 0.4 in my tests.
New Features
There are two new features in Pushy 0.5: with-statement support, and a simple "execute" interface.
With-statement support
Pushy connections now support the "with" statement. This is functionally equivalent to wrapping a Pushy connection with "contextlib.closing": it will close the connection when the with-statement exits. For example:
with pushy.connect("local:") as con:
    ...
Compile/execute interface
Previously if you wanted to execute a statement in the remote interpreter, you would have to first obtain the remote "compile" built-in function, invoke it to compile a remote code object, and then evaluate that with the "connection.eval" method. For example, to execute "print 'Hello, World!'":
code_obj = con.eval("compile")("print 'Hello, World!'", "<filename>", "exec")
con.eval(code_obj, locals=..., globals=...)
This is a bit unfriendly, so I thought it would be a good idea to add a simpler interface. There are two new methods: "connection.compile" and "connection.execute". The former will compile a remote code object, and the latter executes either a string (statement) or a function. Continuing our very simple example, we get the much simpler:
con.execute("print 'Hello, World!'")
Under the hood, this will do exactly what we would previously have had to do manually: remotely compile the statement, then evaluate the resultant code object. That suffices for a very simple use case like the one above, but what if we want to execute a series of statements? Wouldn't it be nice if we could remotely execute a locally defined function? Well, now you can, using "connection.compile".
def local_function(*args, **kwargs):
    return (args, kwargs)

remote_function = con.compile(local_function)
remote_function(...)
The "connection.compile" method will take a locally defined function and define it in the remote interpreter. It will then return a reference to the remote function, so that we can invoke it as if it were a local function. This allows the user to define complex and/or expensive logic that will be executed entirely remotely, only communicating back to the peer when proxy objects are involved, or to return the result.
Bug Fixes
The most noteworthy bugs fixed in 0.5 are:
#738216 Proxied old-style classes can't be instantiated
Old-style/classic classes (i.e. those that don't inherit from "object") previously could not be instantiated via a Pushy connection. This has been rectified by defining an additional proxy type for old-style classes.
#733026 auto-importer doesn't support modules not imported by their parent
Previously, importing a remote submodule via "connection.modules" would only work if the parent of the submodule imported it. For example, "connection.modules.os.path" would work since os imports os.path. If the parent does not import the submodule, Pushy would fail to import the module. This has been fixed in 0.5; remotely imported modules will provide the same sort of automagical importing interface as "connection.modules".
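For example (my illustration, not from the release notes): the xml package doesn't import its etree submodule, so the second line below would have failed prior to 0.5:

import pushy

con = pushy.connect("local:")
path_mod = con.modules.os.path             # always worked: os imports os.path
etree = con.modules.xml.etree.ElementTree  # works as of 0.5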
#734311 Proxies are strongly referenced and never deleted; __del__ isn't transmitted to delete proxied object
This is the most significant change in Pushy 0.5. Since inception, Pushy has never released proxy objects, meaning eventual memory bloat for objects that would otherwise be garbage collected. As of 0.5, Pushy will now garbage collect proxies by default. Every so often (by default, every 5 seconds), Pushy will send a message to the peer to let it know which proxies have been garbage collected. This allows the peer to release the proxied objects and reclaim resources.
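For the curious, the general shape of such a scheme looks something like the sketch below. To be clear, this is an illustration of the technique, not Pushy's actual code: weakref callbacks record the IDs of locally collected proxies, and a daemon timer ships each batch to the peer.

import threading
import weakref

class ProxyReaper:
    def __init__(self, send_release_message, interval=5.0):
        self.send_release_message = send_release_message  # takes a list of proxy IDs
        self.interval = interval
        self.refs = set()   # keep weakrefs alive so their callbacks fire
        self.dead = []      # IDs of proxies collected since the last flush
        self.lock = threading.Lock()
        self._flush()

    def track(self, proxy_id, proxy):
        # When 'proxy' is garbage collected, remember its ID for the peer.
        def on_collect(ref, pid=proxy_id):
            with self.lock:
                self.refs.discard(ref)
                self.dead.append(pid)
        self.refs.add(weakref.ref(proxy, on_collect))

    def _flush(self):
        # Every 'interval' seconds, tell the peer which proxies died here.
        with self.lock:
            batch, self.dead = self.dead, []
        if batch:
            self.send_release_message(batch)
        timer = threading.Timer(self.interval, self._flush)
        timer.daemon = True
        timer.start()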
#773811 Daemon/socket transport is slow
Due to the many small messages sent back and forth by Pushy, the "daemon" transport was found to be quite slow in stress testing. This is due to Nagle's Algorithm, which delays sending network packets so that they can be combined. This has a considerable impact on the latency of Pushy's operations, and so it is now disabled by Pushy.
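The fix itself is just a standard socket option; something along these lines (the endpoint here is made up):

import socket

sock = socket.create_connection(("daemon-host", 10101))  # illustrative endpoint
# TCP_NODELAY disables Nagle's algorithm, so small messages are sent
# immediately instead of being held back for coalescing.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)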
Enjoy!
Wednesday, March 9, 2011
Fabric and Pushy, together at last
A workmate of mine recently brought Fabric to my attention. If you're not familiar with Fabric, it is a command-line utility for doing sys-admin type things over SSH. There's an obvious overlap with Pushy there, so I immediately thought, "Could Pushy be used to back Fabric? What are the benefits, what are the downsides?"
A couple of fairly significant benefits, and one significant downside:
- Pushy does more than just SSH, so Fabric could conceivably be made to support additional transports by using Pushy (which uses Paramiko), rather than using Paramiko directly.
- Pushy provides access to the entire Python standard library, which is largely platform independent. So you could do things like determine the operating system name without using "uname", which is a *NIX thing. That's a trivialisation, but you get the idea, I'm sure (see the sketch after this list).
- Pushy absolutely requires Python on the remote system (as well as SSH, of course, but Fabric requires that anyway.) So requiring Pushy would mean that Fabric would be restricted to working only with remote machines that have Python installed. Probably a safe bet in general, but not ideal.
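To make that second benefit concrete, here's roughly what querying a remote machine's platform looks like through Pushy (the SSH address here is illustrative):

import pushy

# Connect over SSH and use the remote interpreter's standard library
# directly: no uname, no parsing of command output.
con = pushy.connect("ssh:remotehost")  # address is illustrative
try:
    print(con.modules.platform.system())  # e.g. "Linux" or "Windows"
finally:
    con.close()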
How about using Pushy if Python is available, and just failing gracefully if it isn't? This turns out to be really easy to do, since Fabric and Pushy both use Paramiko. So I wrote a Fabric "operation" to import a remote Python module. Under the covers, this piggy-backs a Pushy connection over the existing Paramiko connection created by Fabric. I'll bring this to the attention of the Fabric developers, but I thought I'd just paste it here for now.
First, an example of how one might use the "remote_import" operation. Pass it a module name, and you'll get back a reference to the remote module, which you can then use just as if you had done a plain old "import".
fabfile.py
from fabric_pushy import remote_import

def get_platform():
    platform = remote_import("platform")
    print platform.platform()
You just execute your fabfile as per usual, and the "remote_import" operation will create a Pushy connection to each host, import the remote Python interpreter's standard platform module, and call its platform method to determine its platform name. Easy like Sunday morning...
$ fab -H localhost get_platform
[localhost] Executing task 'get_platform'
Linux-2.6.35-27-generic-i686-with-Ubuntu-10.10-maverick

Done.
Disconnecting from localhost... done.
fabric_pushy.py
from fabric.state import env, default_channel
from fabric.network import needs_host

import pushy
import pushy.transport
from pushy.transport.ssh import WrappedChannelFile

class FabricPopen(pushy.transport.BaseTransport):
    """
    Pushy transport for Fabric, piggy-backing the Paramiko SSH connection
    managed by Fabric.
    """

    def __init__(self, command, address):
        pushy.transport.BaseTransport.__init__(self, address)

        # Join arguments into a string, quoting arguments containing spaces.
        args = command
        for i in range(len(args)):
            if " " in args[i]:
                args[i] = "'%s'" % args[i]
        command = " ".join(args)

        self.__channel = default_channel()
        self.__channel.exec_command(command)
        self.stdin = WrappedChannelFile(self.__channel.makefile("wb"), 1)
        self.stdout = WrappedChannelFile(self.__channel.makefile("rb"), 0)
        self.stderr = self.__channel.makefile_stderr("rb")

    def __del__(self):
        self.close()

    def close(self):
        if hasattr(self, "stdin"):
            self.stdin.close()
            self.stdout.close()
            self.stderr.close()
            self.__channel.close()

# Add a "fabric" transport, which piggy-backs the existing SSH connection, but
# otherwise operates the same way as the built-in Paramiko transport.
class pseudo_module:
    Popen = FabricPopen
pushy.transports["fabric"] = pseudo_module

###############################################################################

# Pushy connection cache
connections = {}

@needs_host
def remote_import(name, python="python"):
    """
    A Fabric operation for importing and returning a reference to a remote
    Python package.
    """
    if (env.host_string, python) in connections:
        conn = connections[(env.host_string, python)]
    else:
        conn = pushy.connect("fabric:", python=python)
        connections[(env.host_string, python)] = conn
    # Import the module remotely, then walk down to the requested submodule.
    m = getattr(conn.modules, name)
    if "." in name:
        for p in name.split(".")[1:]:
            m = getattr(m, p)
    return m
Saturday, March 5, 2011
Pushy 0.4 Released
I'm pleased to announce a new release of Pushy, version 0.4. This release has been concerned primarily with improving the existing code quality, fixing bugs that have been found in the use of 0.3, and adding missing functionality to the Java API.
You can install Pushy with easy_install ("easy_install pushy"), or grab the source distribution from here: https://launchpad.net/pushy/+download. I have also been working on some more documentation and getting-started type content on the brand spanking new website: http://awilkins.id.au/pushy.
Do you use Pushy? What for? Do you have any ideas for improvements? Leave a comment here, or drop me an email at axwalk@gmail.com. I'd really appreciate some feedback!
Thursday, February 10, 2011
Java Pushy API
It's been some time since I've spruiked Pushy, so here we go. One of my colleagues was talking the other day about how he had implemented a netcat-like service for STAF, an automation framework aimed at testing. This got me thinking about how this could be done relatively easily, using the pushy.net package, something I haven't written about much, primarily because it's still a work in progress.
Back in version 0.3, I introduced a Java API to Pushy, as I mentioned in an earlier post. I briefly mentioned the incorporation of packages which mimic, and in many cases are interface compatible with, packages in the Java standard library such as java.io and java.net.
The pushy.net package currently contains three main classes:
- RemoteInetSocketAddress (extends java.net.InetSocketAddress)
- RemoteSocket (extends java.net.Socket), and
- RemoteServerSocket (extends java.net.ServerSocket).
RemoteInetSocketAddress simply provides a means of creating an InetSocketAddress whose address is resolved at the remote host. RemoteSocket is a wrapper around a remote Python socket object, but extends java.net.Socket to provide a familiar interface to Java developers. Similarly, RemoteServerSocket extends java.net.ServerSocket, and wraps a remote Python socket object.
So how about netcat emulation? Well, I won't cover the whole implementation of a netcat clone, as that would be a non-trivial undertaking. But I will show you one of the fundamental requirements: to bind a server socket, accept a client connection, and print out the data received from that client.
Step 1. Bind a socket, and listen for connections.
import java.net.ServerSocket;
import java.net.Socket;

import pushy.Client;

public class Test {
    public static void main(String[] args) throws Exception {
        Client conn = new Client("local:");
        try {
            ServerSocket server = new pushy.net.RemoteServerSocket(conn, 0);
            try {
                // ... (steps 2 and 3 go here)
            } finally {
                server.close();
            }
        } finally {
            conn.close();
        }
    }
}
In this code snippet, we're creating a RemoteServerSocket and assigning it to a java.net.ServerSocket, to illustrate interface compatibility. The first argument to the constructor is the pushy.Client object we previously created, and the second argument is the port to bind to. Specifying a port of zero means that we want to bind to an ephemeral port.
The creation of the RemoteServerSocket involves creating a Python socket object on the remote host, and performing the bind and listen methods on it.
Step 2. Accept a client connection.
Socket client = server.accept();
try {
    // ... (step 3 goes here)
} finally {
    client.close();
}
Here we can see that accepting a client connection is exactly as we would do with a standard java.net.ServerSocket. This probably isn't surprising, since we've upcasted our RemoteServerSocket to a plain old ServerSocket. One thing of interest here is that the Socket object returned is in fact a RemoteSocket, wrapping a remote Python socket object.
Step 3. Read data from the client connection.
InputStream in = client.getInputStream();
byte[] buf = new byte[1024];
int nread = in.read(buf, 0, buf.length);
while (nread != -1) {
    System.out.write(buf, 0, nread);
    System.out.flush();
    nread = in.read(buf, 0, buf.length);
}
Et voila! We can read the remote socket's output via a java.io.InputStream object, returned by the getInputStream method, which is overridden by RemoteSocket. One thing you may have noticed: to run this all on the local host, sans Pushy, you could substitute the right-hand side of the initial ServerSocket construction with a standard ServerSocket, and the rest of the code would remain unchanged.
There are a few defects in the 0.3 release related to the pushy.net package, which will prevent these examples from working. I have rectified them in the process of writing this post; if you grab the trunk, it should all work nicely. There is one defect remaining: the InputStream returned by RemoteSocket is based on a file returned by Python's socket.makefile method. This differs from the InputStream returned by Java's standard Socket, in that a request for N bytes will not return until all N bytes, or EOF, are received. I hope to have this fixed in the coming days.
Sunday, February 6, 2011
Facebook Puzzles
So a couple of weeks ago I came across a forum post on XKCD about programming puzzles. Figuring that's a significantly better way to while away the hours I'd previously been spending playing computer games, I thought I'd take a crack.
So after browsing around, I came across Facebook Puzzles, which seems to be part of Facebook's recruitment tools. Solve some puzzles, apply for a job, and say how rad you are for having solved all of their puzzles. Something like that. Anyway, I had no intention of applying to Facebook; I just wanted to revive my fading computer science knowledge, and have a bit of fun in the process. I don't see what the big fuss is about working there; so many other companies are producing much more interesting, constructive things.
Naturally I went straight to the most difficult puzzle on the list, Facebull. It didn't take long for me to realise this was an NP-hard problem. It's fairly straightforward to see that this is a graph problem. A little more formally: we're given a directed graph (digraph), and we're asked to find a minimum-cost strongly connected sub-digraph containing all vertices, but only a subset of the edges.
Spoilers ahead. Turn back now if you don't want any help.
Originally I had thought that it was the Asymmetric Traveling Salesman Problem (ATSP) in disguise, which is just the Traveling Salesman Problem (TSP) with directed edges. I spent a couple of hours here and there reading the literature, since I didn't know anything about exact solutions (which is what the puzzle requires), and was only vaguely aware of approximation algorithms (which might help, for example, to reduce the search space.) For anyone wanting to read up on the TSP problem, I would definitely recommend the survey paper The Traveling Salesman Problem, by Jünger, Reinelt and Rinaldi. Also, this website provides a great, gentle introduction.
Not too long after, it dawned on me that the problem was not ATSP at all, since vertices ("compounds") might be visited more than once. For example, consider a graph with the following edges:
(C1, C2), (C2, C1)
(C1, C3), (C3, C1)
This could not be solved by the ATSP, since we would have to visit C1 twice to complete a cycle from C1 to C2 to C3 and back to C1. However, it is required to be solvable for the Facebull puzzle. Back to the drawing board...
The general approach I took was similar to the one that would be taken for ATSP, though, and that is to use a "Branch and Bound" algorithm. This is a fancy way of saying: enumerate all possible solutions, backtracking at the earliest possible opportunity by computing lower and upper bounds. For example, we might start with an upper bound of infinity, and update it whenever we find a candidate solution with a lower cost than the upper bound. A lower bound can be computed by taking into account a constraint: each vertex must have an in-degree of at least one, and an out-degree of at least one. By remembering the cheapest edge into and out of each vertex, we can easily compute a lower bound.
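To illustrate that bound (this is a sketch of the idea, not my contest code): every edge enters exactly one vertex and leaves exactly one vertex, so summing the cheapest in-edge per vertex (and likewise the cheapest out-edge) yields two valid lower bounds; take the larger.

def lower_bound(vertices, edges):
    # edges: dict mapping (src, dst) -> cost
    cheapest_in = dict.fromkeys(vertices, float("inf"))
    cheapest_out = dict.fromkeys(vertices, float("inf"))
    for (src, dst), cost in edges.items():
        if cost < cheapest_out[src]:
            cheapest_out[src] = cost
        if cost < cheapest_in[dst]:
            cheapest_in[dst] = cost
    # Each sum is a valid lower bound on its own; take the tighter one.
    return max(sum(cheapest_in.values()), sum(cheapest_out.values()))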
Is this enough? Perhaps, but we can do better. One more thing we can do is to prune edges. For example, if we have the edges (Ci, Cj), (Ci, Ck), and (Ck, Cj), and the sum of the cost of the latter two is less than the cost of the first, then we can eliminate the first.
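Sketched out (again, an illustration rather than my submission): drop (Ci, Cj) whenever some two-hop detour through Ck is strictly cheaper, since any solution using the direct edge could swap in the detour.

def prune_dominated_edges(edges):
    # edges: dict mapping (src, dst) -> cost; returns a pruned copy.
    pruned = dict(edges)
    for (i, j), direct_cost in edges.items():
        for (i2, k), c1 in edges.items():
            if i2 != i or k == j:
                continue
            c2 = edges.get((k, j))
            if c2 is not None and c1 + c2 < direct_cost:
                # The detour i -> k -> j is strictly cheaper; drop (i, j).
                pruned.pop((i, j), None)
                break
    return pruned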
In total, I submitted my solution nine times. Why so many, you might ask? Because I didn't parse the input correctly, and didn't realise this while I was banging my head against the problem for two weeks. I wasn't handling white space at the end of the file, and I (foolishly) assumed that the test cases would not assign multiple edges to the same pair of vertices. Why would you have such a trivial test case to trip people up, when the underlying puzzle is hard enough as it is? It turns out my accepted solution (written in C++) ran in 300ms, so I guess the focus was more or less centred on correctness, rather than speed. That was a little frustrating, but at least I had some fun and learnt a few tricks along the way.
There is still something wrong with my solution, however. It is not finding the optimal solution to some of the test cases people have published. But it passed the puzzle robot, so that's the main thing that matters. I had one more optimisation, which was to identify edges which are absolutely required, up-front. This can be done very cheaply, and may significantly reduce the search space in sparse graphs. This optimisation was interacting badly with my enumeration function, though; I'm not sure what the problem is, perhaps I'll go back and take another look some time.
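The up-front detection is just a degree check; the idea, sketched:

def required_edges(vertices, edges):
    # If a vertex has exactly one incoming (or outgoing) edge, every
    # strongly connected subgraph must include it; fix it up-front.
    required = set()
    for v in vertices:
        ins = [e for e in edges if e[1] == v]
        outs = [e for e in edges if e[0] == v]
        if len(ins) == 1:
            required.add(ins[0])
        if len(outs) == 1:
            required.add(outs[0])
    return required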
Now, back to wasting my time playing computer games...