The Sound of Software: 2010

Saturday, August 7, 2010

Pushy Public Documentation

So, I finally got around to writing some proper documentation for Pushy.

I had originally used Epydoc to extract docstrings and generate API documentation, which I have been hosting. Then I realised I could publish HTML to PyPI, so I thought I'd do something a little more friendly than presenting the gory details of the API.

In the past I've used Asciidoc, a lightweight markup language in the vein of Wiki markup languages. I found Asciidoc fairly simple to write, and there is a standard tool for processing it and producing various output formats, including of course HTML. I wanted my documentation to have the look and feel of the Python standard library documentation, so I've been looking into reStructuredText.

I have to say that reStructuredText is very easy to learn, and Sphinx, which is the processing tool used to generate the HTML output for the Python documentation, is a pleasure to use. The format of reStructuredText is similar to that of Asciidoc. So far I don't have a strong preference for either - I mainly went with reStructuredText/Sphinx for the Python documentation theming.

Saturday, July 10, 2010

Java-to-Python

Did you ever want to call Python code from Java?

Until Java 5, I wasn't much of a fan of the Java language. The addition of enums, generics, and some syntactic sugar (e.g. foreach) makes it more bearable to develop in. But all of this doesn't change the fact that Java purposely hides non-portable operating system functions, such as signal handling and Windows registry access. Sure, you can do these things with JNI, but that makes application deployment significantly more complex.

Python, on the other hand, embraces the differences between operating systems. Certainly where it makes sense, platform-independent interfaces are provided - just like Java, and every other modern language. But just because one platform doesn't deal in signals shouldn't mean that we can't use signals on Linux/UNIX.

I realise it's probably not the most common thing to do, but calling Python code from Java needn't be difficult. There's Jython, which makes available a large portion of the Python standard library, but still has some way to go. Some modules are missing (e.g. _winreg, ctypes), and others are incomplete or buggy. That's not to say that the project isn't useful, it's just not 100% there yet.

Enter Pushy. Version 0.3 added a Java API, making it possible to connect to a Python interpreter from a Java program, and access objects and invoke functions therein. So, for example, you now have the ctypes module available to your Java program, meaning you can load shared objects / DLLs and call arbitrary functions from them. See below for a code sample.

import pushy.Client;
import pushy.Module;
import pushy.PushyObject;

public class TestCtypes {
    public static void main(String[] args) throws Exception {
        Client client = new Client("local:");
        try {
            // Import the "ctypes" Python module, and get a reference to the
            // GetCurrentProcessId function in kernel32.
            Module ctypes = client.getModule("ctypes");
            PushyObject windll = (PushyObject)ctypes.__getattr__("windll");
            PushyObject kernel32 = (PushyObject)windll.__getattr__("kernel32");
            PushyObject GetCurrentProcessId =
                (PushyObject)kernel32.__getattr__("GetCurrentProcessId");

            // Call GetCurrentProcessId. Note that the Python interpreter is
            // running in a separate process, so this is NOT the process ID
            // running the Java program.
            Number pid = (Number)GetCurrentProcessId.__call__();
            System.out.println(pid);
        } finally {
            client.close();
        }
    }
}

Neat, eh? What I have demonstrated here is the following:

Connecting a Java program to a freshly created Python interpreter on the local host.
Importing the ctypes module therein, and then getting a reference to kernel32.dll (this assumes Windows, obviously. It would be much the same on Linux/UNIX platforms.)
Executing the GetCurrentProcessId function to obtain the process ID of the Python interpreter, and returning the result as a java.lang.Number object.

The final point is an important one. Pushy automatically converts Python types to their Java equivalents. For example, a Python dictionary object will be returned as a java.util.Map. If that map is modified, then the changes will be made in the remote dictionary object also. Tuples, which are immutable, are returned as arrays, whilst Python lists are returned as java.util.List objects.
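
The write-through behaviour of the returned map can be illustrated with a small Python sketch. This is not the Pushy Java API: DictView, request and remote_dict are hypothetical names, and the "remote" dictionary simply lives in the same process. It only shows the idea that a mutable container is handed back as a view whose operations are forwarded to the owner, rather than as a copy.

```python
from collections.abc import MutableMapping


class DictView(MutableMapping):
    """Write-through view: every operation is forwarded to the dict's owner."""

    def __init__(self, request):
        # `request` is a callable standing in for a network round trip.
        self._request = request

    def __getitem__(self, key):
        return self._request("getitem", key)

    def __setitem__(self, key, value):
        self._request("setitem", key, value)

    def __delitem__(self, key):
        self._request("delitem", key)

    def __iter__(self):
        return iter(self._request("keys"))

    def __len__(self):
        return self._request("len")


# Pretend this dictionary lives in the other interpreter.
remote_dict = {"a": 1}


def request(op, *args):
    """Hypothetical request handler running on the 'remote' side."""
    handlers = {
        "getitem": lambda key: remote_dict[key],
        "setitem": lambda key, value: remote_dict.__setitem__(key, value),
        "delitem": lambda key: remote_dict.__delitem__(key),
        "keys": lambda: list(remote_dict),
        "len": lambda: len(remote_dict),
    }
    return handlers[op](*args)


view = DictView(request)
view["b"] = 2           # the change lands in remote_dict, not in a copy
print(remote_dict)
```

An immutable value such as a tuple has no need for this treatment, which is presumably why it can be returned by value as a plain array.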

The Pushy Java API aims to provide Java standard library equivalent interfaces to Python standard library modules where possible. For example, there is a pushy.io package for dealing with files in the Python interpreter using java.io-like classes. Similarly, a pushy.net package exists for performing socket operations using a java.net-like interface.

One can also connect to SSH hosts, as is possible through the Pushy Python API. This is achieved by creating a connection to a local Python interpreter, and thence to the remote system using the Pushy Python API in the subprocess. This is all done transparently if a target other than "local:" is specified in the Java program.

Enjoy!

Wednesday, June 30, 2010

First Post

Woohoo!

On a more serious note... if anyone stumbles across this page, I will be using it for discussing my project Pushy. Pushy is a Python (now Java too) package for connecting to a remote Python interpreter, and accessing objects therein as if they were local. In other words, it's a sort of RPC package.

So why another RPC package? While I was working on test automation, I identified a couple of things I didn't like about existing RPC frameworks:

1. Invariably, a nailed-up (i.e. runs for an extended period of time) server is required to be running for you to connect to. This leads to problem #2.
2. Custom software needs to be maintained on both the client and the server.
3. The security mechanisms in existing frameworks have a tendency to suck. For nailed-up servers running as an arbitrary user, the server program must perform its own authentication/authorisation to ensure the user can't access resources they aren't supposed to.

My thinking was this: rather than implementing RPC services and maintaining them on all of these different servers, why can't I put all of the logic in the client? The Python Standard Library is rather extensive, why don't we just expose that to the client, and let the client define the "service"? That's the basis of Pushy.

So first I started hacking away to develop a proof of concept, using XML-RPC to transparently access objects in a remote interpreter, automatically creating proxy objects to represent remote objects and performing method/operator calls by sending requests. Then I found RPyC, which did essentially the same thing, only better.
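
To make the proxy idea concrete, here is a minimal sketch. It is neither Pushy's nor RPyC's implementation: Proxy and Server are hypothetical names, and the "remote" interpreter is just an object in the same process, so the request is a plain function call where a real system would send a message over a connection.

```python
import math


class Proxy:
    """Stand-in for a remote object; each operation becomes a request."""

    def __init__(self, send, object_id):
        self._send = send
        self._id = object_id

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        return self._send("getattr", self._id, name)

    def __call__(self, *args):
        return self._send("call", self._id, *args)

    def __len__(self):
        # Operators are looked up on the type, so they need explicit methods.
        return self._send("len", self._id)


class Server:
    """Hypothetical in-process stand-in for the remote interpreter."""

    def __init__(self):
        self._objects = {}

    def export(self, obj):
        # Simple values travel by value; everything else by reference.
        if isinstance(obj, (int, float, str, bool, type(None))):
            return obj
        self._objects[id(obj)] = obj
        return Proxy(self.handle, id(obj))

    def handle(self, op, object_id, *args):
        target = self._objects[object_id]
        if op == "getattr":
            result = getattr(target, args[0])
        elif op == "call":
            result = target(*args)  # this sketch supports simple arguments only
        elif op == "len":
            result = len(target)
        else:
            raise ValueError("unknown operation: %s" % op)
        return self.export(result)


server = Server()
remote_math = server.export(math)
print(remote_math.sqrt(16))            # attribute lookup, then a call
print(len(server.export([1, 2, 3])))   # operator forwarded as a request
```

Note that remote_math.sqrt is itself a Proxy; only the final float comes back by value.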

So at this stage I've moved the "service" from server to client, but I still need to run some code on the server, and it's still "nailed-up". What's more, the server is running as a single user, which poses a massive security risk. How can we do better? Enter SSH... SSH is prevalent on Linux and UNIX operating systems, and one of the cool things you can do with it is remotely execute a command and pipe to/from its standard I/O. Maybe we could do something with that?

Using SSH solves all three problems, in fact. Pushy works as follows: the client application imports the Pushy package, and invokes a function to "connect" to a remote host. This creates an SSH connection to the remote host, using the username and password (or public key) specified by the caller. After the connection is created, Pushy executes Python on the remote system, passing it a command-line program, i.e. something like "python -c 'run_server()'". This command-line program is a short one, which reads a larger program off its standard input stream, and executes it to start the Pushy server program. The program didn't exist on disk before the connection, and won't exist after.
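
The bootstrap step can be sketched as follows. This is an illustration under assumptions, not Pushy's actual code: it starts a local child interpreter with subprocess instead of going over SSH, and BOOTSTRAP, SERVER_SOURCE and run_bootstrapped are hypothetical names. The length prefix is also an assumption; it is one simple way to let the child know where the program ends, so that the rest of standard input remains free for requests.

```python
import os
import subprocess
import sys

# The short command-line program: read a length, then that many characters
# of source from standard input, and execute them.
BOOTSTRAP = "import sys; n = int(sys.stdin.readline()); exec(sys.stdin.read(n))"

# The "larger program", standing in for the server. It never touches disk.
SERVER_SOURCE = """\
import os, sys
request = sys.stdin.readline().strip()
print("pid=%d request=%s" % (os.getpid(), request))
"""


def run_bootstrapped(source, request):
    """Start a child interpreter, feed it `source`, then send one request."""
    payload = "%d\n%s%s\n" % (len(source), source, request)
    result = subprocess.run(
        [sys.executable, "-c", BOOTSTRAP],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("parent pid=%d" % os.getpid())
print(run_bootstrapped(SERVER_SOURCE, "ping"))
```

Replacing the local child with a command run through an SSH client gives the arrangement described above: the only thing the remote host needs is a Python interpreter and sshd.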

So let's revisit the three problems now:

(Problem: a nailed-up server is required.) Unless you count sshd, no nailed-up services are required on the server. I think it's fair to discount sshd, as it is so commonplace.
(Problem: custom software must be maintained on both client and server.) As described in the paragraph above, there is no longer any custom server code required to be maintained on the server. So we can implement programs to access arbitrary portions of the Python Standard Library on a remote system, with nary a change on the server. There is an added benefit here: if the client/server protocol changes, there's still nothing to upgrade on the server.
(Problem: server code must perform application-level authentication/authorisation.) Did I mention I'm lazy? Doing authentication/authorisation properly is a pain in the neck, and I'd like to avoid it if I can. Turns out I can if I'm using SSH, as the "server" program is running as the user specified by the client program. The usual operating system authentication and access controls then apply.

Over time I decided to drop RPyC in favour of writing my own protocol. I was using RPyC for things it wasn't intended for, and it showed in various areas, such as exception handling. One thing remains though: the auto-importing feature of RPyC is lovingly imitated by Pushy.
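
For anyone unfamiliar with auto-importing, the idea is roughly the following. This is a local sketch with a hypothetical AutoImporter class; in an RPC setting the import would happen in the remote interpreter and the result would be a proxy to the remote module rather than the module itself.

```python
import importlib


class AutoImporter:
    """Namespace that imports a module the first time it is accessed."""

    def __getattr__(self, name):
        module = importlib.import_module(name)
        # Cache on the instance so __getattr__ is not consulted again.
        setattr(self, name, module)
        return module


modules = AutoImporter()
print(modules.math.sqrt(9.0))
print(modules.os.path.basename("/tmp/example.txt"))
```

The attraction is that the client never has to declare up front which modules it needs; any top-level module name can be used directly as an attribute.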