Thursday, February 10, 2011

Java Pushy API

It's been some time since I've spruiked Pushy, so here we go. One of my colleagues was talking the other day about how he had implemented a netcat-like service for STAF, an automation framework aimed at testing. This got me thinking about how this could be done relatively easily, using the pushy.net package, something I haven't written about much, primarily because it's still a work in progress.

Back in version 0.3, I introduced a Java API to Pushy, as I mentioned in an earlier post. I briefly mentioned the incorporation of packages which mimic, and in many cases are interface compatible with, packages in the Java standard library such as java.io and java.net.

The pushy.net package currently contains three main classes:
  • RemoteInetSocketAddress (extends java.net.InetSocketAddress)
  • RemoteSocket (extends java.net.Socket), and
  • RemoteServerSocket (extends java.net.ServerSocket).
RemoteInetSocketAddress simply provides a means of creating an InetSocketAddress whose address is resolved at the remote host. RemoteSocket is a wrapper around a remote Python socket object, but extends java.net.Socket to provide a familiar interface to Java developers. Similarly, RemoteServerSocket extends java.net.ServerSocket, and wraps a remote Python socket object.

So how about netcat emulation? Well, I won't cover the whole implementation of a netcat clone, as that would be a non-trivial undertaking. But I will show you one of the fundamental requirements: to bind a server socket, accept a client connection, and print out the data received from that client.

Step 1. Bind a socket, and listen for connections.

import java.net.ServerSocket;
import java.net.Socket;
import pushy.Client;

public class Test {
    public static void main(String[] args) throws Exception {
        Client conn = new Client("local:");
        try {
            ServerSocket server = new pushy.net.RemoteServerSocket(conn, 0);
            try {
            } finally {
                server.close();
            }
        } finally {
            conn.close();
        }
    }
}

In this code snippet, we're creating a RemoteServerSocket, and assigning it to a java.net.ServerSocket, to illustrate interface compatibility. The first argument to the constructor is the pushy.Client object we previously created, and the second argument is the port to bind to. Specifying a port of zero, means that we want to bind to an ephemeral port.

The creation of the RemoteServerSocket involves creating a Python socket object on the remote host, and performing the bind and listen methods on it.

Step 2. Accept a client connection.

Socket client = server.accept();
try {
} finally {
    client.close();
}

Here we can see that accepting a client connection is exactly as we would do with a standard java.net.ServerSocket. This probably isn't surprising, since we've upcasted our RemoteServerSocket to a plain old ServerSocket. One thing of interest here is that the Server object returned is in fact a RemoteServerSocket, wrapping a remote Python socket object.

Step 3. Read data from the client connection.

InputStream in = client.getInputStream();
byte[] buf = new byte[1024];
int nread = in.read(buf, 0, buf.length);
while (nread != -1) {
    System.out.write(buf, 0, nread);
    System.out.flush();
    nread = in.read(buf, 0, buf.length);
}

Et voila! We can read the remote socket's output via a java.io.InputStream object, returned by the getInputStream method, which is overridden by RemoteSocket. One thing you may have noticed: to run this all on the local host, sans Pushy, you could substitute the right-hand side of the initial ServerSocket construction with a standard ServerSocket, and the rest of the code would remain unchanged.

There are a few defects in the 0.3 release related to the pushy.net package, which will prevent these examples from working. I have rectified them in the process of writing this post. If you grab the trunk, it should all work nicely. There is one defect remaining: the InputStream returned by RemoteSocket is based on a a file returned by Python's socket.makefile method. This differs from the InputStream returned by Java's standard Socket, in that a request for N bytes will not return until all N bytes, or EOF, are received. I hope to have this fixed in the coming days.

Sunday, February 6, 2011

Facebook Puzzles

So a couple of weeks ago I came across a forum post on XKCD about programming puzzles.Figuring that's a significantly better way to while away the hours I'd previously been spending playing computer games, I thought I'd take a crack.

So after browsing around, I came across Facebook Puzzles, which seems to be part of Facebook's recruitment tools. Solve some puzzles, apply for a job, and say how rad you are for having solved all of their puzzles. Something like that. Anyway, I had no intention of applying to Facebook, I just wanted to revive my fading computer science knowledge, and have a bit of fun in the progress. I don't see what the big fuss is about working there; so many other companies are producing much more interesting, constructive things.

Naturally I went straight to the most difficult puzzle on the list, Facebull. It didn't take long for me to realise this was an NP-hard problem. It's fairly straight forward to see that this is a graph problem. A little more formally, we're given a directed graph (digraph), and we're asked to find a minimum cost strongly-connected sub-digraph, containing all vertices, but a subset of the edges.

Spoilers ahead. Turn back now if you don't want any help.

Originally I had thought that it was the Asymmetric Traveling Salesman Problem (ATSP) in disguise, which is just the Traveling Salesman Problem (TSP) with directed edges. I spent a couple of hours here and there reading the literature, since I didn't know anything about exact solutions (which is what the puzzle requires), and was only vaguely aware of approximation algorithms (which might help, for example, to reduce the search space.) For anyone wanting to read up on the TSP problem, I would definitely recommend the survey paper The Traveling Salesman Problem, by Jünger, Reinelt and Rinaldi. Also, this website provides a great, gentle introduction.

Not too long after, it dawned on me that the problem was not ATSP at all, since vertices ("compounds") might be visited more than once. For example, consider a graph with the following edges:
    (C1, C2), (C2, C1)
    (C1, C3), (C3, C1)
This could not be solved by the ATSP, since we would have to visit C1 twice to complete a cycle from C1 to C2 to C3 and back to C1. However, it is required to be solvable for the Facebull puzzle. Back to the drawing board...

The general approach I took was similar to the one that would be taken for ATSP, though, and that is to use a "Branch and Bound" algorithm. This is a fancy way of saying, enumerate all possible solutions, backtracking at the earliest possible opportunity by computing lower and upper bounds. For example, we might start with an upper bound of infinity, and update it whenever we find a candidate solution with a lower cost than the upper bound. A lower bound can be computed by taking into a constraint: each vertex must have an in-degree of at least one, and an out-degree of at least one. By remembering the cost of each edge in the graph, we can easily compute a lower bound.

Is this enough? Perhaps, but we can do better. One more thing we can do is to prune edges. For example, if we have the edges (Ci, Cj), (Ci, Ck), and (Ck, Cj), and the sum of the cost of the latter two is less than the cost of the first, then we can eliminate the first.

In total, I submitted my solution nine times. Why so many, you might ask? Because I didn't parse the input correctly, and didn't realise this while I was banging my head against the problem for two weeks. I wasn't handling white space at the end of the file, and I (foolishly) assumed that the test cases would not assign multiple edges to the same pair of vertices. Why would you have such a trivial test case to trip people up, when the underlying puzzle is hard enough as it is? It turns out my accepted solution (written in C++) ran in 300ms, so I guess the focus was more or less centred on correctness, rather than speed. That was a little frustrating, but at least I had some fun and learnt a few tricks along the way.

There is still something wrong with my solution, however. It is not finding the optimal solution to some of the test cases people have published. But it passed the puzzle robot, so that's the main thing that matters. I had one more optimisation, which was to identify edges which are absolutely required, up-front. This can be done very cheaply, and may significantly reduce the search space in sparse graphs. This optimisation was interacting badly with my enumeration function, though; I'm not sure what the problem is, perhaps I'll go back and take another look some time.

Now, back to wasting my time playing computer games...