Sunday, February 6, 2011

Facebook Puzzles

So a couple of weeks ago I came across a forum post on XKCD about programming puzzles.Figuring that's a significantly better way to while away the hours I'd previously been spending playing computer games, I thought I'd take a crack.

So after browsing around, I came across Facebook Puzzles, which seems to be part of Facebook's recruitment tools. Solve some puzzles, apply for a job, and say how rad you are for having solved all of their puzzles. Something like that. Anyway, I had no intention of applying to Facebook, I just wanted to revive my fading computer science knowledge, and have a bit of fun in the progress. I don't see what the big fuss is about working there; so many other companies are producing much more interesting, constructive things.

Naturally I went straight to the most difficult puzzle on the list, Facebull. It didn't take long for me to realise this was an NP-hard problem. It's fairly straight forward to see that this is a graph problem. A little more formally, we're given a directed graph (digraph), and we're asked to find a minimum cost strongly-connected sub-digraph, containing all vertices, but a subset of the edges.

Spoilers ahead. Turn back now if you don't want any help.

Originally I had thought that it was the Asymmetric Traveling Salesman Problem (ATSP) in disguise, which is just the Traveling Salesman Problem (TSP) with directed edges. I spent a couple of hours here and there reading the literature, since I didn't know anything about exact solutions (which is what the puzzle requires), and was only vaguely aware of approximation algorithms (which might help, for example, to reduce the search space.) For anyone wanting to read up on the TSP problem, I would definitely recommend the survey paper The Traveling Salesman Problem, by J√ľnger, Reinelt and Rinaldi. Also, this website provides a great, gentle introduction.

Not too long after, it dawned on me that the problem was not ATSP at all, since vertices ("compounds") might be visited more than once. For example, consider a graph with the following edges:
    (C1, C2), (C2, C1)
    (C1, C3), (C3, C1)
This could not be solved by the ATSP, since we would have to visit C1 twice to complete a cycle from C1 to C2 to C3 and back to C1. However, it is required to be solvable for the Facebull puzzle. Back to the drawing board...

The general approach I took was similar to the one that would be taken for ATSP, though, and that is to use a "Branch and Bound" algorithm. This is a fancy way of saying, enumerate all possible solutions, backtracking at the earliest possible opportunity by computing lower and upper bounds. For example, we might start with an upper bound of infinity, and update it whenever we find a candidate solution with a lower cost than the upper bound. A lower bound can be computed by taking into a constraint: each vertex must have an in-degree of at least one, and an out-degree of at least one. By remembering the cost of each edge in the graph, we can easily compute a lower bound.

Is this enough? Perhaps, but we can do better. One more thing we can do is to prune edges. For example, if we have the edges (Ci, Cj), (Ci, Ck), and (Ck, Cj), and the sum of the cost of the latter two is less than the cost of the first, then we can eliminate the first.

In total, I submitted my solution nine times. Why so many, you might ask? Because I didn't parse the input correctly, and didn't realise this while I was banging my head against the problem for two weeks. I wasn't handling white space at the end of the file, and I (foolishly) assumed that the test cases would not assign multiple edges to the same pair of vertices. Why would you have such a trivial test case to trip people up, when the underlying puzzle is hard enough as it is? It turns out my accepted solution (written in C++) ran in 300ms, so I guess the focus was more or less centred on correctness, rather than speed. That was a little frustrating, but at least I had some fun and learnt a few tricks along the way.

There is still something wrong with my solution, however. It is not finding the optimal solution to some of the test cases people have published. But it passed the puzzle robot, so that's the main thing that matters. I had one more optimisation, which was to identify edges which are absolutely required, up-front. This can be done very cheaply, and may significantly reduce the search space in sparse graphs. This optimisation was interacting badly with my enumeration function, though; I'm not sure what the problem is, perhaps I'll go back and take another look some time.

Now, back to wasting my time playing computer games...