Thursday, August 4, 2011

It's Alive!


I've been quietly hacking away on cmonster (née csnake). I like this name even more: I think it describes my creation better. If you thought preprocessors were horrible before, well...


What is cmonster?

cmonster is a C preprocessor with a few novel features on top of the standard fare:

  • Allows users to define function macros in Python, inline.
  • Allows users to define a callback for locating #include files, when the file can not be found in the specified include directories.
  • Provides a Python API for iterating over tokens in the output stream.
cmonster is built on top of Clang, a modern C language family compiler, which contains a reusable, programmable preprocessor. At present, cmonster requires Clang 3.0 APIs, which has not yet been released. I have been working off Clang's subversion trunk.

I have just uploaded a binary distribution of the first alpha version (0.1) of cmonster to pypi. I have only built/tested it on Linux 32-bit, Python 3.2, and I don't expect it will work on anything else yet. If you want to play around with it, you can install cmonster using "easy_install cmonster" or by grabbing it off pypi and installing it manually.


Demonstration

Seeing is believing - how does this thing work? We'll ignore everying except for inline Python macros for now, because that's the most stable part of the API.


It is possible to define macros inline in cmonster, using the builtin "py_def" macro. For example:

py_def(factorial(n))
    import math
    return str(math.factorial(int(str(n))))
py_end

When cmonster sees this, it will grab everything up to "py_end", and define a Python function. It will also create a preprocessor macro with the function's name (as given in the py_def heading), and this macro will be directed to call the Python function. The Python function will be passed the argument tokens that the macro was invoked with. It can return either a sequence of tokens, or a string that will subsequently be tokenised.


Addressing Shortcomings

In my previous post about csnake I mentioned a few things that needed to be done. I have addressed some of these things in cmonster:

A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn't worth much. It needs to act like or delegate to a real preprocessor, such as GCC.
 I have added support for emulating GCC. This is done by consulting GCC for its predefined macros (using "gcc -dM"), and using the new "include locator" callback feature. By setting an include locator callback on a cmonster preprocessor, you provide cmonster with a second-chance attempt at locating an include file when the file can not be found in the specified include directories. This method can be used to determine GCC's system include directories: whenever we can't find an include file, we consult GCC and add parse the output to determine the location of the file on disk. I intend to add support for more (common) compilers/preprocessors in the future. Namely, MSVC.

I had another crazy idea for (ab)using include locators: handling specially formatted #include directives to feed off-disk files into the preprocessor. Buh? I mean, say, the ability to pull down headers from a hosted repository such as github (e.g. #include <github.yourproject/header.h>), and feeding them into the preprocessor as in-memory files. Or generating headers on the fly (e.g. #include <myapi.proto>, to automatically generate and include Protobuf headers).

A #pragma to define Python macros in source, or perhaps if I'm feeling adventurous, something like #pydefine.
Support for inline Python macros has been implemented: see the "Demonstration" section above. It's unlikely I'll attempt to create a #pydefine, as it would be more work than it's worth.


A simple command line interface with the look and feel of a standard C preprocessor
The distribution now contains a "cmonster" script, which invokes the preprocessor on a file specified on the command-line. This will need a lot of work: presently you can't add user include directories or (un)define macros. Neither of these things are difficult to add, they just haven't been top priorities.


Future Work

Still remaining to do (and sticking out like sore thumbs) are unit-tests and documentation. Now that I've got something vaguely usable, I will be working on those next.

Once I've tested, documented and stabilised the API, I'll look at (in no definite order):
  • Improvement the command line interface. Add the standard "-I", "-D" and "-U" parameters.
  • Portability. Probably Windows first, since it's common and I have access to it.
  • Emulation of additional preprocessors.
  • Passing in-memory files ("file like" Python objects) to #include.