Tuesday, June 21, 2011

Le sigh

I've been coming across more problems with Boost Wave. The current ones blocking me are:
  • A lack of an efficient way to conditionally disable a macro. The "context policy" provides hooks for handling macro expansions, and its return value is meant to control whether the expansion takes place. It doesn't work. I'll write up a bug report when I get some time.
  • Wave isn't very forgiving about integer under/overflow. For example, the GNU C library's header "/usr/include/bits/wchar.h" has the following tidbit to determine the sign of wide characters, which Boost Wave barfs on:
#elif L'\0' - 1 > 0

I think the latter "problem" might actually be reasonable - I believe the standards say that handling of overflow is undefined, and preprocessor/compiler-specific. That doesn't help me much though. I could fix this by writing code to parse the expressions, which seems silly, or by passing off to the target preprocessor (e.g. GCC), which seems like overkill.
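To make the wchar.h check concrete: the expression asks whether subtracting one from a zero wide character wraps around to a large positive value (unsigned) or simply gives -1 (signed). Here is a rough Python sketch of that arithmetic. The helper names and the 32-bit width are my own assumptions for illustration; a real preprocessor may evaluate #if expressions at a different width.

```python
def wrap_unsigned(value, bits=32):
    """Reduce value modulo 2**bits, as unsigned arithmetic would."""
    return value % (1 << bits)


def wrap_signed(value, bits=32):
    """Interpret value as a two's-complement signed integer of the given width."""
    value %= 1 << bits
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def wchar_looks_unsigned(wrap):
    """Mimic the test `L'\\0' - 1 > 0` under the given arithmetic model."""
    return wrap(0 - 1) > 0


print(wchar_looks_unsigned(wrap_unsigned))  # unsigned model: 0 - 1 wraps around
print(wchar_looks_unsigned(wrap_signed))    # signed model: 0 - 1 is just -1
```

A preprocessor that refuses to wrap has no answer for the unsigned case, which is presumably where things go wrong.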

I'm going to have a look at how hard it would be to use LLVM/Clang's preprocessor instead. If that's a better bet, I may go that way. Otherwise, it might be time to get approval to send patches to the Wave project.

Saturday, June 18, 2011

Inline Python macros in C

I had a bit of time today to do some more work on that which is soon to be called something other than csnake. I've added a couple of features:

  • You can now define custom pragmas, providing a Python handler function. Unfortunately Boost Wave, which csnake uses for preprocessing, only provides callbacks for pragmas that start with "#pragma wave".
  • Built-in pragmas and macros to support defining Python macros inline in C/C++ code.
  • A __main__ program in the csnake package. So you can now do "python3.2 -m csnake ", which will print out the preprocessed source.
So for example, you can do something like the following, entirely in one C++ file:

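#include <iostream>

// factorial macro.
// NOTE: the exact form of the py_def/py_end lines is assumed from the description that follows.
py_def(factorial(n))
    import math
    f = math.factorial(int(str(n)))
    return [Token(T_INTLIT, f)]
py_end

int main()
{
    std::cout << factorial(3) << std::endl;
    return 0;
}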

This works as follows: py_def and py_end are macros, which in turn use the _Pragma operator with built-in pragmas. Those pragmas are handled by csnake, and signal to collect the tokens in between. When the py_end macro is reached, the tokens are concatenated and a Python function macro is born.
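The last step, turning collected text into a callable, can be sketched in plain Python. This is only an illustration of the idea, not csnake's code: build_macro is a hypothetical helper, and a plain tuple stands in for csnake's Token/T_INTLIT.

```python
import textwrap


def build_macro(name, params, body_source):
    """Wrap collected body text in a def statement and compile it.

    Hypothetical helper: csnake does this with the tokens it collects
    between py_def and py_end; here the body is just a string.
    """
    header = "def {}({}):\n".format(name, ", ".join(params))
    body = textwrap.indent(textwrap.dedent(body_source), "    ")
    namespace = {}
    exec(header + body, namespace)
    return namespace[name]


# Body as it might look once the tokens have been concatenated.
body = """
    import math
    f = math.factorial(int(str(n)))
    return ("T_INTLIT", f)
"""

factorial = build_macro("factorial", ["n"], body)
print(factorial("3"))
```

The argument arrives as something string-like (a token in csnake, a str here), which is why the body converts it with int(str(n)).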

I'm intending to do additional "Python blocks", including at least a py_for block, which will replicate the tokens within the block for each iteration of a loop.
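As a rough idea of what such a block might do, here is a small sketch. py_for doesn't exist yet, so everything here is assumed: replicate is a hypothetical helper, tokens are plain strings, and substituting the loop variable is my guess at sensible semantics.

```python
def replicate(tokens, var, values):
    """Repeat a token sequence once per value, replacing the loop variable."""
    out = []
    for value in values:
        out.extend(str(value) if tok == var else tok for tok in tokens)
    return out


# The tokens of `a [ i ] ;` repeated for i = 0 and i = 1.
print(" ".join(replicate(["a", "[", "i", "]", ";"], "i", range(2))))
```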

There's one big problem with the py_def support at the moment, which is that the tokens go through the normal macro replacement procedure. I think I'll have a fix for that soon.

Wednesday, June 15, 2011

Name clash

Rats, looks like I didn't do enough homework on the name "csnake". Turns out there's another Python-based project called CSnake: https://github.com/csnake-org/CSnake/. Incidentally - and not that it matters much - I had the name picked out before that project was released. Now I need to think up another puntastic name.

Saturday, June 11, 2011

C-Preprocessor Macros in Python

TL;DR: I've started a new project, csnake, which allows you to write your C preprocessor macros in Python.

Long version ahead...

You want to do what now?

I had this silly idea a couple of years ago, to create a C preprocessor in which macros can be defined in Python. This was born out of me getting sick of hacking build scripts to generate code from data, but pursued more for fun.

I started playing around with Boost Wave, which is a "Standards conformant, and highly configurable implementation of the mandated C99/C++ preprocessor functionality packed behind an easy to use iterator interface". With a little skulduggery coding, I managed to define macros as C++ callable objects taking and returning tokens. Then it was a simple matter of adding a Python API.

The Result

What we end up with is a Python API that looks something like this:

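import sys
from _preprocessor import *

def factorial(n):
    import math
    return [Token(T_INTLIT, math.factorial(int(str(n))))]

p = Preprocessor("test.cpp")
# Assumption: the macro is registered with a call like this; the exact
# method name may differ.
p.define(factorial)
for t in p.preprocess():
    # Assumption: write out the text of each preprocessed token.
    sys.stdout.write(str(t))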

Which will take...

int main(){return factorial(3);}

And give you...

int main(){return 6;}

If it's not immediately clear, it will translate "factorial()" into an integer literal of the factorial of the input token. This isn't a very interesting example, so if you can imagine a useful application, let me know ;)
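If you'd like to see the idea without building anything, the translation can be imitated in pure Python. This is a toy under stated assumptions: expand is a hypothetical function, and it uses a regex over raw text rather than real tokens, so it only handles simple, non-nested calls.

```python
import math
import re

# Matches `name(args)` where args contains no nested parentheses.
CALL = re.compile(r"\b(\w+)\(([^()]*)\)")


def expand(source, macros):
    """Replace calls to known macro names with the result of a Python callable."""
    def replace(match):
        name, arg = match.group(1), match.group(2).strip()
        if name in macros:
            return str(macros[name](arg))
        return match.group(0)  # not a macro, so leave the text untouched
    return CALL.sub(replace, source)


macros = {"factorial": lambda n: math.factorial(int(n))}
print(expand("int main(){return factorial(3);}", macros))
```

Note that main() is matched by the pattern as well, but since it isn't a registered macro it passes through unchanged.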

The above script will work with the current code, using Python 3.2, compiling with GCC. If you'd like to play with it, grab the code from the csnake GitHub repository. Once you've got it, run "python3.2 setup.py build". Currently there is just an extension module ("csnake._preprocessor"), so set your PYTHONPATH to the build directory and play with that directly.

I have chosen to make csnake Python 3.2+ only, for a couple of major reasons:

  • All the cool kids are doing it: it's the way of the future. But seriously, Python 3.x needs more projects to become more mainstream.
  • Python 3.2 implements PEP 384, which allows extension modules to be used across Python versions. Finally. I always hated that I had to recompile for each version.
... and one very selfish (but minor) reason: I wanted to modernise my Python knowledge. I've been ignoring Python 3.x for far too long.

The Road Ahead

What I've done so far is very far from complete, and not immediately useful. It may never be very useful. But if it is to be, it would require at least:
  • A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn't worth much. It needs to act like or delegate to a real preprocessor, such as GCC.
  • A #pragma to define Python macros in source, or perhaps if I'm feeling adventurous, something like #pydefine.
  • A simple, documented Python API.
  • A simple command line interface with the look and feel of a standard C preprocessor.
  • Some unit tests.
I hope to add these in the near future. I've had code working for the first two points, and the remaining points are relatively simple. I will post again when I have made some significant progress.
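For the first point, one common approach is to ask GCC to dump its predefined macros with "-dM -E" and parse the output. The sketch below is not the code mentioned above, just one way it could be done; parse_defines and gcc_predefined_macros are hypothetical names, and it assumes a gcc executable on the PATH and a Unix-style /dev/null.

```python
import subprocess


def parse_defines(text):
    """Parse `#define NAME VALUE` lines into a dict of name -> value.

    Function-like macros keep their parameter list as part of the name.
    """
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def gcc_predefined_macros(compiler="gcc"):
    """Ask the compiler for its predefined macros by preprocessing empty input."""
    output = subprocess.check_output(
        [compiler, "-dM", "-E", "-x", "c", "/dev/null"],
        universal_newlines=True,
    )
    return parse_defines(output)


sample = "#define __GNUC__ 4\n#define __unix__ 1\n#define FOO\n"
print(parse_defines(sample))
```

The parsing is kept separate from the subprocess call so it can be tried out on canned text without a compiler installed.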