Recently, I've been trying out a few of the options out there.
My requirements are:
* generates C
* generates a fast parser
Ease of use is also obviously a good thing. A few of the options I explored:
ANTLR[1]:
ANTLR is really great if you're using it for Java. Since Java is the first officially supported target, everything works really well there. The C bindings are pretty good too, though the documentation is sparse and somewhat out of date (the examples are good, though).
ANTLR4 looks really exciting. It uses a new algorithm called adaptive LL(*). To quote the author:
"The benefit is that the adaptive algorithm is much stronger than the static LL(*) grammar analysis algorithm in v3. Honey Badger takes any grammar that you give it; it just doesn't give a damn. (v4 accepts even left recursive grammars, except for indirectly left recursive grammars where x calls y which calls x)."
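For anyone wondering why left recursion is a big deal for LL-style parsers, here is a minimal hand-written sketch in C. It is not ANTLR output and not how ANTLR4 handles the problem internally; the names `Cursor`, `parse_num`, `parse_expr`, and `eval_subtraction` are made up for illustration. It shows the usual manual workaround: a rule like `expr : expr '-' num | num` is rewritten as a loop, because a literal recursive-descent translation would call itself before consuming any input.

```c
#include <ctype.h>

/* Hypothetical cursor type for this sketch: a position in a
 * NUL-terminated input plus an error flag. */
typedef struct {
    const char *pos;
    int ok;                /* cleared on the first syntax error */
} Cursor;

/* num : [0-9]+ ; */
static long parse_num(Cursor *c)
{
    long value = 0;
    if (!isdigit((unsigned char)*c->pos)) {
        c->ok = 0;
        return 0;
    }
    while (isdigit((unsigned char)*c->pos)) {
        value = value * 10 + (*c->pos - '0');
        c->pos++;
    }
    return value;
}

/*
 * Left-recursive rule:        expr : expr '-' num | num ;
 * A literal translation would begin with parse_expr() calling itself
 * without consuming input, so it would never terminate.
 * Equivalent iterative form:  expr : num ('-' num)* ;
 * Folding inside the loop keeps subtraction left-associative.
 */
static long parse_expr(Cursor *c)
{
    long acc = parse_num(c);
    while (c->ok && *c->pos == '-') {
        c->pos++;                  /* consume '-' */
        acc -= parse_num(c);
    }
    return acc;
}

/* Parse a whole string. Returns 1 and stores the value in *out on
 * success, 0 on a syntax error or trailing input. */
int eval_subtraction(const char *text, long *out)
{
    Cursor c = { text, 1 };
    long value = parse_expr(&c);
    if (!c.ok || *c.pos != '\0')
        return 0;
    *out = value;
    return 1;
}
```

With this rewrite, `eval_subtraction("10-3-2", &v)` groups as (10-3)-2. Having to do this transformation by hand is exactly the chore that a generator accepting left-recursive rules saves you.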
When I was looking at it, ANTLR4 was not released and the C bindings hadn't been updated. It looks like at least half of that changed in the last couple of months.
Finally, ANTLRWorks is awesome. You can graphically step through the parser and it even draws great diagrams. It's far and away the best debugging experience I've had.
OMeta[2]:
I looked briefly at OMeta; the work done on PEGs is really neat. Unfortunately there's no viable C backend, and overall the project is fairly undocumented.
Flex/Bison:
The old reliable. It's a bit harder to write your grammar for an LALR generator, but it outputs good C code and is supported pretty much everywhere. The resulting code is harder to debug than the code generated by ANTLR, though.
Flex/Lemon[3]:
This is where I ended up. Lemon is the parser generator written for SQLite. It's very similar to Bison but has some great new features and is a bit easier to use and debug. It's also an extremely simple project (one .c file and one template file cover everything), and it's extremely well tested (SQLite has used it for a long time).
I'm not perfectly happy with my current solution. In an ideal world there would be an LL(*) or PEG-style parser generator with first-class support for C as a target language.
[1] http://www.antlr.org/
[2] http://tinlizzie.org/ometa/
[3] http://www.hwaci.com/sw/lemon/lemon.html