
If lex/yacc-style parsing works for you, then great. However, I suspect most people are going to get more mileage out of just hand-writing a recursive descent parser and moving on with their lives.

The benefit of recursive descent parsers is that they're easy to write, modify, and understand. You don't need any new paradigms; just write code the way you typically do. If something goes wrong, your standard debugging skills will serve you well.
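
To make that concrete, here's a minimal sketch of a hand-written recursive descent parser for arithmetic expressions (Python, with illustrative names; a real parser would want better error reporting):

    import re

    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> NUMBER | '(' expr ')'

    def tokenize(src):
        return re.findall(r'\d+|[-+*/()]', src)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, expected=None):
            tok = self.peek()
            if tok is None or (expected is not None and tok != expected):
                raise SyntaxError(f'expected {expected!r}, got {tok!r}')
            self.pos += 1
            return tok

        def expr(self):
            node = self.term()
            while self.peek() in ('+', '-'):
                node = (self.eat(), node, self.term())
            return node

        def term(self):
            node = self.factor()
            while self.peek() in ('*', '/'):
                node = (self.eat(), node, self.factor())
            return node

        def factor(self):
            if self.peek() == '(':
                self.eat('(')
                node = self.expr()
                self.eat(')')
                return node
            return ('num', int(self.eat()))  # assumes a number here

    print(Parser(tokenize('1 + 2 * (3 - 4)')).expr())
    # ('+', ('num', 1), ('*', ('num', 2), ('-', ('num', 3), ('num', 4))))

Each grammar rule is just a function; that's the whole trick.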

There are also plenty of other relatively easy parsing techniques out there: monadic parsing, parser combinators, and PEG libraries, for example.
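
For a flavor of the combinator style (a toy sketch, not any particular library's API): a parser is just a function from (text, position) to a result, and combinators build bigger parsers out of smaller ones:

    # A parser takes (text, pos) and returns (value, new_pos) or None.
    def lit(s):
        def p(text, pos):
            if text.startswith(s, pos):
                return s, pos + len(s)
            return None
        return p

    def seq(*parsers):              # all parsers in order
        def p(text, pos):
            values = []
            for q in parsers:
                r = q(text, pos)
                if r is None:
                    return None
                v, pos = r
                values.append(v)
            return values, pos
        return p

    def alt(*parsers):              # first parser that succeeds
        def p(text, pos):
            for q in parsers:
                r = q(text, pos)
                if r is not None:
                    return r
            return None
        return p

    greeting = seq(alt(lit('hello'), lit('hi')), lit(' world'))
    print(greeting('hi world', 0))   # (['hi', ' world'], 8)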

I spent a year trying to figure out which parsing technique worked best for me, and I'm glad I didn't just stick with my starting point of lex/yacc. So again, if this guide makes parsing just work for you, then great, stick with it. But if you find yourself running into a lot of problems, it may be worth looking around, because other options exist and work just fine.



The problem, of course, is that most people aren't going to deliberately produce an LL grammar. It's very easy to drift into LR territory by accident, and if you've centered your features on a particular LR-able construction, converting to LL may be extremely hard or practically impossible.


LL can be augmented with backtracking: parse the input one way, and if that doesn't work out, do a long return to a saved point and try it another way.

Historically this hasn't been popular because it's expensive. In particular, when several such searches nest, the combinations can explode.

That might not matter on today's hardware.
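
In a hand-written parser the backtracking mechanism itself is only a few lines (a sketch with hypothetical helper names); the cost comes from how often the rewinds happen, not from the mechanism:

    # Backtracking sketch: save the position, try an alternative,
    # rewind on failure and try the next one. Nested try_parse calls
    # can re-scan the same tokens many times -- that's where the
    # combinatorial blowup comes from.
    class BacktrackingParser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def eat(self, expected):
            if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
                self.pos += 1
                return expected
            raise SyntaxError(f'expected {expected!r}')

        def try_parse(self, *alternatives):
            for alternative in alternatives:
                saved = self.pos              # checkpoint
                try:
                    return alternative()
                except SyntaxError:
                    self.pos = saved          # the "long return": rewind
            raise SyntaxError('no alternative matched')

    # e.g. for constructs sharing a prefix (hypothetical rules):
    #   p.try_parse(p.declaration, p.expression)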

The fix for LR features creeping in is to not write a grammar at all; just write the code. If you have a new idea, code it directly into the recursive descent parser. There is a risk that in doing so you break something that used to parse (because you hijacked the prefix by which it was recognized). You have to have a test case for that: don't add anything to the parser without tests.

LL(1) or LL(k) parsing relies on identifying the construct by looking at a few prefix tokens of the input. So when you extend the language, you have to think about where in that prefix space the new features fit.
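
Schematically (stub sub-parsers, just to show the dispatch): the single next token selects the production, so each statement form must own a distinct first token:

    # LL(1) dispatch: one token of lookahead picks the production.
    def parse_if(tokens, pos):    return ('if-stmt', pos + 1)     # stub
    def parse_while(tokens, pos): return ('while-stmt', pos + 1)  # stub
    def parse_expr(tokens, pos):  return ('expr-stmt', pos + 1)   # stub

    DISPATCH = {'if': parse_if, 'while': parse_while}

    def parse_statement(tokens, pos):
        # a new statement form must claim a first token that isn't
        # already taken, or the grammar stops being LL(1)
        return DISPATCH.get(tokens[pos], parse_expr)(tokens, pos)

    print(parse_statement(['if', 'x'], 0))   # ('if-stmt', 1)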

If you make the language a Lisp, you have a ready-made blueprint for avoiding the need for LR(1). ANSI Common Lisp defines the actual algorithm by which code is parsed, because it is a user-reprogrammable part of the run-time. The reader follows a read table, which dispatches functions based on which character, or in some cases which pair of characters, is seen.

There are subtleties: some characters are token constituents, while others are terminating (if a terminating character is seen in the middle of a token, that token is delimited, and the character is not part of the token). For instance, the ' character, in the standard read table, is tied to a function that reads the following object X and produces the object (quote X). The character is terminating, so that given abcd', an abcd symbol token will be recognized and brought to an end by the '. Then the function for ' is dispatched to read the quoted object.
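
A toy model of that dispatch in Python (heavily simplified; the names and structure are mine, not the ANSI CL reader):

    # Toy reader driven by a read table. ' and ( dispatch to reader
    # functions; ', ( and ) also terminate any token in progress.
    def read(s, pos=0):
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if s[pos] in READ_TABLE:                 # macro character: dispatch
            return READ_TABLE[s[pos]](s, pos + 1)
        start = pos                              # otherwise accumulate a token
        while pos < len(s) and not s[pos].isspace() and s[pos] not in READ_TABLE:
            pos += 1
        return s[start:pos], pos

    def read_quote(s, pos):
        obj, pos = read(s, pos)                  # read the following object X...
        return ['quote', obj], pos               # ...and produce (quote X)

    def read_list(s, pos):
        items = []
        while True:
            while s[pos].isspace():
                pos += 1
            if s[pos] == ')':
                return items, pos + 1
            obj, pos = read(s, pos)
            items.append(obj)

    def read_rparen(s, pos):
        raise SyntaxError('unmatched )')

    READ_TABLE = {"'": read_quote, '(': read_list, ')': read_rparen}

    print(read("(a 'b (c d))")[0])   # ['a', ['quote', 'b'], ['c', 'd']]
    print(read("abcd'")[0])          # 'abcd' -- the ' delimits the token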


In case you're like me and rusty on LL vs LR grammars, I found this quick refresher:

https://blog.reverberate.org/2013/07/ll-and-lr-parsing-demys...



