
If lex/yacc-style parsing works for you, then great. However, I suspect most people are going to get more mileage out of just hand-writing a recursive descent parser and moving on with their lives.

The benefit of recursive descent parsers is that they're easy to write, modify, and understand. You don't need any new paradigms; just write code the way you typically do. If something goes wrong, your standard debugging skills will serve you well.
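
To make that concrete, here's a minimal sketch of a hand-written recursive descent parser for arithmetic expressions (Python, with illustrative names; a real parser would want better error reporting):

    import re

    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> NUMBER | '(' expr ')'

    def tokenize(src):
        return re.findall(r'\d+|[-+*/()]', src)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, expected=None):
            tok = self.peek()
            if tok is None or (expected is not None and tok != expected):
                raise SyntaxError(f'expected {expected!r}, got {tok!r}')
            self.pos += 1
            return tok

        def expr(self):
            node = self.term()
            while self.peek() in ('+', '-'):
                node = (self.eat(), node, self.term())
            return node

        def term(self):
            node = self.factor()
            while self.peek() in ('*', '/'):
                node = (self.eat(), node, self.factor())
            return node

        def factor(self):
            if self.peek() == '(':
                self.eat('(')
                node = self.expr()
                self.eat(')')
                return node
            return ('num', int(self.eat()))  # assumes a number here

    print(Parser(tokenize('1 + 2 * (3 - 4)')).expr())
    # ('+', ('num', 1), ('*', ('num', 2), ('-', ('num', 3), ('num', 4))))

Each grammar rule is just a function; that's the whole trick.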

There are also plenty of other relatively easy parsing techniques out there: monadic parsing, parser combinators, and PEG libraries, for example.
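
For a flavor of the combinator style (a toy sketch, not any particular library's API): a parser is just a function from (text, position) to a result, and combinators build bigger parsers out of smaller ones:

    # A parser takes (text, pos) and returns (value, new_pos) or None.
    def lit(s):
        def p(text, pos):
            if text.startswith(s, pos):
                return s, pos + len(s)
            return None
        return p

    def seq(*parsers):              # all parsers in order
        def p(text, pos):
            values = []
            for q in parsers:
                r = q(text, pos)
                if r is None:
                    return None
                v, pos = r
                values.append(v)
            return values, pos
        return p

    def alt(*parsers):              # first parser that succeeds
        def p(text, pos):
            for q in parsers:
                r = q(text, pos)
                if r is not None:
                    return r
            return None
        return p

    greeting = seq(alt(lit('hello'), lit('hi')), lit(' world'))
    print(greeting('hi world', 0))   # (['hi', ' world'], 8)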

I spent a year trying to figure out which parsing technique worked best for me, and I'm glad I didn't just stick with my starting point of lex/yacc. So again, if this guide makes parsing just work for you, then great, stick with it. But if you find yourself running into a lot of problems, it may be worth looking around, because other options exist and work just fine.



The problem, of course, is that most people aren't going to deliberately produce an LL grammar. It's very easy to drift into LR territory by accident, and if you've centered your features on a particular LR-able construction, converting to LL may be extremely hard or practically impossible.


LL can be augmented with backtracking: parse the input one way, and if that doesn't work out, do a long return to a saved point and try it another way.

Historically this hasn't been popular because it's expensive. In particular, when several such searches nest, the combinations can explode.

That might not matter on today's hardware.
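
In a hand-written parser the backtracking mechanism itself is only a few lines (a sketch with hypothetical helper names); the cost comes from how often the rewinds happen, not from the mechanism:

    # Backtracking sketch: save the position, try an alternative,
    # rewind on failure and try the next one. Nested try_parse calls
    # can re-scan the same tokens many times -- that's where the
    # combinatorial blowup comes from.
    class BacktrackingParser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def eat(self, expected):
            if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
                self.pos += 1
                return expected
            raise SyntaxError(f'expected {expected!r}')

        def try_parse(self, *alternatives):
            for alternative in alternatives:
                saved = self.pos              # checkpoint
                try:
                    return alternative()
                except SyntaxError:
                    self.pos = saved          # the "long return": rewind
            raise SyntaxError('no alternative matched')

    # e.g. for constructs sharing a prefix (hypothetical rules):
    #   p.try_parse(p.declaration, p.expression)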

The fix for LR features creeping in is to not write a grammar at all; just write the code. If you have a new idea, code it directly into the recursive descent parser. There is a risk that in doing so you break something that used to parse (because you hijacked the prefix by which it was recognized). You have to have a test case for that: don't add anything to the parser without tests.

LL(1) or LL(k) parsing relies on identifying the construct by looking at a few prefix tokens of the input. So when you extend the language, you have to think about where in that prefix space the new features fit.
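
Schematically (stub sub-parsers, just to show the dispatch): the single next token selects the production, so each statement form must own a distinct first token:

    # LL(1) dispatch: one token of lookahead picks the production.
    def parse_if(tokens, pos):    return ('if-stmt', pos + 1)     # stub
    def parse_while(tokens, pos): return ('while-stmt', pos + 1)  # stub
    def parse_expr(tokens, pos):  return ('expr-stmt', pos + 1)   # stub

    DISPATCH = {'if': parse_if, 'while': parse_while}

    def parse_statement(tokens, pos):
        # a new statement form must claim a first token that isn't
        # already taken, or the grammar stops being LL(1)
        return DISPATCH.get(tokens[pos], parse_expr)(tokens, pos)

    print(parse_statement(['if', 'x'], 0))   # ('if-stmt', 1)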

If you make the language a Lisp, you have a ready-made blueprint for avoiding the need for LR(1). ANSI Common Lisp defines the actual algorithm by which code is parsed, because it is a user-reprogrammable part of the run-time. The reader follows a read table, which dispatches functions based on which character, or in some cases which pair of characters, is seen.

There are subtleties: some characters are token constituents, while others are terminating (if a terminating character is seen in the middle of a token, that token is delimited, and the character is not part of the token). For instance, the ' character, in the standard read table, is tied to a function that reads the following object X and produces the object (quote X). The character is terminating, so that given abcd', an abcd symbol token will be recognized and brought to an end by the '. Then the function for ' is dispatched to read the quoted object.
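
A toy model of that dispatch in Python (heavily simplified; the names and structure are mine, not the ANSI CL reader):

    # Toy reader driven by a read table. ' and ( dispatch to reader
    # functions; ', ( and ) also terminate any token in progress.
    def read(s, pos=0):
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if s[pos] in READ_TABLE:                 # macro character: dispatch
            return READ_TABLE[s[pos]](s, pos + 1)
        start = pos                              # otherwise accumulate a token
        while pos < len(s) and not s[pos].isspace() and s[pos] not in READ_TABLE:
            pos += 1
        return s[start:pos], pos

    def read_quote(s, pos):
        obj, pos = read(s, pos)                  # read the following object X...
        return ['quote', obj], pos               # ...and produce (quote X)

    def read_list(s, pos):
        items = []
        while True:
            while s[pos].isspace():
                pos += 1
            if s[pos] == ')':
                return items, pos + 1
            obj, pos = read(s, pos)
            items.append(obj)

    def read_rparen(s, pos):
        raise SyntaxError('unmatched )')

    READ_TABLE = {"'": read_quote, '(': read_list, ')': read_rparen}

    print(read("(a 'b (c d))")[0])   # ['a', ['quote', 'b'], ['c', 'd']]
    print(read("abcd'")[0])          # 'abcd' -- the ' delimits the token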


In case you're like me and rusty on LL vs LR grammars, I found this quick refresher:

https://blog.reverberate.org/2013/07/ll-and-lr-parsing-demys...



