
I agree strongly with the point that making parsing a major component of teaching compilers is misguided (as anyone who read my compiler series would know), not because learning parsing is unnecessary but because going more in depth about the rest is sacrificed in favour of what is one of the least interesting parts of the challenge of writing a new language.

There's more to do in improving parsing, but it's still one of the best understood aspects of writing a compiler.

That said, I do agree with the article that it's a problem that many syntaxes have complex parser requirements. No grammar I know of requires recursive descent, but many require violations of layering etc. that tooling does not cater for, which strongly favours manual parsers.

Now, I prefer manually written parsers because I think (having tried and failed to write better options many times) that current parser generation tools are rarely suitable for writing production parsers.

That said, I wish people wrote their grammars for a tool as a validation exercise, to give an incentive to design grammars that are more regular, simpler and easier to parse.

As an example, I love Ruby. Having worked on my Ruby compiler project (though it's been in hibernation for a few years), I utterly hate Ruby's grammar. I don't hate the result, for the most part, but that's mostly because most of us avoid the more insane things Ruby allows [1].

And while some of the complexity is part of creating what I love about Ruby, the grammar is full of horrible quirks (again [1]), at least some of which might have been reduced if one of the "test cases" of the design process for Ruby was to maintain and update a compact language description for a parser generator tool. Even if that'd not be the production grammar for MRI, using it both to maintain a sane compact grammar description and to validate other parsers against would be great.

I pick on Ruby because I love it and wish it was cleaner and it's a particularly bad offender in this respect, but so many languages have occasional warts that they likely wouldn't have if a formal specification of the grammar made them more obvious.

[1] E.g. 'p(% % )' is valid (it prints "%", because % without a left hand operand reads the following character as the start of a quoted string ending with the same character). 'p % % ' is not, because the "p" can be parsed as an operand, and so the first '%' is interpreted as an infix operator, while the second '%' is interpreted as the start of a quoted string that is never terminated. Fun times. Thankfully I've never seen anyone abuse this in real code.
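
To make the layering problem in [1] concrete, here's a toy sketch in Python of a tokenizer that has to know whether the previous token could be an operand before it can decide what '%' means. This is not how MRI's lexer is written, and the names (tokenize, prev_is_operand) are made up for illustration. It only models the one rule from the footnote and treats any identifier as a possible operand; the real lexer tracks far more state than this.

```python
def tokenize(src):
    """Toy lexer: '%' is an infix operator after an operand, else it opens a string."""
    tokens = []
    prev_is_operand = False  # parser-ish state leaking into the lexer (assumption: one flag is enough here)
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "%":
            if prev_is_operand:
                # Something operand-like came before, so read '%' as infix.
                tokens.append(("op", "%"))
                prev_is_operand = False
                i += 1
            else:
                # No left-hand operand: the next character is the string delimiter.
                if i + 1 >= len(src):
                    raise SyntaxError("'%' at end of input")
                delim = src[i + 1]
                end = src.find(delim, i + 2)
                if end == -1:
                    raise SyntaxError("unterminated % string")
                tokens.append(("str", src[i + 2:end]))
                prev_is_operand = True
                i = end + 1
        elif ch.isspace():
            i += 1  # whitespace is skipped and does not change the state in this toy
        elif ch.isalnum() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("ident", src[i:j]))
            prev_is_operand = True
            i = j
        elif ch == "(":
            tokens.append(("lparen", ch))
            prev_is_operand = False
            i += 1
        elif ch == ")":
            tokens.append(("rparen", ch))
            prev_is_operand = True
            i += 1
        else:
            raise SyntaxError("unexpected character: " + repr(ch))
    return tokens


print(tokenize("p(% % )"))
try:
    tokenize("p % % ")
except SyntaxError as err:
    print("rejected:", err)
```

With this toy, 'p(% % )' tokenizes to an identifier, a paren, the string "%" and a closing paren, while 'p % % ' is rejected because the second '%' opens a string that never ends. The point is just that a plain "lexer feeds tokens to parser" pipeline can't express this without the lexer peeking at parse state, which is exactly the kind of thing generator tooling tends not to cater for.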


