I've seen a lexer/parser scheme that encodes the token type along with the token's location in the source file into a u64 integer, something like
struct Token {
    token_type: u8,
    type_info: u8,
    token_start: u32, // byte offset into the source file
    token_len: u16,
}
It's blazing fast: the lexer/parser can process millions of lines per second. Both the textual information and the location information are still available, since the offset and length point straight back into the source.
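For the sake of illustration, here's a minimal sketch in Rust of that kind of packing (my own toy code, not the scheme from the post): token type, extra type info, byte offset and length squeezed into one u64, with the token text recovered by slicing back into the source rather than being stored in the token.

// Toy sketch, not the original code: pack type (8 bits), extra type info
// (8 bits), byte offset into the source (32 bits) and length (16 bits)
// into a single u64.
#[derive(Clone, Copy)]
struct PackedToken(u64);

impl PackedToken {
    fn new(token_type: u8, type_info: u8, start: u32, len: u16) -> Self {
        PackedToken(
            (token_type as u64)
                | ((type_info as u64) << 8)
                | ((start as u64) << 16)
                | ((len as u64) << 48),
        )
    }

    fn token_type(self) -> u8 { self.0 as u8 }
    fn start(self) -> u32 { (self.0 >> 16) as u32 }
    fn len(self) -> u16 { (self.0 >> 48) as u16 }

    // The lexeme itself is never copied; it's sliced back out of the source.
    fn text(self, source: &str) -> &str {
        let start = self.start() as usize;
        &source[start..start + self.len() as usize]
    }
}

fn main() {
    let source = "let answer = 42;";
    // Hypothetical token kind 3 = identifier, spanning "answer" at offset 4.
    let tok = PackedToken::new(3, 0, 4, 6);
    assert_eq!(std::mem::size_of::<PackedToken>(), 8);
    assert_eq!(tok.text(source), "answer");
    assert_eq!(tok.token_type(), 3);
}

A nice side effect is that the token stream is just a flat array of u64s, which is about as cache- and allocator-friendly as it gets.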
This is roughly what my JSON parser does. It does type-checking as well, albeit not with JSON Schema but with an object descriptor that you have to define in order to parse and generate JSON.
It was developed for embedded systems (it was originally written for a NATS implementation in the Zephyr RTOS), so it's a bit limited: there's no easy way to know where a parsing or type-validation error happened, although the information is there if one wants to dig it out: https://github.com/lpereira/lwan/blob/master/src/samples/tec...
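For anyone who hasn't seen the descriptor approach: the sketch below is a made-up, simplified illustration of the idea in Rust, not the actual API of the parser linked above (which is C and macro-based). You declare the fields you expect and their types once, and type checking falls out of walking that table.

// Made-up illustration of descriptor-driven type checking; not the API of
// the parser linked above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType { Str, Int }

// One entry per expected field: its key and the type it must have.
struct FieldDescr {
    key: &'static str,
    ty: FieldType,
}

// Toy stand-in for whatever the parser produces for a single value.
enum Value {
    Str(&'static str),
    Int(i64),
}

fn type_of(v: &Value) -> FieldType {
    match v {
        Value::Str(_) => FieldType::Str,
        Value::Int(_) => FieldType::Int,
    }
}

// Every described field must be present with the right type. A real parser
// would do this check in the same pass that consumes the JSON text.
fn validate(obj: &[(&str, Value)], descr: &[FieldDescr]) -> Result<(), String> {
    for d in descr {
        match obj.iter().find(|(k, _)| *k == d.key) {
            Some((_, v)) if type_of(v) == d.ty => {}
            Some((_, v)) => return Err(format!(
                "field '{}' is {:?}, expected {:?}", d.key, type_of(v), d.ty)),
            None => return Err(format!("missing field '{}'", d.key)),
        }
    }
    Ok(())
}

fn main() {
    // Descriptor for a hypothetical {"subject": ..., "size": ...} object.
    let descr = [
        FieldDescr { key: "subject", ty: FieldType::Str },
        FieldDescr { key: "size", ty: FieldType::Int },
    ];
    let obj = [("subject", Value::Str("updates")), ("size", Value::Int(42))];
    assert!(validate(&obj, &descr).is_ok());
}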
I think the key insight is that the true benchmark for a lexer/parser is bytes per second of input (bandwidth), so shrinking the output data (tokens/AST nodes) is a massive gain in how much you can process in the same amount of time.
The fewer bytes you pack your data into, the more of it you can process per second. Computers may be the most complex machines ever built, but the simple fact remains: having fewer things to touch means you can touch more things in the same amount of time.
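To put a rough number on that (toy types of my own, not from the thread): a token struct that owns its text and stores line/column takes 40 bytes on a typical 64-bit target before you even count the heap allocation behind the String, versus 8 bytes for the packed u64 token, so at the same output bandwidth you move roughly five times as many tokens.

// Toy comparison, not from the post: a "fat" token that owns its text and
// carries line/column info vs. the 8-byte packed token above.
struct FatToken {
    kind: u8,
    text: String, // pointer + len + capacity = 24 bytes, plus the heap data
    line: u32,
    column: u32,
}

fn main() {
    // On a typical 64-bit target this prints 40 vs 8: five times fewer
    // bytes per token to write out, and later to read back in.
    println!("fat token:    {} bytes", std::mem::size_of::<FatToken>());
    println!("packed token: {} bytes", std::mem::size_of::<u64>());
}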