I've seen a lexer/parser scheme that encodes the token type along with the token's location in the source file into a u64 integer, something like
struct Token {
    token_type: u8,
    type_info: u8,
    token_start: u32, // byte offset into the source file
    token_len: u16,
}
It's blazing fast: the lexer/parser can process millions of lines per second. Both the textual information and the location information are still available, since the offset and length point straight back into the source.
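For the sake of illustration, here's a minimal sketch in Rust of that kind of packing (my own toy code, not the scheme from the post): token type, extra type info, byte offset and length squeezed into one u64, with the token text recovered by slicing back into the source rather than being stored in the token.

// Toy sketch, not the original code: pack type (8 bits), extra type info
// (8 bits), byte offset into the source (32 bits) and length (16 bits)
// into a single u64.
#[derive(Clone, Copy)]
struct PackedToken(u64);

impl PackedToken {
    fn new(token_type: u8, type_info: u8, start: u32, len: u16) -> Self {
        PackedToken(
            (token_type as u64)
                | ((type_info as u64) << 8)
                | ((start as u64) << 16)
                | ((len as u64) << 48),
        )
    }

    fn token_type(self) -> u8 { self.0 as u8 }
    fn start(self) -> u32 { (self.0 >> 16) as u32 }
    fn len(self) -> u16 { (self.0 >> 48) as u16 }

    // The lexeme itself is never copied; it's sliced back out of the source.
    fn text(self, source: &str) -> &str {
        let start = self.start() as usize;
        &source[start..start + self.len() as usize]
    }
}

fn main() {
    let source = "let answer = 42;";
    // Hypothetical token kind 3 = identifier, spanning "answer" at offset 4.
    let tok = PackedToken::new(3, 0, 4, 6);
    assert_eq!(std::mem::size_of::<PackedToken>(), 8);
    assert_eq!(tok.text(source), "answer");
    assert_eq!(tok.token_type(), 3);
}

A nice side effect is that the token stream is just a flat array of u64s, which is about as cache- and allocator-friendly as it gets.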
This is roughly what my JSON parser does. It does type-checking as well, albeit not with JSON Schema but with an object descriptor that you have to define in order to parse and generate JSON.
It was developed for embedded systems (it was originally written for a NATS implementation in the Zephyr RTOS), so it's a bit limited: there's no easy way to know where a parsing or type-validation error happened, although the information is there if one wants to dig it out: https://github.com/lpereira/lwan/blob/master/src/samples/tec...
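For anyone who hasn't seen the descriptor approach: the sketch below is a made-up, simplified illustration of the idea in Rust, not the actual API of the parser linked above (which is C and macro-based). You declare the fields you expect and their types once, and type checking falls out of walking that table.

// Made-up illustration of descriptor-driven type checking; not the API of
// the parser linked above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType { Str, Int }

// One entry per expected field: its key and the type it must have.
struct FieldDescr {
    key: &'static str,
    ty: FieldType,
}

// Toy stand-in for whatever the parser produces for a single value.
enum Value {
    Str(&'static str),
    Int(i64),
}

fn type_of(v: &Value) -> FieldType {
    match v {
        Value::Str(_) => FieldType::Str,
        Value::Int(_) => FieldType::Int,
    }
}

// Every described field must be present with the right type. A real parser
// would do this check in the same pass that consumes the JSON text.
fn validate(obj: &[(&str, Value)], descr: &[FieldDescr]) -> Result<(), String> {
    for d in descr {
        match obj.iter().find(|(k, _)| *k == d.key) {
            Some((_, v)) if type_of(v) == d.ty => {}
            Some((_, v)) => return Err(format!(
                "field '{}' is {:?}, expected {:?}", d.key, type_of(v), d.ty)),
            None => return Err(format!("missing field '{}'", d.key)),
        }
    }
    Ok(())
}

fn main() {
    // Descriptor for a hypothetical {"subject": ..., "size": ...} object.
    let descr = [
        FieldDescr { key: "subject", ty: FieldType::Str },
        FieldDescr { key: "size", ty: FieldType::Int },
    ];
    let obj = [("subject", Value::Str("updates")), ("size", Value::Int(42))];
    assert!(validate(&obj, &descr).is_ok());
}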
I think the key insight is that the true benchmark for a lexer/parser is bytes per second of input (bandwidth), so shrinking the output data (tokens/AST nodes) is a massive gain in how much you can process in the same amount of time.
The fewer bytes you pack your data into, the more of it you can process per second. Computers may be the most complex machines ever built, but the simple fact remains: having fewer things to touch means you can touch more things in the same amount of time.
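To put a rough number on that (toy types of my own, not from the thread): a token struct that owns its text and stores line/column takes 40 bytes on a typical 64-bit target before you even count the heap allocation behind the String, versus 8 bytes for the packed u64 token, so at the same output bandwidth you move roughly five times as many tokens.

// Toy comparison, not from the post: a "fat" token that owns its text and
// carries line/column info vs. the 8-byte packed token above.
struct FatToken {
    kind: u8,
    text: String, // pointer + len + capacity = 24 bytes, plus the heap data
    line: u32,
    column: u32,
}

fn main() {
    // On a typical 64-bit target this prints 40 vs 8: five times fewer
    // bytes per token to write out, and later to read back in.
    println!("fat token:    {} bytes", std::mem::size_of::<FatToken>());
    println!("packed token: {} bytes", std::mem::size_of::<u64>());
}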