It sounds like someone read Russ Cox's work on regexp.
Russ Cox? You know that regular expressions have existed since Kleene's first proposals, and everyone with a CS degree knows that you can compile them to deterministic finite state automata. Right?
Sadly, not everybody is familiar with automata theory. Furthermore, I'm sure he knows; his link is about the implementation, which is non-trivial and non-obvious.
I'm struggling to understand your negativity here..
Sorry, that was indeed a bit harsh. I wanted to point out that Bram Moolenaar is probably well aware of DFAs given how well-known they are and how many implementations are available. I don't think Russ Cox has much to do with it; rather, it is driven by demand - linear time matching is attractive for syntax highlighting.
You know that he didn't say Cox invented regular expressions, but that the Vim implementation might be influenced by Cox's advocacy on the matter. Right?
There are many implementations of the standard operations on finite state automata (determinization, minimization, intersection, concatenation, etc.) and nearly as many implementations of regular expressions[1]. Anyone who is in the trade (of text processing), including Bram Moolenaar, will be aware that regular languages can be represented as finite state automata.
However, the problem is that not all regular expression 'dialects' (most notably Perl regexes) describe regular languages, and so they cannot be represented as finite state automata. For instance, the introduction of back-references makes regular expressions non-regular.
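To make the back-reference point concrete, here is a small sketch using Python's `re` module (itself a backtracking engine, so it happens to support this): the pattern matches any word immediately repeated, a language no finite automaton can recognize, since the machine would need unbounded memory to remember the first capture.

```python
import re

# \1 refers back to whatever (\w+) captured -- this is what makes the
# pattern non-regular: "w w" for an arbitrary word w cannot be matched
# by any DFA/NFA without back-reference support.
pattern = re.compile(r"\b(\w+) \1\b")

print(bool(pattern.search("hello hello")))  # True  (word repeats)
print(bool(pattern.search("hello world")))  # False (no repetition)
```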
So, it's not so much being enlightened by Russ Cox as a trade-off between the convenience of some of the Perl extensions and the ability to apply matching in linear time. Also, for complex expressions, constructing the automaton can be slow.
The basic implementation methods are well-known and quasi-standard, but Cox did make some useful practical additions, in addition to writing up a clear and accessible exposition.
The classic DFA engines, like in Thompson's version of grep, provide a fairly minimal regex language (later standardized as POSIX regexes), not much fancier than the regular-language syntax used in textbooks. Perl-style regular expressions get such widespread usage even outside of Perl in part because they have a more full-featured syntax, which provides conveniences people find useful. Cox implemented a significant subset of this functionality as syntactic sugar on top of a classic automaton regex engine, so RE2, unlike POSIX regexes, supports named matches, named character classes like \d (plus Unicode character classes), non-greedy matches, etc. Basically everything that can be done without making the language nonregular, so excluding lookahead, lookbehind, and backreferences.
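A quick illustration of those supra-POSIX conveniences that stay within regular languages - named groups, `\d`, and non-greedy matching. The sketch below uses Python's `re` for brevity (a backtracking engine), but the same pattern syntax is accepted by RE2 precisely because none of these features makes the language non-regular:

```python
import re

# Named groups (?P<name>...), the \d class, and non-greedy .+? are all
# syntactic sugar that can be compiled down to an ordinary automaton.
m = re.match(r"(?P<tag><.+?>)\s*(?P<num>\d+)", "<b> 42")
print(m.group("tag"))  # '<b>'  -- non-greedy .+? stops at the first '>'
print(m.group("num"))  # '42'
```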
Since vim regexes (a custom dialect) support some of those supra-POSIX features, those implementation techniques were likely useful for them to look at. Indeed they link to Cox's article somewhere or other in the design docs.
Vim does have lookahead/lookbehind syntax, though; I wonder how they supported that. Fallback to the backtracking engine?
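For context on what's being asked about: lookaround is a zero-width assertion, shown here in Perl/Python syntax (I can't speak to how Vim implements its own `\@=`-style variants internally). It constrains the match without consuming input, which is exactly the kind of feature pure automaton engines like RE2 leave out:

```python
import re

# (?=bar) asserts "bar" follows without consuming it, so only the
# first "foo" (the one in "foobar") is returned.
print(re.findall(r"foo(?=bar)", "foobar foobaz"))  # ['foo']
```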
I assume a degree with that title would lean towards the formal side of linguistics, which means you would certainly cover things like the Chomsky hierarchy in detail; after all linguistics is where the Chomsky hierarchy (though not regular languages) originated.
In this interview[1] he states that his degree is one that he designed himself (which his college allowed), and that he used his programming classes and experience as the second of two foreign languages he needed for his Linguistics degree (the first being Classical Greek). He calls this a "hack".
He talks about schooling after his undergrad, but doesn't go into what that was.