The goal of lowering to binary at 0.1 million lines/sec isn't very ambitious. Virgil bootstraps its 59,000-line compiler in 400 milliseconds, which works out to nearly 150K lines/sec, already past that target, and it does whole-program reachability and optimization.
On the other hand, the 10 million lines/sec parsing goal is probably not achievable. That's roughly 400MB/s. The fastest JavaScript parser I know of, V8's, is on the order of 60-80MB/s. Virgil's is about 45MB/s. Maybe if you are parsing Lisp you can get to 200MB/s, but no curly-braced language with actual syntax is going to parse that fast.
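The back-of-the-envelope math behind that 400MB/s figure, assuming an average of 40 bytes per source line (an assumed typical value, not a measurement):

```python
# Convert a lines/sec parsing target into raw byte throughput.
lines_per_sec = 10_000_000
bytes_per_line = 40  # assumed average line length, including the newline

mb_per_sec = lines_per_sec * bytes_per_line / 1_000_000
print(mb_per_sec)  # -> 400.0
```

Even at a leaner 30 bytes/line the target would still be 300MB/s, well past any mainstream parser mentioned here.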
And you don't have to parse 10X faster than semantic analysis. In my experience, semantic analysis is only about 2X as expensive as parsing, so that 1 million lines/sec goal for semantic analysis is easily achievable. (Virgil is at about 800KLoc/sec.)
Keep in mind the above is all just one core. 10- and 20-core CPUs are becoming widespread, and these two phases parallelize nicely. Just don't repeat C's stupid O(n^2) header madness and Carbon will compile plenty fast.
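The parallelism claim is easy to see: source files are independent until semantic analysis links them, so per-file parsing is embarrassingly parallel. A minimal sketch, using threads for brevity (a real compiler would use processes or GIL-free native threads; `parse_file` here is a hypothetical stand-in, not any real parser API):

```python
# Sketch: parse many independent files concurrently.
from concurrent.futures import ThreadPoolExecutor

def parse_file(source: str) -> int:
    # Stand-in for a real parser: just count lines.
    return source.count("\n")

def parse_all(sources):
    # Each file can be parsed with no shared state, so this scales
    # with core count in a runtime without a global interpreter lock.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse_file, sources))
```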
Header files in C/C++ boil down to textual inclusion. They generally end up with such complicated directive logic that a compiler can't reasonably pre-compile or even pre-parse them and get the semantics right. Large projects tend to scale up by having more and more compilation units and headers. Eventually, textual inclusion leads to an O(n^2) explosion in the amount of work the compiler has to do, and generates object files that are O(n^2) big, most of which is then de-duped by the linker. For example, compiling V8 generates a couple gigabytes of .o files that get linked into a ~30MB binary in the end. It all starts with header madness.
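A toy model of why this goes quadratic: with n compilation units each textually expanding roughly n headers' worth of text, preprocessed output grows as n² even though the unique source only grows as n. The numbers below are purely illustrative, not measurements of V8 or any real project:

```python
# Illustrative model of textual inclusion cost.
def preprocessed_lines(n_units: int, lines_per_header: int) -> int:
    # Each of the n_units translation units re-expands roughly
    # n_units headers' worth of text, giving n^2 total work.
    return n_units * n_units * lines_per_header

for n in (10, 100, 1000):
    print(n, preprocessed_lines(n, 500))
```

Growing the project 10x (100 → 1000 units) grows the compiler's input 100x, which is exactly the scaling wall the comment describes.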
Google has experience designing languages so they can be parsed quickly; that was one of Go's design goals. JavaScript, on the other hand, was made fast out of necessity, not by design. I wouldn't take a JavaScript parser as the be-all and end-all of parser performance. If Google applies any of the lessons learned from Go, they could easily exceed V8, even with curly braces.
> In my experience, semantic analysis is only about 2X as expensive as parsing
Can you elaborate? Semantic analysis might cover anything from simple type checking to full program data flow analysis, and I understand the latter can be very expensive.