> The problem is 95% about laying out the instruction dispatching code for the branch predictor to work optimally.
A fun fact I learned while writing this post is that that's no longer true! Modern branch predictors can pretty much accurately predict through a single indirect jump, if the run is long enough and the interpreted code itself has stable behavior!
Here's a paper that studied this (for both real hardware and a certain simulated branch predictor): https://inria.hal.science/hal-01100647/document

My experiments on this project anecdotally agree; they didn't make it into the post, but I also explored a few of the interpreters through hardware CPU counters and `perf stat`, and branch misprediction never showed up as a dominant factor.
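For readers who haven't written an interpreter loop: the "single indirect jump" being discussed is the classic switch-in-a-loop dispatch, where every handler returns to one shared dispatch site. A minimal C sketch (the opcode names and the `run` function are made up for illustration, not taken from any project discussed here):

```c
/* Switch-dispatch bytecode interpreter: every handler falls back to the
   same switch, so the compiler typically emits a single indirect jump
   that the branch predictor has to learn. Toy opcodes, illustrative only. */
enum op { OP_PUSH, OP_ADD, OP_HALT };

static int run(const unsigned char *code) {
    int stack[64];
    int sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {              /* the single dispatch site */
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1;           /* unknown opcode */
        }
    }
}
```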
Yes, this was already becoming true around the time I was writing the linked article. And I also read the paper. :-) I also remember I had access to a pre-Haswell-era Intel CPU vs. something a bit more recent, and could see that the more complicated dispatcher no longer made as much sense.
Conclusion: the rise of popular interpreter-based languages led to CPUs with smarter branch predictors.
How do you reconcile that with the observation that moving to a computed-goto style provides better codegen in Zig[1]? They claim that using their “labeled switch” (which is essentially computed goto) gives you multiple dispatch branches, which improves branch predictor performance. They even get a 13% speedup in their parser by moving from a regular switch to this style. If modern CPUs are good at predicting through a single branch, I wouldn’t expect this feature to make any difference.
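For anyone unfamiliar with the technique being referenced: computed goto (a GNU C extension, roughly what Zig's labeled switch lowers to) replicates the dispatch at the end of every handler, so each opcode gets its own indirect branch and its own predictor history. A rough sketch under that assumption, with made-up opcodes matching the toy VM above:

```c
/* Computed-goto dispatch (GNU C extension, works in GCC/Clang).
   Each handler ends with its own `goto *`, so there is one indirect
   branch per opcode rather than one shared dispatch site. */
static int run_goto(const unsigned char *code) {
    /* table order matches the toy opcodes: 0 = push, 1 = add, 2 = halt */
    static void *dispatch[] = { &&do_push, &&do_add, &&do_halt };
    int stack[64];
    int sp = 0, pc = 0;

    goto *dispatch[code[pc++]];
do_push:
    stack[sp++] = code[pc++];
    goto *dispatch[code[pc++]];
do_add:
    sp--; stack[sp - 1] += stack[sp];
    goto *dispatch[code[pc++]];
do_halt:
    return stack[sp - 1];
}
```

Whether those extra branches still buy anything on a recent CPU is exactly the question upthread.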
While it's unlikely to be as neat as this, the blog post we're all commenting on is an "I thought we had a 10-15% speedup, but it turned out to be an LLVM optimisation misbehaving" story. And Zig (for now) uses LLVM for optimised builds too.