doesn't necessarily correspond to one branch at the ASM instruction level right? in fact, a priori, it doesn't even correspond to absolutely any branches at the ASM instruction level.
Yes I do and that’s neither here nor there. There almost certainly ASM instruction level branches to implement the conditional and the branch predictor isn’t tied to a single instruction - the rough high level mental model is a cache of a few of the least significant digits of the CPU to the prediction although in practice it’s far more complicated. Since the predictor is right like 80% of the time, it means that even when there’s false sharing of a lot of branches, the CPU does a good job predicting. It’s performance is primarily impacted when execution takes both branches closer to 50/50 than to 100/0.
That’s why I asked about false sharing specifically as that’s the main way I can think of that Python code wouldn’t be able to observe the branch predictor performance - because the interpreter has so many branches internally that it dominates any branches that might be caused by the Python code itself.
It corresponds to a way more than one branch at instruction level. The branch prediction AFAIK does not care based on what are you branching, it just assumes branches will go in similar sequences as they did last time. If the Python 'if' is never taken, the instruction-level predictor will learn that after the comparison operation, there is an 'if' operation and then another array access operation. If the Python 'if' is unpredictable unpredictable, CPU predictor can't be sure which opcode are we processing after 'if', so there will be penalty.
Is there any public documentation on modern branch prediction algorithms? I know branch prediction is very important to modern CPU so SOTA techniques are probably not public... But it's really amazing what it can do especially considering the "limited" cache sizes that branch predictors have .
I guess it depends on how deep you want to go, I think the real predictors are based on publicly known algorithms such as TAGE. This seems to be nice overview: https://comparch.net/2013/06/30/why-tage-is-the-best/ (it's 2013, so definitely not SOTA, but advanced enough for my taste :) )