However, if I'm not mistaken, branches are costly only when they're mispredicted. Hence, the question (as I see it) is wether one can better predict the behavior of Linus' better version (which only has one conditional) or that presented above (where there's an early return condition followed by the while condition).
It seems to me that the version above (which has two conditionals but is not like Linus' worse version because it has an early return) should be similarly costly to predict. Consider the situations where either most of the calls are to remove the first element (A) or most of the calls aren't (B). Let's see what happens in these cases:
- On Linus' code, the while is evaluated the first time. In case A, the while is correctly predicted to exit the while, the swap is made and the function ends. In case B, where the while yields true many times before hitting our element, you should have to pay the cost of a misprediction when you finally find it.
- In the early return code above, the situation is... the same. In case A the early return if is predicted correctly to yield true, and the function ends. In case B, the early return if is correctly predicted to yield false and then the while predicts mostly trues until you finally hit your element, time at which you pay the same misprediction penalty than above.
So... aren't both functions similarly expensive in terms of branch misprediction costs? Am I fundamentally wrong on my understanding on this issue?
Where is that early return you mention? I don't see the return keyword used in either examples. They both do the exact same while loop, but Linus' version adds a level of indirection to save a conditional later on. You're still touching the same memory in both cases so they should be identical in terms of cache misses.
From my understanding, the first version has two possible branch misprediction points while Linus' version only has one. This will probably only have a visible impact if the function is called in a loop however.
But to me the biggest advantage of the 2nd variant is its simplicity. This is the only way to stay sane with a growing codebase and keep shipping robust code.
Its no use getting a fast function if you can't integrate it optimally with the rest of the codebase. Raw performance isn't at the micro level but the macro one. This is where simplicity becomes critically important.
However, if I'm not mistaken, branches are costly only when they're mispredicted. Hence, the question (as I see it) is wether one can better predict the behavior of Linus' better version (which only has one conditional) or that presented above (where there's an early return condition followed by the while condition).
It seems to me that the version above (which has two conditionals but is not like Linus' worse version because it has an early return) should be similarly costly to predict. Consider the situations where either most of the calls are to remove the first element (A) or most of the calls aren't (B). Let's see what happens in these cases:
- On Linus' code, the while is evaluated the first time. In case A, the while is correctly predicted to exit the while, the swap is made and the function ends. In case B, where the while yields true many times before hitting our element, you should have to pay the cost of a misprediction when you finally find it.
- In the early return code above, the situation is... the same. In case A the early return if is predicted correctly to yield true, and the function ends. In case B, the early return if is correctly predicted to yield false and then the while predicts mostly trues until you finally hit your element, time at which you pay the same misprediction penalty than above.
So... aren't both functions similarly expensive in terms of branch misprediction costs? Am I fundamentally wrong on my understanding on this issue?
Thanks for the explanation!