I'd say branch mispredictions and cache misses are the two biggest factors to performance. Almost everything else is noise in comparison. I keep saying if your architecture doesn't account for these, you've wasted 90% of the CPU guaranteed.
I do the exact same techniques in my game engine (sadly proprietary) - even in JavaScript its dead easy to run circles around both Three.js and Babylon.js in terms of performance to the point I'm simply amazed at how fast JavaScript VMs can be if you really push them.
Eliminating branches, cache misses and allocations in the main loop yielded by far the biggest performance boost. (For the curious, we abused ArrayBuffers to guarantee contiguous memory allocations in JavaScript)
I do the exact same techniques in my game engine (sadly proprietary) - even in JavaScript its dead easy to run circles around both Three.js and Babylon.js in terms of performance to the point I'm simply amazed at how fast JavaScript VMs can be if you really push them.
Eliminating branches, cache misses and allocations in the main loop yielded by far the biggest performance boost. (For the curious, we abused ArrayBuffers to guarantee contiguous memory allocations in JavaScript)