This has been true for years, Intel CPU's can't efficiently perform math operati...

Sesse__ · on April 4, 2022

> Generally there is only a single shared FPU/AVX/SSE unit doing the math over two hyperthreads.

Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Apart from that, it really depends on what operations you're doing; e.g., modern Intel CPUs have three ports that can each issue a 256-bit FMA every cycle.
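As a back-of-the-envelope illustration of what such a figure implies, here is a minimal sketch of the per-core peak arithmetic. It takes the port count as a parameter rather than asserting it for any particular CPU, and the function name `peak_flops_per_cycle` is made up for this example.

```cpp
#include <cassert>

// Peak floating-point operations per cycle for one core, counting each
// FMA as two operations (one multiply, one add) on every vector lane.
// fma_ports is an input assumption, not a measured value.
inline int peak_flops_per_cycle(int fma_ports, int vector_bits, int element_bits) {
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    const int lanes = vector_bits / element_bits;  // e.g. 256 / 32 = 8 floats
    return fma_ports * lanes * 2;
}
```

With three ports and 256-bit vectors this gives 48 single-precision or 24 double-precision operations per cycle. That budget belongs to the core, so two hyperthreads on the same core split it rather than each getting their own.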

phkahler · on April 4, 2022

>> Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Yes, I think the issue with Eigen is cache-related. They apparently have optimizations that are aware of the cache architecture, and running two threads that share the same cache will screw that up, resulting in more misses. If this is the case, I'd prefer algorithms that are cache-line-size agnostic. Eigen is still much faster than the simple hand-written code we had before!
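For reference, the textbook way to be agnostic to cache parameters is a cache-oblivious, divide-and-conquer formulation. The sketch below is that classic recursive matrix multiply in plain C++, not what Eigen does (as far as I know, Eigen picks blocking sizes from the detected cache sizes). The names `matmul_recursive`, `matmul` and `kBase` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (m x n) += A (m x k) * B (k x n). All matrices are row-major with
// leading dimensions lda, ldb, ldc. The recursion halves the largest
// dimension, so sub-problems eventually fit in whatever cache level
// exists, without the code knowing any cache size.
void matmul_recursive(const double* A, const double* B, double* C,
                      std::size_t m, std::size_t n, std::size_t k,
                      std::size_t lda, std::size_t ldb, std::size_t ldc) {
    // Base-case cutoff: only limits recursion overhead, not tuned to a cache.
    constexpr std::size_t kBase = 16;
    if (m <= kBase && n <= kBase && k <= kBase) {
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t p = 0; p < k; ++p) {
                const double a = A[i * lda + p];
                for (std::size_t j = 0; j < n; ++j)
                    C[i * ldc + j] += a * B[p * ldb + j];
            }
        return;
    }
    if (m >= n && m >= k) {          // split the rows of A and C
        const std::size_t h = m / 2;
        matmul_recursive(A, B, C, h, n, k, lda, ldb, ldc);
        matmul_recursive(A + h * lda, B, C + h * ldc, m - h, n, k, lda, ldb, ldc);
    } else if (n >= k) {             // split the columns of B and C
        const std::size_t h = n / 2;
        matmul_recursive(A, B, C, m, h, k, lda, ldb, ldc);
        matmul_recursive(A, B + h, C + h, m, n - h, k, lda, ldb, ldc);
    } else {                         // split the shared dimension k
        const std::size_t h = k / 2;
        matmul_recursive(A, B, C, m, n, h, lda, ldb, ldc);
        matmul_recursive(A + h, B + h * ldb, C, m, n, k - h, lda, ldb, ldc);
    }
}

// Convenience wrapper for densely packed matrices: returns A * B.
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B,
                           std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<double> C(m * n, 0.0);
    matmul_recursive(A.data(), B.data(), C.data(), m, n, k, k, n, n);
    return C;
}
```

The trade-off is that a well-tuned, cache-aware kernel will usually beat this on the machine it was tuned for. The recursive version just degrades more gracefully when the effective cache per thread shrinks, as it does when two hyperthreads share a core.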