
This has been true for years: Intel CPUs can't efficiently perform math operations when hyperthreading is involved. Generally there is only a single shared FPU/AVX/SSE unit doing the math over two hyperthreads. Since the Eigen implementation can often keep that unit 100% busy, it makes no sense to run two threads at full tilt through it.

I tested all this very heavily before Eigen had AVX-512 support. With AVX-512 the results might differ, so I would suggest you benchmark both configurations: one thread per physical core and one thread per hyperthread.



> Generally there is only a single shared FPU/AVX/SSE unit doing the math over two hyperthreads.

Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Apart from that, it really depends on what operations you're doing; e.g., modern Intel CPUs have two ports that can each issue a 256-bit FMA every cycle.
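To put a number on that, the peak per-core throughput follows from the port count, the vector width and the element size. This is a back-of-the-envelope sketch; `peak_flops_per_cycle` is a hypothetical helper, and it counts each FMA as two floating-point operations.

```cpp
#include <cstdint>

// Peak floating-point operations per cycle for one core, counting each
// fused multiply-add as two operations (one multiply, one add).
constexpr std::uint64_t peak_flops_per_cycle(std::uint64_t fma_ports,
                                             std::uint64_t vector_bits,
                                             std::uint64_t element_bits) {
    return fma_ports * (vector_bits / element_bits) * 2;
}

// One port, 256-bit vectors, 32-bit floats: 8 lanes * 2 flops.
static_assert(peak_flops_per_cycle(1, 256, 32) == 16, "single precision");
// One port, 256-bit vectors, 64-bit doubles: 4 lanes * 2 flops.
static_assert(peak_flops_per_cycle(1, 256, 64) == 8, "double precision");
```

Whatever the port count, this budget is per physical core. Both hyperthreads draw from it, so a kernel that already saturates the FMA ports from one thread gains nothing from a second.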


> Hyperthreading does in general share units (both ALU and others); that's what hyperthreading is.

Yes, I think the issue with Eigen is cache-related. It apparently has optimizations that are aware of the cache architecture, and running two threads that share the same cache will screw that up, resulting in more misses. If this is the case, I'd prefer algorithms that are cache line size agnostic. Eigen is still much faster than the simple hand-written code we had before!
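A rough illustration of why a shared cache would hurt a cache-aware kernel. This is not Eigen's actual blocking logic; `tile_edge` is a hypothetical helper for a textbook tiled C += A * B, and it assumes the cache is split evenly between the threads that share it.

```cpp
#include <cmath>
#include <cstddef>

// Largest square tile edge such that three edge x edge tiles (one each
// from A, B and C) fit in this thread's share of the cache.
// ASSUMPTION: the cache is divided evenly between the sharing threads.
// Precondition: sharing_threads > 0 and element_bytes > 0.
std::size_t tile_edge(std::size_t cache_bytes, std::size_t sharing_threads,
                      std::size_t element_bytes) {
    const std::size_t budget = cache_bytes / sharing_threads;
    const std::size_t elements_per_tile = budget / (3 * element_bytes);
    std::size_t edge = static_cast<std::size_t>(
        std::sqrt(static_cast<double>(elements_per_tile)));
    // Correct for floating-point rounding in either direction.
    while (edge * edge > elements_per_tile) --edge;
    while ((edge + 1) * (edge + 1) <= elements_per_tile) ++edge;
    return edge;
}
```

For a 32 KiB cache and 4-byte floats, `tile_edge(32768, 1, 4)` gives 52, while `tile_edge(32768, 2, 4)` gives 36. A kernel tuned for the 52-wide tile would overflow its share of the cache once a second hyperthread starts using the same cache, which is the kind of extra miss traffic described above.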



