Yes, only a few lockless algorithms require only a single atomic operation. And ...

dataflow · on April 24, 2023

Yeah. Now what about cases where the max contention is bounded (say, at most 2 threads waiting on a lock)? And what about factors other than running time? How does, say, fairness affect this picture?

gsliepen · on April 24, 2023

If multiple cores are trying to do atomic operations on the same value, the hardware typically is not fair. For mutexes, it depends on whether the operating system wakes up waiters in a fair way. Often that is not fully fair either.

If there are only two threads at most, you can sometimes use a different algorithm than if you have an arbitrary number of threads (for example, for single-producer, single-consumer queues, lockless is typically better than with a mutex).

You should of course benchmark your particular problem to see what solution is actually faster.