Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yes, only a few lockless algorithms require only a single atomic operation. And as soon as you need to do cmpxchg in a loop and there is contention, you can burn a lot of cycles while cache lines bounce between cores, potentially an unbounded number of cycles if it is not a wait-free algorithm. While a mutex would have to pay the price of a futex call in the contended case, it is at least bounded.


Yeah. Now what about cases where the max contention is bounded (say, at most 2 threads waiting on a lock)? And what about factors other than running time? How does, say, fairness affect this picture?


If multiple cores are trying to do atomic operations on the same value, the hardware typically is not fair. For mutexes, it depends on whether the operating system wakes up waiters in a fair way. Often that is not fully fair either.

If there are only two threads at most, you can sometimes use a different algorithm than if you have an arbitrary number of threads (for example, for single-producer, single-consumer queues, lockless is typically better than with a mutex).

You should of course benchmark your particular problem to see what solution is actually faster.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: