Rate limiting, or clamping "0" delays up to some small minimum, also helps in polling applications where you want to be as fast as possible but your effective latency is never actually going to be zero anyway. For example, if you have a thread pulling images from a camera (publishing at 30 fps) in a loop, you can sleep for 1 ms between polls and nobody will notice, but it can have an enormous impact on CPU usage, especially on embedded platforms. Instead of the thread running at 100% (calling camera.read() and spinning until it returns true), the spin/loop cost becomes essentially free and you only pay for acquiring/decoding the frame. In theory branch prediction should help, since you take the "no image yet" branch 99% of the time, but in practice that just makes the loop iterate faster while still burning a full core. I learned this the hard way writing a custom camera publisher in ROS, and I've seen the same mistake in a lot of tutorial code, so I think plenty of beginners run into it.
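
A minimal sketch of the idea in Python: the 1 ms sleep only fires on the "no frame yet" branch, so at 30 fps the added latency is at most about a millisecond while the thread sleeps instead of spinning. `FakeCamera` here is a hypothetical stand-in for a real driver handle (e.g. something like OpenCV's VideoCapture), just so the loop structure is runnable:

```python
import time

class FakeCamera:
    """Hypothetical stand-in for a real camera handle. Produces a frame
    roughly every 1/fps seconds, mimicking a 30 fps source."""
    def __init__(self, fps=30):
        self.interval = 1.0 / fps
        self.last = time.monotonic()
        self.count = 0

    def read(self):
        # Mimics the common (ok, frame) polling API: returns
        # (False, None) until the next frame interval has elapsed.
        now = time.monotonic()
        if now - self.last >= self.interval:
            self.last += self.interval
            self.count += 1
            return True, f"frame-{self.count}"
        return False, None

def poll_frames(cam, n_frames, poll_delay=0.001):
    """Poll until n_frames have arrived. The busy-wait is replaced by a
    1 ms sleep between empty polls, so the thread spends almost all of
    its time sleeping instead of spinning at 100% CPU."""
    frames = []
    while len(frames) < n_frames:
        ok, frame = cam.read()
        if ok:
            frames.append(frame)
        else:
            time.sleep(poll_delay)  # the "0 clamped up to 1 ms" delay

    return frames

print(poll_frames(FakeCamera(), 3))
```

The same pattern applies regardless of language: the key point is that the sleep sits on the empty-poll branch only, so it never delays a frame that is already available.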