Counterpoint: always have another way to cancel the operation.

RhysU · on Aug 29, 2020

Not retrying is implicitly a way to cancel. Also, it is a codepath that's trivially tested vs the effort to test an extra cancel path.

asdfasgasdgasdg · on Aug 29, 2020

I'm not sure how testing timeouts is trivial compared to cancellation. They both take about the same amount of code to write a test for, IME. (Not much.)

Not retrying+timeouts has similar effects to cancellation. The operation ceases to go forward. But it is not the same. It's a lot more expensive than imperative cancellation (need to rebuild, resend, reparse the request) and it has a lot of production risks that waiting with cancellation doesn't. For example, naive retries can expose backends to thundering herds, and less naive retries can have strange issues caused by exponential backoff where you'll have requests sitting around doing nothing for half their own timeout, before giving up because the next retry did not hit before the end of the parent request's timeout.

RhysU · on Aug 29, 2020

All good points. By trivial I meant 2 tests (works/fails) vs 3+ (works/fails/cancel with the latter possibility having its own works/fails cases). A timeout is just a status code on failure.

marcosdumay · on Aug 29, 2020

Yet, if you are making an interactive application, that easy-to-test codepath is a great way to put bugs into your requirements.

orisho · on Aug 29, 2020

In Go, it is a single code path. Contexts can be canceled and they also come with propagating timeouts. The timeouts simply trigger a cancellation, so the only code path is handling cancellation.

There's nothing complicated about it, so there's no reason your code can't implement timeouts and cancellation the same way: timeouts are a cancellation triggered autonomously after some time passes.

RhysU · on Aug 29, 2020

Not disagreeing, but having trouble seeing it. Could you elaborate?

marcosdumay · on Aug 29, 2020

By adding that timeout you just created a user-visible behavior that nobody asked for and people will only notice in production while dealing with the most complicated use-cases.

RhysU · on Aug 29, 2020

Nobody asks for it but some choice must be made. As a user, I have often cursed things that hang indefinitely. And I don't trust application state after touching a cancel button. That stuff is seldom tested well.

megous · on Aug 29, 2020

Well, not always possible. For example, the latest systemd has a bug where it sometimes deadlocks in a PAM module, so it blocks all remote access to a machine over ssh (openssh uses PAM, optionally). If openssh had a timeout on the PAM child process, it would simply retry after the timeout. Instead, the whole machine is lost and needs to be restarted with physical access.

There's no way to cancel the operation remotely, because you're not authenticated yet. And you may not have any other access.

Timeouts are also a good defense strategy against bugs.

magicalhippo · on Aug 29, 2020

This requires the API you're using to support this. If the API doesn't, then using an infinite timeout is a bad idea.

asdfasgasdgasdg · on Aug 29, 2020

Of course. I did not mean to convey that timeouts should be avoided in all cases. In fact I listed several such cases where they should be used. An API that has no way to cancel would be another example. Although I would argue that such an API is fundamentally flawed.

magicalhippo · on Aug 29, 2020

Yeah, I just wanted to highlight it, as I've seen far too much code passing INFINITE to WaitForSingleObject or similar.

And yeah, not having another way of cancelling is not nice, but sadly not entirely uncommon.

asdfasgasdgasdg · on Aug 29, 2020

Right, I think the suggestion in that case would be to upgrade to an API that does support cancellation wherever possible. E.g. wait for multiple objects, passing the original argument plus an additional cancel event.