It's inherent to payment systems that a failed request can "lose" money. A transaction involves two sides: either both sides agree that the transaction completed, or money is lost or duplicated.
A classic failure mode: the client times out and considers the transaction cancelled, while the server keeps processing it and considers it done. That's a catastrophic issue that needs to be addressed, and it's one of the most common bugs I've seen in the wild (root cause: timeouts that are too short).
Making highly critical systems reliable enough in the face of hardware and software issues is a complex topic. At this level it takes a holistic approach to get every component to cooperate (timeouts are a minor example). A HUGE amount of work goes into detecting errors, and more importantly into propagating errors across diverse stacks (software should be aware of database errors, services should detect other services failing).
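The timeout mismatch above can be sketched with a minimal simulation (all names are mine, not any real API): the client times out while the server completes the work, and the only safe move is to query the real outcome instead of assuming cancellation.

```python
class PaymentServer:
    """Toy in-memory stand-in for a real payment backend."""

    def __init__(self):
        self.completed = set()

    def process(self, tx_id):
        # In reality this might take longer than the client's timeout.
        self.completed.add(tx_id)
        return "completed"

    def status(self, tx_id):
        return "completed" if tx_id in self.completed else "unknown"


def pay_with_reconciliation(server, tx_id):
    # Simulate the request timing out on the client side AFTER the server
    # has already done the work: the client never sees the response.
    server.process(tx_id)
    client_saw_response = False

    if not client_saw_response:
        # Wrong fix: treat the timeout as "cancelled" and retry blindly.
        # Safer fix: ask the server what actually happened.
        return server.status(tx_id)
    return "completed"
```

The point is that a client-side timeout leaves the outcome *unknown*, not failed; reconciliation (or a two-stage protocol, as discussed below in the thread) is what closes the gap.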
This doesn't make sense to me. If there is a real risk that a timeout can happen (and there always is), then the payment system should be implementing a two-phase commit.
I don't know what the Byzantine generals problem is.
Two stage commit is important because it has:
1) A transaction id is predefined prior to final submission, which lets you validate the status: if your request to commit gets 503'ed, or you get a timeout, you can reliably query to learn whether it was processed or not.
2) Unlimited resubmissions of the final commit. It doesn't matter whether I perform the final commit API request once or 100 times; it will never cause a duplicated transaction. So if I get a timeout or a 503 I can resubmit, knowing that if my original commit went through, my new submit is a no-op, and if my last commit request didn't get processed, this time it hopefully will be.
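The two properties above can be sketched as a toy in-memory gateway (the class and method names are my own, not any real payment SDK): stage one hands the client a server-generated transaction id, and stage two is idempotent, so retries are always safe.

```python
import uuid


class TwoStageGateway:
    """Toy two-stage commit API; illustrative, not a real service."""

    def __init__(self):
        self.pending = {}    # tx_id -> details, reserved but not committed
        self.committed = {}  # tx_id -> details

    def begin(self, details):
        # Stage 1: the server hands out a unique transaction id up front.
        tx_id = str(uuid.uuid4())
        self.pending[tx_id] = details
        return tx_id

    def commit(self, tx_id):
        # Stage 2: idempotent. Committing an already-committed id is a
        # no-op, so the client can retry after any timeout or 503
        # without ever duplicating the transaction.
        if tx_id in self.committed:
            return "already-committed"
        if tx_id not in self.pending:
            raise KeyError("unknown transaction id")
        self.committed[tx_id] = self.pending.pop(tx_id)
        return "committed"

    def status(self, tx_id):
        # Property 1: the predefined id makes the outcome queryable.
        if tx_id in self.committed:
            return "committed"
        return "pending" if tx_id in self.pending else "unknown"


gw = TwoStageGateway()
tx = gw.begin({"amount": 100})
assert gw.commit(tx) == "committed"
# Retrying after a suspected failure is safe: still exactly one transaction.
assert gw.commit(tx) == "already-committed"
assert len(gw.committed) == 1
```

A real implementation would persist the pending and committed sets durably, but the retry-safety argument is the same.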
This pattern isn't just a payments thing, either. It's heavily used in distributed systems where failures can occur. UPS' API used to use this as well, so you could be sure you don't pay for duplicate shipping labels or cause duplicate shipments.
The Byzantine generals problem is the field of research dealing with consensus/consistency issues like the one we're discussing. The setup: two generals on a battlefield are trying to coordinate an attack; they send messengers to communicate, but any message might be lost or intercepted. The problem is proven to be unsolvable, so let's not go in head first assuming you can be sure of any outcome ;)
Taking longer does not solve the Byzantine generals problem. The difference here is that the roles are asymmetric: once the bank receives your order for a transaction, it does not need to check that you know whether the order was correctly received; the bank can simply perform the transaction and then, best-effort, let you know what happened.
Isn't it better to make it idempotent? The risk is that the client might accidentally make the same transaction twice if the first attempt looks like it failed.
Make the client include the id of its last known transaction and only apply the transaction if it's up to date; otherwise tell the client to refresh and try again.
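That suggestion is essentially a compare-and-set on the last transaction id. A minimal sketch (hypothetical ledger, names mine): a stale duplicate of the same request is rejected rather than re-applied.

```python
class Ledger:
    """Toy ledger applying the last-known-transaction-id scheme above."""

    def __init__(self):
        self.last_tx_id = 0
        self.balance = 0

    def apply(self, expected_last_tx_id, amount):
        # Only apply if the client's view of the ledger is current.
        if expected_last_tx_id != self.last_tx_id:
            # Client must refresh its view and decide whether to retry.
            return ("refresh", self.last_tx_id)
        self.last_tx_id += 1
        self.balance += amount
        return ("applied", self.last_tx_id)


ledger = Ledger()
assert ledger.apply(0, 50) == ("applied", 1)
# An accidental duplicate of the same request is rejected, not re-applied:
assert ledger.apply(0, 50) == ("refresh", 1)
assert ledger.balance == 50
```

The cost, as noted elsewhere in this thread, is that the client now has to track state and handle the "refresh" case correctly.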
The second stage is idempotent (which is why it works), but the purpose of the first stage is to make sure both sides have an agreed upon idea of the uniqueness of the transaction that's about to take place.
For instance, if I want to generate a shipping label that goes from my house to your house and I make two attempts, how does the receiving service know whether I made two distinct attempts (I want to ship two similarly sized items) or whether a transient error occurred in between, causing me to resubmit?
You solve this by creating an inactive request with the criteria (shipping label from my house to your house). This step is not idempotent, but that's OK, because if I resubmit I just create a second inactive request that may never actually be finished.
The second step is to say "this request is good and I want to proceed with it". That step is idempotent: it takes the existing request out of the inactive state and marks it active.
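The two steps above, as a toy in-memory service (names are illustrative, not UPS's actual API): creating a request is deliberately non-idempotent but harmless, while activation is idempotent, so retries can never produce a duplicate label.

```python
import uuid


class LabelService:
    """Toy version of the create-inactive / activate flow described above."""

    def __init__(self):
        self.requests = {}  # request_id -> {"criteria": ..., "state": ...}

    def create_request(self, criteria):
        # Step 1: NOT idempotent. Two calls create two inactive requests,
        # which is harmless: an inactive request is never billed or shipped.
        request_id = str(uuid.uuid4())
        self.requests[request_id] = {"criteria": criteria, "state": "inactive"}
        return request_id

    def activate(self, request_id):
        # Step 2: idempotent. Activating an already-active request changes
        # nothing, so resubmitting after a timeout is always safe.
        self.requests[request_id]["state"] = "active"
        return self.requests[request_id]["state"]


svc = LabelService()
a = svc.create_request("my house -> your house")
b = svc.create_request("my house -> your house")  # retry OR a 2nd item
assert a != b          # distinct requests either way
svc.activate(a)
svc.activate(a)        # safe retry: still exactly one active label
active = [r for r in svc.requests.values() if r["state"] == "active"]
assert len(active) == 1
```

Note how the ambiguity from the shipping-label example is resolved: a second *create* is just another draft, and only an explicit *activate* on a specific id commits anything.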
A shopping cart flow is a user managed 2 stage commit (review your cart, submit the cart order). No matter how many times I submit my order it won't cause duplicate orders because I'm submitting a specific shopping cart.
UPS, PayPal, and others just use a computer/API-managed 2 stage commit.
You can't always rely on a client-generated ID, because you'd have to know that the client's id is unique enough. The server is the only party that can generate a transaction id it knows is globally unique and efficiently queryable in its backend.
It's not mutually exclusive. You can do two stage commits with the second stage being idempotent.
The practical risk is that this puts a ton of complexity on the client, which has to track state and perform follow-up actions. The added complexity means more bugs, and each additional step can itself fail, compounding the problem rather than solving it.