Practical Guide to SQL Transaction Isolation

IgorPartola · on Aug 2, 2017

One tool to add to your toolbox as a developer is named locks. Say you have a table for orders and a table for order items and a table for payments. Collectively they represent an order, but that is a higher level concept that isn't something an RDBMS understands. When the user on your site simultaneously sends two updates, such as "remove item from the cart" and "pay and finish", you will have a slew of simultaneous updates to the rows across the three tables. Unless you use the highest isolation level, you will get inconsistencies. If you do use serializable isolation level you might get a deadlock, depending on how conscientious you are about updating everything in the same order. You could do an initial read from the orders table with FOR UPDATE which will provide you an implicit lock on that order, but you still have issues with new rows being inserted into your order items table that will not be range locked.

So use names/advisory locks: create a named lock called "order-1234" where 1234 is tbe ID of the order. Acquire it before doing any manipulations to the order across all its tables. Then release it. Semantics of names locks differ across RDBMS's. Unsurprisingly Postgres has saner and more flexible semantics than MySQL, but both are perfectly suitable for the purpose. This technique can save you a ton of headaches.

stickfigure · on Aug 2, 2017

If you do use serializable isolation level you might get a deadlock, depending on how conscientious you are about updating everything in the same order.

It sounds like you misunderstand optimistic concurrency. If your transaction touches a piece of data, and that data changes (by another transaction) while your transaction is in progress, then your transaction will rollback on commit. With a tiny bit of library code, your transaction will retry until success. In the case of severe contention you may see repeated retries and timeouts, but you will never see deadlocks - at least one transaction will successfully commit each "round".

I disagree - database locks should not be used by the average developer. Understand optimistic concurrency and use serializable isolation. If (and only if!) you have a known performance problem with highly contended data, consider a locking strategy. 99 out of 100 developers will never need it, and most of that 1% will botch locks on the first try anyways.

If your database doesn't support optimistic concurrency, get a better database.

IgorPartola · on Aug 2, 2017

I partially agree with what you are saying. I understand optimistic concurrency, especially the constraints it places on your code: you must in all cases access the data in the same order, otherwise you do get deadlocks. You also must implement the retry mechanism. I am not familiar with all the web frameworks out there (this is where I often encounter these types of issues), but I don't know of one that automatically retries requests for you. That "tiny bit of library code" is something that as far as I know you have to create for every situation, and it will be specific to your situation. I have yet to see a production codebase that actually does this instead of simply returning an error to the user saying "try again".

If your database doesn't fully support ACID transactions, yes, get a better DB. And no you don't need to use named locks all the time. As an example, I have a codebase where I use the standard transactions semantics everywhere using the Django ORM, except one very specific and critical bit of code where I want to be dead certain that a thing can't happen twice in a row. So basically out of, say, 100 or so places where I commit a transaction, only one actually uses named locks, but where it does, it solves the problem in a very elegant way.

So I agree with you that you shouldn't litter your code with these. But you absolutely can use them to either (a) guard against complex and critical parts of your code or (b) use these to quickly augment a huge existing codebase that keeps running into deadlocks or worse yet lock timeouts.

stickfigure · on Aug 2, 2017

I understand optimistic concurrency, especially the constraints it places on your code: you must in all cases access the data in the same order, otherwise you do get deadlocks.

The second half of this statement negates the first :-)

You have confused pessimistic concurrency with optimistic concurrency. An optimistic system detects collision at commit time and the actual order of operations in your transaction is irrelevant. You will never experience a deadlock, only aborts/retries.

In a pessimistic concurrency scenario, you take explicit locks as you access resources. Those locks block access by other transactions. This creates the dining philosophers problem and the risk of deadlocks. When locking resources, you must access resources in a consistent order to prevent deadlocks.

Please avoid database locks and let smart databases do what they are supposed to!

IgorPartola · on Aug 2, 2017

Learn something new every day. Yes you are correct, I meant pessimistic concurrency because that's what I'm familiar with. Which databases support optimistic concurrency and full ACID transactions?

stickfigure · on Aug 3, 2017

Postgres, MS SQL Server, Oracle... and more exotic ones like Google's Cloud Datastore and Spanner. Probably many more.

IgorPartola · on Aug 3, 2017

Interesting. I use Postgres and can't find any mention of this in the docs: https://www.postgresql.org/docs/current/static/mvcc.html talks about the usual row and table locks, and https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=Opti... is empty. Is it known by some other name in the Postgres land?

Also, I found this and similar on SO: https://stackoverflow.com/questions/17431338/optimistic-lock...

So if this technique actually requires me to write my database access code at this level, it means that using something like an ORM is nearly impossible in a way that makes an ORM useful.

stickfigure · on Aug 4, 2017

That stackoverflow answer is wrong - or at least, only applies to MySQL. The author is apparently unaware of other databases.

If you put Postgres in repeatable read or serializable[1] mode, it manages optimistic concurrency for you. Collisions will produce an error in the second transaction attempting to commit. The only "trick" is that your client code should be prepared to retry transactions that have concurrency failures, and this is very easily abstracted by a library. Your SQL or ORM code does not change (I use Hibernate, fwiw).

Some good reading: https://www.postgresql.org/docs/current/static/transaction-i...

[1] Repeatable Read won't detect collisions caused by phantom rows, so pick carefully. Most CRUD apps don't care about phantom rows.

IgorPartola · on Aug 4, 2017

How does this work if your transactions also do external things? So if have two concurrent transactions that charge a credit card via an external API, as well as mark the order as paid in your DB? Retrying that transaction would be disastrous, no?

stickfigure · on Aug 4, 2017

This is a problem that affects all distributed systems irrespective of database isolation levels. For example, your call to the external API might timeout; did the charge succeed or not? Database locking doesn't help you.

The solution comes in two parts:

1) Ensure your call to the card processor is idempotent and retry it. For example, grep Stripe's documentation for the "Idempotency-Key" header. Every processor should have a similar mechanism. Conveniently, if your database transaction unit-of-work includes the remote call, it will retry automatically in an optimistic scenario.

2) Get a webhook call from the credit card processor. Even though you are retrying your transaction, you can't guarantee that your server didn't crash without a successful commit. This potentially leaves a charged card without recording the purchase; the webhook will sync it up afterwards.

Distributed systems are hard if you think past the happy path. I wish every REST API would have an equivalent of Idempotency-Key.

IgorPartola · on Aug 4, 2017

Yup. I'm familiar with Strip's solution to this and am really happy with it. I also wish more API's had that.

But this brings me back to my original point: having a library that blindly re-tries POST requests that result database transactions on every conflict is not going to work. Until every external API, from printers to credit card processors, has an Idempotency Key equivalent you just can't do that generically, and you will have to do that with a custom bit of code for every situation.

Isn't it easier to just acquire an advisory lock for the critical bit and take your own database's concurrency contention out of the equation?

stickfigure · on Aug 8, 2017

A lock still doesn't give you a transaction across two systems that aren't transactionally connected. If you're specifically worried about retries you can create a "task" (row in a table) in the transaction and have a consumer read tasks and propagate the message.

A lock is seductive because in the happy path it looks like you have a distributed transaction. In practice making reliable calls across distributed systems involves more complexity. The hard part is figuring out what to do when your remote call times out and you aren't sure if it succeeded; once you resolve that, you can usually work with any concurrency scenario.

wbl · on Aug 3, 2017

That's how isolation is implemented. MVCC is not compatible with any standard SQL level, but it is used for the lower ones.

lomnakkus · on Aug 2, 2017

> So use names/advisory locks: create a named lock called "order-1234" where 1234 is tbe ID of the order.

If all your locking is single-transaction, then I think a better technique might actually to have a separate table where you have the individual lock IDs and use SELECT FOR UPDATE on the appropriate row there[1] -- only to obtain the lock. (Of course, most of the caveats you've spelled out in this thread still apply wrt. ordering, etc.)

Reasoning: Most named lock/advisory lock implementations don't participate in the current transaction, so you can end with "stuck" locks where the lock was acquired, but never released because the transaction before the "release" action failed. It goes without saying that this is not a problem that can be solved client-side, so you end up having to have "lock cleanup" (and how can you be sure that cleaning up a lock is actually the right thing, etc.).

[1] You might also have to UPSERT rows depending on how "dynamic" your lock IDs are, etc.

ivanhoe · on Aug 2, 2017

Reading this article made it much clearer to me why both locks and transactions are needed: https://www.brightbox.com/blog/2013/10/31/on-mysql-locks/

IgorPartola · on Aug 2, 2017

You absolutely need both. They serve different purposes because transactions provide ACID and advisory locks can be used to provide higher level ordering of application code.

daigoba66 · on Aug 2, 2017

For those using MSSQL: https://docs.microsoft.com/en-us/sql/relational-databases/sy...

aidos · on Aug 2, 2017

Glad to hear you've had success with that technique as I've been thinking about trying it myself.

My plan is to use the WITH FOR UPDATE technique because I figure if I always lock the one top level object in my tree it allows me to isolate that whole subsection of the tree (so long as I'm consistent with the locking). Basically it should give a way to serialize operations on subsections of the data (which makes a lot of sense in general usage where it's partitioned naturally).

IgorPartola · on Aug 2, 2017

I've done this before and it's a valid technique. It still has some edge cases that can bite you. First, you still have to make sure that you do all your reads and updates in a consistent order because otherwise you will deadlock. Second, and more important, you must use repeatable reads for this technique. Otherwise given my example of orders and order items, you can get really funny behavior. Say you have a function on class Order that recalculates totals with tax, shipping, etc. It's going to be based on the rows in the order items table. So you do your read of order row with FOR UPDATE, then read a list of items and do some calculations and update things. Then in the same transaction you read the order items again, except the user has now added another item to the cart. If adding items to the cart didn't lock the order with FOR UPDATE, that read would have gone through.

Also, locks using FOR UPDATE don't have explicit lock mechanics. You can't check if that row is locked. You can't easily control the timeout. I consider it a bandaid technique. Bandaids are great for small boo boos, but they don't work if you already have a codebase that is complex and does things in an inconsistent order.

aidos · on Aug 2, 2017

Thanks for the extra detail; it's always good to hear of any gotchas. In my system a lot of what happens is effectively a single user operates on a subset of data that nobody else touches but I'll definitely to a named lock prototype before I go down any particular path.

ellius · on Aug 2, 2017

This was an awesome comment. I've been trying to figure this out but haven't known what I was looking for. Thanks for the advice.

IgorPartola · on Aug 2, 2017

Just so you know, this isn't the only way to do this. You can use external systems for locks as well. Redis is a good candidate: http://loose-bits.com/2010/10/distributed-task-locking-in-ce...

skyde · on Aug 2, 2017

I woulds strongly advise against using Redis for lock. Maybe something with a real consensus algorithm like ETCD or Zookeeper but otherwise it's better to keep the lock inside the RDBMS.

virtuabhi · on Aug 2, 2017

What is the granularity of the named locks? Seems like table level.

IgorPartola · on Aug 2, 2017

The granularity is literally the name of the lock. It's not attached to anything else in the DB. You get to define what the lock means in your code. If you use Django/Python + Postgres, here's an example of a nice abstraction on top of it: https://pypi.python.org/pypi/django-pglocks/1.0.1

Think of these locks as mutexes in threaded code: they don't mean anything on their own, but they simply guard against the rest of your code. Contrived semi-pseudocode example:

    def add_item_to_cart(order, item):
        try:
            lock = acquire_named_lock('order-{}'.format(order.id), timeout=10)
            order.add_item(item)
            order.recalculate_total()
            order.save()
            lock.release()
        except LockTimeoutError:
            raise OrderUpdateError('Timed out trying to add item to order.')


    def checkout(order):
        try:
            lock = acquire_named_lock('order-{}'.format(order.id), timeout=10)
            order.pay()
            order.generate_and_email_invoice()
            order.print_shipping_label()
            order.save()
            lock.release()
        except LockTimeoutError:
            raise OrderUpdateError('Timed out trying to check out.')

Note that the two functions will never step on each others' toes because the first thing they do is acquire a lock.

virtuabhi · on Aug 2, 2017

Thanks, got it. It is nice to see database systems directly offering a locking system for use in applications.

SoMisanthrope · on Aug 2, 2017

Does that give you a row-level lock that extends through the chain of referential integrity, for the Order?

IgorPartola · on Aug 2, 2017

No. These advisory locks don't know anything about what you are locking. You name and you use the lock, not the RDBMS. That is you could give the lock the name `my-silly-lock-name` and that would totally work. As long as two bits of your code use the same lock name, they'll be executed in series. Hell, the code inside the code guarded by these locks doesn't even have to touch the DB. This is like a separate service your RDBMS provides that doesn't touch any tables, sequences, etc.

Essentially these are mutexes identified by strings that are managed by your RDBMS. That's all.

contingencies · on Aug 2, 2017

Weird to see hand-drawn sideways semi message sequence charts. (Looks vaguely reminiscent of musical notation!)

For those yet to discover it, a great tool is http://www.mcternan.me.uk/mscgen/ ... obligatory JS equivalent https://mscgen.js.org/

barrkel · on Aug 2, 2017

This article mostly addresses transaction isolation from the correctness point of view, rather than the tradeoff between semantics and performance.

Higher isolation levels add read locks that can block concurrent transactions. That's the biggest practical reason everyone doesn't use serializable, but this practical guide doesn't really address it. There's not much discussion of the gotchas whereby you can accidentally break concurrency in an application.

For example there are somewhat surprising sources of contention, e.g. adding rows with foreign keys adds locks to the rows being referenced in foreign table, to stop them getting deleted. Or consider gap locks in queries with range predicates.

IMO the guide is mostly theoretical / introductory rather than practical. It's an OK starting point for someone who knows almost nothing about databases that are being used as shared state for a concurrent application.

anarazel · on Aug 2, 2017

> Higher isolation levels add read locks that can block concurrent transactions.

Not in postgres. REPEATABLE READ doesn't add any locks, and SERIALIZABLE adds locks, but they're optimistic. So blocking isn't something you're going to see - the relevant thing to measure is going to be the rate of transactions being rolled back due to serialization failures.

barrkel · on Aug 2, 2017

Sure, but the article isn't about postgres.

anarazel · on Aug 2, 2017

> Sure, but the article isn't about postgres.

Indeed. But your point was:

> Higher isolation levels add read locks that can block concurrent transactions.

Which isn't generally true, given the example of postgres.

needusername · on Aug 2, 2017

> Higher isolation levels add read locks that can block concurrent transactions.

Not in Oracle they don't. Reads in Oracle never block. You could argue that what Oracle calls serializable is in fact only snapshot isolation.

blaisio · on Aug 2, 2017

This is a great article. It explains the various concurrency issues in fairly plain English, which is great! I wish more people were aware of these isolation levels and their consequences. Many application developers will either a) assume the database will always figure it out for them (in other words, they assume it is always serializable, with no errors) or b) hold great fear of the database and it's mystical use of locks - this type of developer tends to avoid running queries at all cost, lest they wake the beast!

nthcolumn · on Aug 2, 2017

I was an Informix DBA years ago and the other day an old friend who now works with a company supporting game developers told me that Informix is (still) really popular principally due to the way it handles locks and multi-concurrency. Surprised to say the least.

groue · on Aug 2, 2017

I'd also like to thank the other for his attempt at raising the general awareness of transaction isolation. Even when you deal with simpler databases like SQLite, isolation is a serious topic. When serializing accesses hinders performances, dealing with the "WAL mode" that grants SQLite with single-writer-multipler-readers abilities is everything but trivial, because of the huge amount of documentation that has to be read in order to reach the desired level of isolation. I've written an SQLite library in Swift that provides by default the snapshot isolation level which grants most developers with both peace of mind, and non-blocking reads. Check out its "concurrency guide" if you're interested: https://github.com/groue/GRDB.swift/blob/master/README.md#co...

groue · on Aug 2, 2017

Unique to GRDB.swift, as far as I know, is the ability to perform database writes and then wait until snapshot isolation has been reached in another connection before releasing the writing lock. This allows to read from a known database state, while allowing other writes to be performed. The ability to read for a precisely known database state allows nice features like database observation and reactive patterns, while taking advantage from the WAL mode by not holding locks longer than necessary.

marklgr · on Aug 2, 2017

I'm not sure I get the Dirty Writes example about not always being able to rollback: it first says going back to state A would lose w2[x], so it stays in state C. Then it says that aborting t2 can't go back to states B or C, concluding that the scenario thus can't be rollbacked. But what about going back to state A in case of a1+a2?

sciurus · on Aug 2, 2017

If you enjoy this post, you'll love Martin Kleppmann's book Designing Data-Intensive Applications. Chapter 7 covers exactly this topic in more depth.

http://dataintensive.net/

https://www.safaribooksonline.com/library/view/designing-dat...

Twisell · on Aug 2, 2017

Very interesting although I'm not sure about the last point where the author assert that transaction errors can't be managed inside a pl/pgsql function.

It's a little bit twisted but I think you could write a pl/pgsql function that act as a client of it's own database using fdw or dblink.

This is however pretty twisted and might have side effect but I'm pretty curious about that.

Technically the author is still damn right because this will be using a client library at some point as he explain.

DDerTyp · on Aug 2, 2017

Off topic: I LOVE your theme / CSS. Whish there would be a WordPress theme like that! :)