How our users exploited concurrency and how we fixed it

rlpb · on Jan 11, 2013

Even better, rearrange your database to store immutable facts instead of mutable state.

In this example, you might have a goal_completion table to which you append a row when the player completes a goal, instead of changing an existing row. The table could have suitable unique constraints to make sure that each player can only complete a goal once.

This way, it is much harder for your data to accidentally be modified in a way that it becomes wrong.

This also gives you a way to store facts about the completion for free, such as a timestamp for each goal completion. And having data like timestamps is really useful for auditing, debugging and testing.

AdamTReineke · on Jan 11, 2013

I've heard you sometimes shouldn't actually delete data from a database and should use an "IsDeleted" flag instead, particularly with customer orders and such.

What are the proper circumstances to use immutable facts? Anytime you need an undo or audit features?

CaveTech · on Jan 11, 2013

The isDeleted flag is good because it gives the appearance that the data is gone, without removing the context of the data itself.

Immutable facts are great if you want to:

1) Avoid a large amount of updates/race conditions on your tables. Immutable facts, being immutable, can be done using purely inserts.

2) Want to keep a trail or context to the data you're storing. This has a benefit of keeping a log that you can later use to trace issues, behaviours or generate statistics. If you're keeping a state in a database column, without external logs you aren't able to check when or why that state changes. Immutable facts keep track of this for you, but require quite a bit more overhead.

jiggy2011 · on Jan 11, 2013

I've always hated "isDeleted" flags because it adds something that you have to remember to every query you write.

I would almost prefer some way to "ghost" a row where without a specific switch the DBMS will never return it from a query.

Argorak · on Jan 12, 2013

With a view and a set of rules, you can do such things in Postgres, without your application noticing:

http://www.postgresql.org/docs/9.2/static/rules-update.html

SanjayUttam · on Jan 11, 2013

I might be trivializing this but you could create a view "ActiveUsers" / "DeletedUsers". Not saying this make the IsDeleted flag perfect, but it would alleviate that particular problem, IMO.

wulczer · on Jan 11, 2013

I believe this won't work on PostgreSQL, which will always say one row has been updated, even if the new value is the same as the old value.

There are numerous ways around that, from less fancy to really fancy.

  1. use SELECT FOR UPDATE, which will lock the row (I'd say that's the normal way)

  session1> BEGIN;
  session1> SELECT * FROM goals WHERE player_id = ? FOR UPDATE
  session2> BEGIN;
  session2> SELECT * FROM goals WHERE player_id = ? FOR UPDATE
  # session2 is now hanging
  session1> UPDATE goals SET completed = true WHERE goal_id = ?
  session1> UPDATE players SET points = points + 1 WHERE player_id = ?
  session1> COMMIT;
  # session2 now proceeds, sees that the goal has been completed, forfeits awarding the reward

  2. use suppress_reduntant_updates_trigger (fun)

  session> CREATE TRIGGER suppress_goal_t BEFORE UPDATE ON goals FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
  session> UPDATE goals SET completed = true WHERE goal_id = ?
  UPDATE 1
  session> UPDATE goals SET completed = true WHERE goal_id = ?
  UPDATE 0
  # now you can use the approach mentioned in the article

  3. use true serialisability
  session1> BEGIN;
  session1> SET transaction_isolation TO serializable;
  session1> SELECT * FROM goals WHERE player_id = ?
  session2> BEGIN;
  session2> SET transaction_isolation TO serializable;
  session2> SELECT * FROM goals WHERE player_id = ?
  session1> UPDATE goals SET completed = true WHERE goal_id = ?
  session1> UPDATE players SET points = points + 1 WHERE player_id = ?
  session1> COMMIT;
  session2> UPDATE goals SET completed = true WHERE goal_id = ?
  ERROR:  could not serialize access due to concurrent update
  # session2 now has to rollback the transaction

There's a few more, but I ran out of steam typing ;) Yay, Postgres!

EvilTrout · on Jan 11, 2013

facepalm

I forgot a very important part of the query:

row_count = Goal.update_all "completed = true", ["player_id = ? AND completed = false", player.id]

If you update where completed = false, the rowcount will only be 1 when the update works.

(I've updated the blog post.)

vemv · on Jan 11, 2013

Really not that much of a subtle bug I'm afraid, the pattern of "mutating the condition itself on success" is error-prone in single-threaded code too.

A good example being asynchronous javascript, where you'd do this to prevent issues like the exposed in the article:

    var some_mutable_condition = false;
    var requested = false;

    if (!some_mutable_condition && !requested){
        requested = true; // ensure one request is performed at most
        $.ajax(/*async code that'll set some_mutable_condition to true*/);
    }

mguterl · on Jan 11, 2013

Underscore.js has a function called once for this exact purpose.

vemv · on Jan 11, 2013

Good to know, thank you.

mcherm · on Jan 11, 2013

I like collecting tales like this one. They are useful for scaring new programmers who think they understand how to safely do multi-thread programming.

emeraldd · on Jan 11, 2013

If I remember correctly, one of the ways around this is to do a no-op update before querying. So you'd update a field in the goal table to it's original value, which would cause the database to lock the table for the duration of the transaction.

Also, if I remember correctly, this is one of the things that can be solved with an MVCC aware engine.

http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...

With MVCC a transaction should be aborted if another thread modifies the underlying data.

amikazmi · on Jan 11, 2013

No need for a workaround, you have Optimistic Locking (via lock_version field)

It raises an exception in case you try to update stale data.

http://api.rubyonrails.org/classes/ActiveRecord/Locking/Opti...

amichal · on Jan 11, 2013

Or no table changes with http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pess...

but in this specific case `affected_rows` works better as long as you know its reliable.

EvilTrout · on Jan 11, 2013

Right, but that involves changing your database and adding hidden form fields to your app in order for it to work on a multi process environment (like every production server should be.)

My solution is basically the same amount of lines, requires no such changes.

gnaritas · on Jan 12, 2013

Yes, but optimistic locking with a version is the general and known solution to this problem in all cases and is built into the framework already. Your solution works in just one case, not the general case.

jessaustin · on Jan 11, 2013

Another solution would be to have e.g. a "Completions" table indexed on user and on date. Then the RDBMS wouldn't allow duplicate goal completions for the same user and day. If you wanted to allow up to X completions/day, just add a new column to track that, and limit that column to the range [0, X). I often find that updating rows in response to transactions is a code smell. Each transaction should write one or more rows to one or more tables. Of course, if you have denormalized summary tables, they might have to update, but you only do that when you know you have to.

aufreak3 · on Jan 11, 2013

Umm ... and the 'Player.transaction' doesnt seem to be one then if it allows two transactions to overlap?

EvilTrout · on Jan 11, 2013

There's a difference between a transaction that locks for writes and one that locks for reads. By default the reading is not locked in this (common) case.

Strilanc · on Jan 11, 2013

That's a terrible default! Ignoring read-write races is a surefire way to... well, create bugs like the one in the post.

emeraldd · on Jan 11, 2013

The common case is going to be many readers with a few or even a single writer. In that scenario, a read locked database would destroy any chance of operating at scale.

marcosdumay · on Jan 11, 2013

Many DBMS have a SELECT FOR UPDATE statement just for those cases you want to lock everybody out while just reading.

Lozzer · on Jan 11, 2013

For a lot of RDBMS, transaction semantics can be later with different transaction isolation levels. There levels like repeatable read which work in this way, but the default level is usually weaker.

voidlogic · on Jan 11, 2013

"I was worried that I’d have to resort to semaphores or some kind of other exotic programming construct."

I'm not saying a semaphore would have been appropriate here but- For real? A semaphore is exotic? It is an one of the oldest original synchronization methods. Semaphore were developed by Dijkstra in 1965. The author's C-S program failed him.

EvilTrout · on Jan 11, 2013

Everyone's got to start somewhere!

In the sentence before I explicitly said I was a newbie to this stuff at the time. Obviously it's not exotic to a more experienced developer :)

voidlogic · on Jan 11, 2013

My point was that a semaphore is freshmen or sophomore level undergrad stuff, something I would expect every professional developer to understand. Here someone who considers themselves a developer, which I assume means a professional software developer, considers it an exotic construct. Things like this shakes my faith in the computer science educational system.

EvilTrout · on Jan 11, 2013

I think you're being a bit dramatic about little part of one sentence.

There are many application developers out there using Rails, PHP or Django who can go years without invoking locks or semaphores due to the strengths of the APIs and frameworks they use.

As they learn and grow as developers, they will be introduced to new problems and solutions. A semaphore can be exotic, not because they've never heard of it before, but because they've not used them very much outside of that one assignment they did in CS 5 years ago.

How about instead of criticizing people's eductions, we encourage them to be mindful about what they don't know, and to learn whatever they can when they get a chance?

voidlogic · on Jan 11, 2013

"How about instead of criticizing people's eductions, we encourage them to be mindful about what they don't know, and to learn whatever they can when they get a chance?" I'm all for people learning things they were not taught in school and I commend you for that.

But as the field grows there will be more and more material that was traditionally taught, but is now dropped. For example, I could imagine a misguided department dropping a once required OS/Systems class to allow students to choose that as an elective, or maybe a node.js or Ruby elective instead. I think those kind of program changes are serious and need discussion-

brodney · on Jan 11, 2013

Not everyone who does this was in a CS Program. When I learned about them, I went with a lock-free model for my own project because semaphores seemed to add more complexity than they solved. I can understand the author's desire not to use them.

Tyr42 · on Jan 11, 2013

I'm a CS student at a university known for it's CS program. I know about semaphores, but this has not been covered in any of the first or second year classes.

voidlogic · on Jan 11, 2013

I'm sure you will see it before you graduate, whenever you take your basic operating systems course.

dtwwtd · on Jan 11, 2013

In our program at Michigan, concurrency isn't really discussed until your 3rd or 4th year when/if you take the operating systems course.

kaffeinecoma · on Jan 11, 2013

Use Optimistic Locking:

http://railscasts.com/episodes/59-optimistic-locking-revised...

jamieb · on Jan 11, 2013

Best to discover the joys of concurrency while writing a game than a bank. =) Welcome! Our playground is a lot of fun!

EGreg · on Jan 11, 2013

Basically, the correct way is to use transactions. Use a transactional database because it supports MVCC.

EvilTrout · on Jan 11, 2013

In the example code I gave, by default in Rails with Postgres (which supports MVCC) you'll still get the error. So it's not as simple as using a transactional database: you either use a trick like this one, an row-based version column, or you need to change your isolation levels which has other performance issues.

delano · on Jan 11, 2013

You could wrap the transaction in a lock based on the player ID which you could do with Redis or Memcached with near-zero impact on performance.

A good reference: http://www.nateware.com/an-atomic-rant.html

EvilTrout · on Jan 11, 2013

That introduces external dependencies though. Also, since the update has to occur anyway, I'd bet that my solution is faster than also making a call to another database for a lock.