Monday, May 31, 2010

Transaction isolation level

Recently I have read very good explanation of transaction isolation levels in "Real World Java EE Patterns. Rethinking Best Practices." by Adam Bien.

Transactions were introduced for two reasons. To make complex operations atomic (all or nothing) and to give possibility to concurrently use resources. Without transactions it is possible that two users (or more specifically actors, this can be some parts of the system, not necessarly live users) read/write the same resource simultaneously, and changes made by one of them are lost and overwriten by the other one.

Consider following example document:

Cheesecake bottom
250g crunchy bisquits
100g butter

Cheesecake top
900g cottage cheese
250g mascarpone cheese
1 lemon

Imagine one user gets the document and the other user also is getting the same document. Both users have the same data. Now both of them are editing it. One likes sweeter cake bottom, so adds "2 spoons of sugar" to the recipe, and writes changed recipe back to the database. In database now it looks like this:

Cheesecake bottom
250g crunchy bisquits
100g butter
2 spoons of sugar

Cheesecake top
900g cottage cheese
250g mascarpone cheese
1 lemon

Meanwhile second user adds "2 eggs" to cheescake top recipe and saves his version:

Cheesecake bottom
250g crunchy bisquits
100g butter
2 spoons of sugar

Cheesecake top
900g cottage cheese
250g mascarpone cheese
1 lemon
2 eggs
Change made by the first user is lost. In some cases it can lead to inconsistent data.

So that's why transactions were introduced. Transaction isolates changes made by one user from changes made by the other. There are 4 levels of transaction isolation.

Serializable

Transaction locks all necessary resources. If user wants to read or write some resource, it is locked and no one else can read or write it. This level of isolation can easly lead to deadlocks:
  1. User Ann opens transaction. Ann needs resource A, transaction locks it.
  2. User John opens another transaction. John needs resource B, transaction locks it.
  3. Ann needs resource B, but it is locked, so Ann's transaction needs to wait until it is released
  4. John needs resource A, but it is locked. John's transaction waits until it is released. Both users wait endlessly.

If John didn't need A, he finishes his transaction, B is released and Ann can finish her transaction too.

Repeatable reads

Guarantees that the same query will return the same results if executed in one transaction. Even if other transaction modifies resource meanwhile. Exception is adding - new rows can appear in query result. If another transaction deletes existing rows or modifies them, this changes are not visible.
  1. Ann opens transaction. She reads names of java4people organizers: "Stawicki" and "Gruchała".
  2. Bob opens transaction and deletes "Stawicki". Bob commits changes and closes transaction.
  3. Again Ann reads names. She gets "Stawicki" and "Gruchała".

If Bob added some name, Ann would read it too.

Read commited

The same query can return different result even in one transaction if another transaction makes changes and commits them.
  1. Ann opens transaction and reads names of java4people organizers: "Stawicki" and "Gruchała".
  2. Bob opens transaction and deletes "Stawicki".
  3. Ann reads names again. She gets "Stawicki" and "Gruchała".
  4. Bob commits his transaction
  5. Ann reads names again (still in one transaction). She gets "Gruchała".

Read uncommited

Like no transactions at all. It only gives us atomic operations. User can rollback all the changes in transaction. Changes are visible to other transactions even before commit. Of course, if transaction that made changes is rolled back, changes are not visible any more.

4 comments:

Mateusz Mrozewski said...

While talking about isolation levels it's good to mention also the risks that they cause. This short DB2 summary presents a table of undesirable effects.

Another thing to consider is how it affects the performance.

It is also good to mention that deadlocks are not just the problem of serializable isolation level.

And in Repeatable read section I think you ment "If Bob added some name, Ann wouldn't read it too." (if she is still in the same transaction).

Anyway, excellent, short and thoughtful summary of isolation levels. It's a difficult subject. Maybe you will write more on this :)

pawelstawicki said...

"Ann would read it too" - this is correct. In repeatable read adding is exception. Modification and deletion is not visible for another transactions, but adding is.

Leszek Gruchała said...

Thank's man for making me famous again :-)

I had the same doubts as Mateusz on Repeatable read part, but after third read I understood it well.

Just kidding, I really like such posts. Concrete useful information in a very readable way. Want more!

Amine said...

Nice post. I like the clarity of your examples. Hold on like that.