Chapter 7: Transactions
Overview
Transactions are the primary mechanism for handling errors and concurrency in databases. The central concept is ACID (Atomicity, Consistency, Isolation, Durability), but the chapter reveals that these properties are much more nuanced than their acronym suggests—particularly “Isolation,” which has many degrees and is rarely fully implemented in practice. The chapter examines isolation levels from weakest to strongest, and the concurrency bugs each level prevents or allows.
Core challenge: Providing transactions across concurrent processes with minimal performance overhead while preventing data corruption.
Key Concepts
ACID Properties
Atomicity:
- All writes in a transaction succeed or none of them do (all-or-nothing)
- On failure, transaction is aborted and rolled back
- Not about concurrency (despite what the word “atomic” might suggest in concurrent programming)
- Key word: abortability
Consistency:
- Confusingly named—this refers to application invariants (e.g., account balances must sum to zero)
- Not really a database property; it’s the application’s responsibility
- The database can’t enforce arbitrary application logic
- C in ACID doesn’t mean much on its own
Isolation:
- Concurrently executing transactions are isolated from each other
- Ideally, “as if” transactions ran serially one at a time
- Serializability is the strongest isolation level
- In practice, weaker levels used for performance
Durability:
- Once committed, data is persisted (survives crashes, power failures)
- Typically means: data written to non-volatile storage (disk, SSD)
- In distributed systems: data replicated to multiple nodes
- “Perfect durability does not exist” — disk failures, bugs can still corrupt
Weak Isolation Levels
Read Committed (minimum in most databases):
What it prevents:
- Dirty reads: Reading data written by a transaction that hasn’t committed yet
- Dirty writes: Overwriting data written by a transaction that hasn’t committed yet
What it doesn’t prevent: Read skew, non-repeatable reads, phantom reads, write skew, lost updates
Implementation:
- No dirty reads: DB shows only committed data (remembers old value until transaction commits)
- No dirty writes: Row-level locks (hold until commit); only one writer at a time per row
Snapshot Isolation (Repeatable Read):
What it prevents:
- Read skew (non-repeatable reads): Reading different values for same row in same transaction
How it works:
- Each transaction sees a consistent snapshot of the database from its start time
- Implemented via MVCC (Multi-Version Concurrency Control)
- Multiple versions of each row maintained; each transaction reads the version from its start time
- Readers don’t block writers; writers don’t block readers
MVCC mechanics:
- Each row has
created_byanddeleted_bytransaction IDs - Read request: only see rows where
created_by ≤ current_txn_idanddeleted_byis future or null - Writes create new versions; deletes mark rows as deleted
Naming confusion: “Snapshot isolation” often called “Repeatable Read” in PostgreSQL and MySQL (confusingly, different DBs use same name for different levels)
Preventing Lost Updates
Lost update: Two transactions read same value, both modify it, one write overwrites the other
- Example: Counter increment:
read(v) → modify(v+1) → write(v+1)— if two concurrent threads both read v=5, both write 6 (should be 7)
Solutions:
-
Atomic write operations (best when available):
UPDATE counters SET value = value + 1 WHERE key = X- Single operation; no read-modify-write race
- Database serializes concurrent atomic writes
-
Explicit locking (
SELECT FOR UPDATE):SELECT ... FOR UPDATEacquires exclusive lock- Other transactions wait until lock released
- Allows application-level check before update
-
Compare-and-set (CAS):
UPDATE ... WHERE value = old_value AND SET value = new_value- Fails if value changed since read; application retries
- Works if DB doesn’t read from snapshot during CAS check
-
Automatic lost update detection:
- Snapshot isolation can detect lost updates (PostgreSQL does this)
- Abort one of the conflicting transactions; application retries
Serializable Isolation
Write skew: Both transactions read the same data, each makes write that would be valid alone, but together they violate an invariant.
- Example: Two doctors go off-call simultaneously (each checks “at least 2 doctors on call” → both see 2 → both go off-call → 0 doctors on call)
- Not prevented by snapshot isolation; requires full serializability
Phantom reads: Transaction reads a set of rows matching a condition; another transaction inserts a row matching that condition; first transaction reads again, sees new row.
- Example: Meeting room booking—check no conflict, insert booking. Two concurrent transactions both check and see no conflict, both insert.
Three approaches to serializability:
-
Actual serial execution:
- Execute transactions one at a time, on a single thread
- Possible only if transactions are small and fast (no user interaction mid-transaction)
- VoltDB, Redis, Datomic use this
- Works if: entire dataset fits in RAM, transactions are short stored procedures
-
Two-Phase Locking (2PL):
- Writers block readers; readers block writers (unlike snapshot isolation)
- Phase 1 (Expanding): Acquire locks as needed
- Phase 2 (Shrinking): Release locks (only after commit/abort)
- Shared lock (read): Multiple transactions can hold simultaneously
- Exclusive lock (write): Only one at a time; blocks all others
- Predicate locks: Lock all rows matching a query condition (prevents phantoms)
- Index-range locks: Simplified predicate lock (lock a range in an index)
- Problem: Deadlocks can occur (detect and abort one transaction)
- Performance: Much slower than weak isolation due to lock contention
-
Serializable Snapshot Isolation (SSI):
- Optimistic concurrency control: let transactions run, check for conflicts at commit time
- Based on snapshot isolation + conflict detection
- Tracks reads that were based on a stale snapshot; if committed write invalidates the read → abort
- Outperforms 2PL when contention is low
- Used by: PostgreSQL (since 9.1), CockroachDB, Datomic (as “serializable” level)
- Advantage over 2PL: Readers don’t block writers, writers don’t block readers
Two-Phase Commit (2PC) — Distributed Transactions
Problem: How to atomically commit a transaction across multiple nodes?
2PC protocol:
- Prepare phase: Coordinator sends
prepareto all participants; participants promise to commit if asked - Commit phase: If all respond
yes, coordinator sendscommit; if any respondsno, coordinator sendsabort
The coordinator is a single point of failure:
- If coordinator fails after
preparebut beforecommit: participants are stuck (holding locks, can’t proceed) - In-doubt transactions: Must wait for coordinator to recover
2PC guarantees: If coordinator says commit, all nodes commit; if abort, all abort. Atomic across nodes.
Performance cost: Multiple round trips, participants hold locks during prepare phase
Important Points
- ACID “Consistency” is misleading: It’s the application’s job, not the database’s.
- Snapshot isolation is not serializable: Write skew and phantom reads still possible.
- Most “NoSQL” databases provide weaker guarantees: MongoDB historically had no multi-document transactions.
- Lost updates are extremely common bugs: Every read-modify-write without atomic updates is vulnerable.
- SSI is the future: Optimistic concurrency outperforms 2PL when contention is low (most real workloads).
- 2PC is expensive: Use it only when you truly need atomic distributed transactions.
- Transactions are not limited to relational DBs: PostgreSQL, MongoDB 4.0+, Google Spanner all offer ACID multi-row transactions.
Examples & Case Studies
-
PostgreSQL MVCC
- Multiple row versions with transaction ID ranges
- Readers get consistent snapshot without blocking writers
- VACUUM process reclaims old versions no longer needed
-
VoltDB (Actual Serial Execution)
- In-memory DB with single-threaded execution
- Transactions must be stored procedures (pre-compiled SQL)
- Achieves serializable isolation with excellent performance
-
Google Spanner
- Globally distributed, serializable transactions
- TrueTime API: GPS + atomic clocks for bounded timestamp uncertainty
- 2PC + Paxos for distributed atomic commit
-
CockroachDB (SSI)
- Serializable snapshot isolation
- Conflict detection without 2PL overhead
- Distributed serializable transactions at scale
Questions
- What does each letter of ACID actually mean, and which is most important?
- What is read skew and how does snapshot isolation prevent it?
- How does MVCC enable snapshot isolation without blocking?
- What is write skew and why doesn’t snapshot isolation prevent it?
- When would you use SSI vs 2PL for serializable isolation?
- What are the trade-offs between actual serial execution and 2PL?
- How does two-phase commit work and what are its failure modes?
- What is the difference between a lost update and write skew?
Modern Context (2026)
Multi-document transactions in NoSQL:
- MongoDB 4.0 (2018): added multi-document ACID transactions
- By 2026: MongoDB, DynamoDB, Cassandra all offer some form of multi-row transactions
- The old “NoSQL = no transactions” is obsolete
Distributed transactions maturity:
- CockroachDB, Spanner, YugabyteDB: SSI at global scale (2026 standard)
- Saga pattern: long-running transactions via compensating actions (popular in microservices)
- Two-phase commit largely replaced by SSI-based approaches in modern distributed DBs
PostgreSQL serializable everywhere:
- PostgreSQL SSI (since 9.1) is mature and widely used
SET TRANSACTION ISOLATION LEVEL SERIALIZABLEwithout large performance penalty- Many applications that previously used application-level locking now use SSI
Async sagas in microservices:
- Across microservice boundaries: true ACID is hard (2PC is too expensive)
- Saga pattern: sequence of local transactions + compensating transactions on failure
- Orchestration vs choreography approaches
- Eventual consistency is acceptable for many cross-service operations
Status: Notes complete
Last Updated: 2026-04-13