Chapter 7: Transactions

Overview

Transactions are the primary mechanism for handling errors and concurrency in databases. The central concept is ACID (Atomicity, Consistency, Isolation, Durability), but the chapter reveals that these properties are much more nuanced than their acronym suggests—particularly “Isolation,” which has many degrees and is rarely fully implemented in practice. The chapter examines isolation levels from weakest to strongest, and the concurrency bugs each level prevents or allows.

Core challenge: Providing transactions across concurrent processes with minimal performance overhead while preventing data corruption.

Key Concepts

ACID Properties

Atomicity:

All writes in a transaction succeed or none of them do (all-or-nothing)
On failure, transaction is aborted and rolled back
Not about concurrency (despite what the word “atomic” might suggest in concurrent programming)
Key word: abortability

Consistency:

Confusingly named—this refers to application invariants (e.g., account balances must sum to zero)
Not really a database property; it’s the application’s responsibility
The database can’t enforce arbitrary application logic
C in ACID doesn’t mean much on its own

Isolation:

Concurrently executing transactions are isolated from each other
Ideally, “as if” transactions ran serially one at a time
Serializability is the strongest isolation level
In practice, weaker levels used for performance

Durability:

Once committed, data is persisted (survives crashes, power failures)
Typically means: data written to non-volatile storage (disk, SSD)
In distributed systems: data replicated to multiple nodes
“Perfect durability does not exist” — disk failures, bugs can still corrupt

Weak Isolation Levels

Read Committed (minimum in most databases):

What it prevents:

Dirty reads: Reading data written by a transaction that hasn’t committed yet
Dirty writes: Overwriting data written by a transaction that hasn’t committed yet

What it doesn’t prevent: Read skew, non-repeatable reads, phantom reads, write skew, lost updates

Implementation:

No dirty reads: DB shows only committed data (remembers old value until transaction commits)
No dirty writes: Row-level locks (hold until commit); only one writer at a time per row

Snapshot Isolation (Repeatable Read):

What it prevents:

Read skew (non-repeatable reads): Reading different values for same row in same transaction

How it works:

Each transaction sees a consistent snapshot of the database from its start time
Implemented via MVCC (Multi-Version Concurrency Control)
Multiple versions of each row maintained; each transaction reads the version from its start time
Readers don’t block writers; writers don’t block readers

MVCC mechanics:

Each row has created_by and deleted_by transaction IDs
Read request: only see rows where created_by ≤ current_txn_id and deleted_by is future or null
Writes create new versions; deletes mark rows as deleted

Naming confusion: “Snapshot isolation” often called “Repeatable Read” in PostgreSQL and MySQL (confusingly, different DBs use same name for different levels)

Preventing Lost Updates

Lost update: Two transactions read same value, both modify it, one write overwrites the other

Example: Counter increment: read(v) → modify(v+1) → write(v+1) — if two concurrent threads both read v=5, both write 6 (should be 7)

Solutions:

Atomic write operations (best when available):
- UPDATE counters SET value = value + 1 WHERE key = X
- Single operation; no read-modify-write race
- Database serializes concurrent atomic writes
Explicit locking (SELECT FOR UPDATE):
- SELECT ... FOR UPDATE acquires exclusive lock
- Other transactions wait until lock released
- Allows application-level check before update
Compare-and-set (CAS):
- UPDATE ... WHERE value = old_value AND SET value = new_value
- Fails if value changed since read; application retries
- Works if DB doesn’t read from snapshot during CAS check
Automatic lost update detection:
- Snapshot isolation can detect lost updates (PostgreSQL does this)
- Abort one of the conflicting transactions; application retries

Serializable Isolation

Write skew: Both transactions read the same data, each makes write that would be valid alone, but together they violate an invariant.

Example: Two doctors go off-call simultaneously (each checks “at least 2 doctors on call” → both see 2 → both go off-call → 0 doctors on call)
Not prevented by snapshot isolation; requires full serializability

Phantom reads: Transaction reads a set of rows matching a condition; another transaction inserts a row matching that condition; first transaction reads again, sees new row.

Example: Meeting room booking—check no conflict, insert booking. Two concurrent transactions both check and see no conflict, both insert.

Three approaches to serializability:

Actual serial execution:
- Execute transactions one at a time, on a single thread
- Possible only if transactions are small and fast (no user interaction mid-transaction)
- VoltDB, Redis, Datomic use this
- Works if: entire dataset fits in RAM, transactions are short stored procedures
Two-Phase Locking (2PL):
- Writers block readers; readers block writers (unlike snapshot isolation)
- Phase 1 (Expanding): Acquire locks as needed
- Phase 2 (Shrinking): Release locks (only after commit/abort)
- Shared lock (read): Multiple transactions can hold simultaneously
- Exclusive lock (write): Only one at a time; blocks all others
- Predicate locks: Lock all rows matching a query condition (prevents phantoms)
- Index-range locks: Simplified predicate lock (lock a range in an index)
- Problem: Deadlocks can occur (detect and abort one transaction)
- Performance: Much slower than weak isolation due to lock contention
Serializable Snapshot Isolation (SSI):
- Optimistic concurrency control: let transactions run, check for conflicts at commit time
- Based on snapshot isolation + conflict detection
- Tracks reads that were based on a stale snapshot; if committed write invalidates the read → abort
- Outperforms 2PL when contention is low
- Used by: PostgreSQL (since 9.1), CockroachDB, Datomic (as “serializable” level)
- Advantage over 2PL: Readers don’t block writers, writers don’t block readers

Two-Phase Commit (2PC) — Distributed Transactions

Problem: How to atomically commit a transaction across multiple nodes?

2PC protocol:

Prepare phase: Coordinator sends prepare to all participants; participants promise to commit if asked
Commit phase: If all respond yes, coordinator sends commit; if any responds no, coordinator sends abort

The coordinator is a single point of failure:

If coordinator fails after prepare but before commit: participants are stuck (holding locks, can’t proceed)
In-doubt transactions: Must wait for coordinator to recover

2PC guarantees: If coordinator says commit, all nodes commit; if abort, all abort. Atomic across nodes.

Performance cost: Multiple round trips, participants hold locks during prepare phase

Important Points

ACID “Consistency” is misleading: It’s the application’s job, not the database’s.
Snapshot isolation is not serializable: Write skew and phantom reads still possible.
Most “NoSQL” databases provide weaker guarantees: MongoDB historically had no multi-document transactions.
Lost updates are extremely common bugs: Every read-modify-write without atomic updates is vulnerable.
SSI is the future: Optimistic concurrency outperforms 2PL when contention is low (most real workloads).
2PC is expensive: Use it only when you truly need atomic distributed transactions.
Transactions are not limited to relational DBs: PostgreSQL, MongoDB 4.0+, Google Spanner all offer ACID multi-row transactions.

Examples & Case Studies

PostgreSQL MVCC
- Multiple row versions with transaction ID ranges
- Readers get consistent snapshot without blocking writers
- VACUUM process reclaims old versions no longer needed
VoltDB (Actual Serial Execution)
- In-memory DB with single-threaded execution
- Transactions must be stored procedures (pre-compiled SQL)
- Achieves serializable isolation with excellent performance
Google Spanner
- Globally distributed, serializable transactions
- TrueTime API: GPS + atomic clocks for bounded timestamp uncertainty
- 2PC + Paxos for distributed atomic commit
CockroachDB (SSI)
- Serializable snapshot isolation
- Conflict detection without 2PL overhead
- Distributed serializable transactions at scale

Questions

What does each letter of ACID actually mean, and which is most important?
What is read skew and how does snapshot isolation prevent it?
How does MVCC enable snapshot isolation without blocking?
What is write skew and why doesn’t snapshot isolation prevent it?
When would you use SSI vs 2PL for serializable isolation?
What are the trade-offs between actual serial execution and 2PL?
How does two-phase commit work and what are its failure modes?
What is the difference between a lost update and write skew?

Modern Context (2026)

Multi-document transactions in NoSQL:

MongoDB 4.0 (2018): added multi-document ACID transactions
By 2026: MongoDB, DynamoDB, Cassandra all offer some form of multi-row transactions
The old “NoSQL = no transactions” is obsolete

Distributed transactions maturity:

CockroachDB, Spanner, YugabyteDB: SSI at global scale (2026 standard)
Saga pattern: long-running transactions via compensating actions (popular in microservices)
Two-phase commit largely replaced by SSI-based approaches in modern distributed DBs

PostgreSQL serializable everywhere:

PostgreSQL SSI (since 9.1) is mature and widely used
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE without large performance penalty
Many applications that previously used application-level locking now use SSI

Async sagas in microservices:

Across microservice boundaries: true ACID is hard (2PC is too expensive)
Saga pattern: sequence of local transactions + compensating transactions on failure
Orchestration vs choreography approaches
Eventual consistency is acceptable for many cross-service operations

Status: Notes complete
Last Updated: 2026-04-13

Study Notes by Niladri & AI

Explorer

ch07-transactions