Chapter 9 Cheat Sheet - Consistency and Consensus
One-Line Summaries
| Concept | One-Liner |
|---|---|
| Linearizability | System behaves as if single-copy; reads always return most recent write |
| Causal consistency | Causally related events seen in order; concurrent events may differ |
| Eventual consistency | Replicas converge eventually; no ordering guarantees during lag |
| CAP theorem | During network partition: choose consistency OR availability, not both |
| PACELC | CAP + latency trade-off even without partitions |
| Total order broadcast | All nodes see all messages in same order; equivalent to consensus |
| Consensus | Nodes agree on one value; once decided, cannot change |
| Raft | Leader-based consensus algorithm; designed for understandability |
| Paxos | Original consensus algorithm; complex but proven correct |
| ZooKeeper/etcd | Distributed coordination services built on consensus |
| 2PC | Two-Phase Commit — atomic commit across multiple nodes; coordinator is SPOF |
Consistency Model Spectrum
Strongest ──────────────────────────────────────────── Weakest
Linearizable → Strict Serial → Serializable → Snapshot → Causal → Monotonic → Eventual
Linearizable: Reads always see most recent write; global total order
Strict serial: Linearizable + serializable (strongest)
Serializable: Transactions appear serial; correct outcome
Snapshot: Each transaction sees consistent point-in-time view
Causal: Causally related events in order; concurrent can differ
Monotonic: Read your own writes; no time reversal
Eventual: Replicas converge when writes stop; anything goes during lag
Linearizability vs Serializability
Linearizability: Serializability:
───────────────────────────── ──────────────────────────────────────────
Recency guarantee Isolation guarantee for transactions
On single objects On multiple-operation transactions
Real-time ordering required Any serial order acceptable (even past)
Implemented at: storage level Implemented at: transaction isolation level
Used for: locks, elections Used for: concurrent database transactions
Cost: coordination on every op Cost: locking or abort overhead
STRICT SERIALIZABILITY = both properties together (most expensive, most correct)
CAP Theorem Explained
Network partition happens (you can't prevent this):
CP (Consistent + Partition-tolerant):
During partition: REFUSE requests (return error)
After partition heals: respond normally
Examples: ZooKeeper, etcd, HBase
Use when: data correctness > availability
AP (Available + Partition-tolerant):
During partition: RESPOND with possibly stale data
After partition heals: reconcile conflicts
Examples: Cassandra, DynamoDB, CouchDB
Use when: availability > consistency
"You can't opt out of partition tolerance in real distributed systems"
"CAP consistency = linearizability, NOT ACID consistency"
PACELC Diagram
┌─ Partition? ─── Yes ──→ Latency (L) vs Consistency (C)
│ ↗ ↘
Does partition │ Lower latency Stronger consistency
happen? │
└─ No partition ──→ Availability (A) vs Consistency (C)
Real systems labeled as PA/EL or PC/EC:
PA/EL: Cassandra (available during partition; low latency normally)
PC/EC: Spanner (consistent during partition; consistent normally, high latency)
PA/EC: Amazon DynamoDB (available during partition; consistent by default)
How Raft Works
Three node states: FOLLOWER, CANDIDATE, LEADER
Normal operation:
All nodes: FOLLOWER (receiving heartbeats from LEADER)
Client → LEADER (only leader accepts writes)
LEADER → send log entry to all FOLLOWERS (AppendEntries RPC)
FOLLOWERS → send acknowledgment
Once majority ack'd → entry COMMITTED
LEADER → send commit to FOLLOWERS
Leader election (leader timeout):
FOLLOWER timeout (no heartbeat) → becomes CANDIDATE
CANDIDATE → broadcasts RequestVote to all nodes
Each node votes once (first-come-first-served)
If CANDIDATE gets majority votes → becomes new LEADER
LEADER → immediately sends heartbeat to suppress other elections
Log replication:
LEADER has most up-to-date log (guaranteed by election voting)
All writes flow through LEADER
Committed = written to majority of nodes
Total Order Broadcast vs Consensus
Total Order Broadcast: Consensus (single-decree):
────────────────────────── ───────────────────────────────
All nodes agree on message ORDER All nodes agree on ONE VALUE
Delivered reliably (no loss) Irreversible once decided
Sequence of decisions Single decision
They are equivalent:
TOB → Consensus: each message position = one value agreed on
Consensus → TOB: each consensus round decides next message in order
Why it matters:
Consensus = leader election = atomic commit = total order broadcast
Solve one, you solve them all
ZooKeeper / etcd Primitives
ZooKeeper hierarchy:
/
├── /leaders/
│ └── /leaders/cluster1 (ephemeral, sequential)
└── /locks/
└── /locks/resource_A (ephemeral)
Ephemeral node: deleted when client session ends
→ Use for: locks, leader presence
Sequential node: auto-numbered (resource-0001, resource-0002)
→ Use for: fencing tokens (ever-increasing), queue ordering
Watch: callback when node changes/created/deleted
→ Use for: detecting leader changes, config updates
Lock implementation:
1. CREATE /locks/my-lock EPHEMERAL
2. If success → acquired
3. If fail → WATCH the existing node
4. When watch fires (node deleted) → retry step 1
Key Trade-offs
| Decision | Pro | Con | When to Use |
|---|---|---|---|
| Linearizable | Correct, no stale reads | High latency, CP (blocks on partition) | Locks, unique constraints, elections |
| Causal | Preserves causality, available | Weaker than linear | Social feeds, collaborative docs |
| Eventual | Highest availability/perf | No ordering guarantee | Analytics, caches, non-critical |
| Raft | Understandable, widely implemented | Leader bottleneck | Strongly consistent distributed state |
| Paxos | Proven correct, variants optimized | Complex, hard to implement | Foundation for Raft/Zab |
| 2PC | Atomic across nodes | Blocking on coordinator failure | Cross-DB atomic commit |
Red Flags
❌ Assuming you can have CAP “all three” (you can’t during partitions)
❌ Using linearizability everywhere (very expensive; often overkill)
❌ Implementing Paxos/Raft yourself (use ZooKeeper/etcd instead)
❌ 2PC coordinator with no failover plan (SPOF → blocking)
❌ Confusing CAP “consistency” with ACID “consistency” (different things!)
Green Flags
✅ Identify what consistency model each operation actually needs
✅ Use causal consistency where linearizability isn’t required
✅ Use etcd/ZooKeeper for distributed locks and leader election
✅ Understand which systems are CP vs AP for your use case
✅ Know the PACELC trade-off: even without partitions, strong consistency costs latency
Modern Additions (2026)
Raft dominance:
├─ etcd (Kubernetes), CockroachDB, TiDB, YugabyteDB all use Raft
├─ Kafka KRaft: eliminated ZooKeeper dependency (Kafka 3.3+)
└─ Raft now has decades of production hardening
Flexible Paxos:
├─ Read and write quorums can be different sizes (w + r > n still required)
├─ Optimizes for specific read/write ratios
└─ Used in some geo-distributed systems
Consensus for cloud coordination:
├─ AWS DynamoDB Global Tables: multi-region replication with conflict resolution
├─ Google Cloud Spanner: external consistency globally
└─ CockroachDB Serverless: managed consensus at global scale
Interview Response Templates
When Asked About CAP Theorem
“CAP says that during a network partition, you can’t have both perfect availability and linearizable consistency. But this is specific to partition scenarios — normally you have both. It’s really a choice between CP (refuse requests to stay consistent, like ZooKeeper) and AP (serve possibly stale data to stay available, like Cassandra). In practice the PACELC theorem is more useful: it also captures the latency-consistency trade-off even without partitions.”
When Asked About Consensus
“Consensus is fundamental — it underlies leader election, distributed locks, atomic commit, and total order broadcast. Raft is the standard implementation today: a leader accepts all writes, replicates to followers, and commits once a majority acknowledge. Once you have total order broadcast (all nodes see all messages in the same order), you can build almost any coordination primitive. Use etcd or ZooKeeper rather than implementing Raft yourself.”
Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13