Chapter 5 Cheat Sheet - Replication
One-Line Summaries
| Concept | One-Liner |
|---|---|
| Single-leader | One node accepts writes; followers replicate and serve reads |
| Multi-leader | Multiple nodes accept writes; sync async; risk of write conflicts |
| Leaderless | All nodes accept writes; quorum determines consistency |
| Replication lag | Delay between leader write and follower update; causes stale reads |
| Eventual consistency | Replicas converge if writes stop; no guarantee during lag |
| Quorum (w+r>n) | Overlapping read/write sets guarantee at least one up-to-date response |
| Split brain | Two nodes both think they’re leader → data corruption |
| Read-your-writes | Guarantee user sees their own writes, even from stale replicas |
| LWW | Last Write Wins — highest timestamp wins; risks data loss |
| CRDT | Conflict-free Replicated Data Type — merges automatically |
Replication Architecture Comparison
| Single-Leader | Multi-Leader | Leaderless | |
|---|---|---|---|
| Write target | One leader | Any leader | Any replica (quorum) |
| Write conflicts | None | Possible | Possible |
| Availability | Leader failure = writes down | Tolerates per-DC failure | High (no single point) |
| Consistency | Strongest | Weakest | Tunable (w+r) |
| Latency | Low local, high cross-DC | Low (local DC) | Low (quorum) |
| Examples | MySQL, PostgreSQL, MongoDB | MySQL multi-source, CouchDB | Cassandra, DynamoDB, Riak |
Failover Steps (Single-Leader)
1. Detect leader failure
└─ Timeout-based: no heartbeat for 30s → presumed dead
2. Elect new leader
└─ Most up-to-date follower (by replication offset)
└─ Consensus required (avoid split-brain)
3. Reconfigure system
├─ Clients redirect writes to new leader
└─ Other followers follow new leader
PROBLEMS:
├─ Split brain: old leader comes back → two leaders → data corruption
│ └─ Solution: STONITH (fencing) — force old leader offline
├─ Replication lag: new leader may have lost writes
│ └─ Solution: Accept the loss (compromise) or synchronous replication
└─ Wrong timeout: too short = false failover; too long = slow recovery
Replication Lag Consistency Guarantees
Problem 1: Reading Your Own Writes
─────────────────────────────────
User posts comment → reads own comment from stale follower → comment gone!
Fix:
├─ Read own recent writes from LEADER
├─ Track last write timestamp; don't read replica lagging behind that time
└─ Route same user's reads to same replica
Problem 2: Monotonic Reads
──────────────────────────
User sees messages in order A→B→C from replica 1
User refreshes → hits replica 2 (more stale) → sees only A
"Time goes backward"
Fix:
└─ Route same user to same replica (sticky sessions)
Problem 3: Consistent Prefix Reads
────────────────────────────────────
User A posts question, User B posts answer
Observer sees answer before question (different partitions, different lag)
Fix:
└─ Causally related writes to same partition
Quorum Reads/Writes
n = total replicas
w = nodes that must confirm write
r = nodes queried for read
Rule: w + r > n → at least 1 overlap → guaranteed up-to-date read
Example: n=5, w=3, r=3
Write to 3: ●●●○○
Read from 3: ●●●○○ (or any 3)
Overlap: at least 1 of the 3 read nodes was in the 3 write nodes
Common configs:
n=3, w=2, r=2 → Typical balanced config
n=3, w=3, r=1 → Strong write, fast read
n=3, w=1, r=3 → Fast write, slow read
Write Conflict Resolution Strategies
Strategy | How it works | Data loss? | Use case
─────────────────────────────────────────────────────────────────────
Last Write Wins | Highest timestamp wins | YES | Caching, low-value data
(LWW) | | |
| | |
Merge | Combine both values | NO | Sets, counters (CRDTs)
(CRDT) | (union, sum, etc.) | |
| | |
Explicit tracking | Store all versions; | NO | User-facing edits
| show conflict to user/app | |
| | |
Causal ordering | Use version vectors to | NO | Any causal workflow
(vector clocks) | determine "happened-before"| |
Multi-Leader Topologies
All-to-all (most resilient): Star (centralized):
DC1 ←──→ DC2 DC1 → DC3 ← DC2
↑ ↗ ↙ ↑ ↑
DC3 DC3
Circular (ordered):
DC1 → DC2 → DC3 → DC1
Risk in circular/star: if one node is slow, it creates replication lag for the whole topology
Concurrent Write Detection
Version vectors (vector clocks):
Replica A: {A:3, B:2} = "I've seen 3 writes on A, 2 writes on B"
Replica B: {A:2, B:4} = "I've seen 2 writes on A, 4 writes on B"
Comparison:
A.A=3 > B.A=2 (A is ahead on its own writes)
A.B=2 < B.B=4 (B is ahead on its own writes)
Neither dominates → CONCURRENT WRITE → conflict!
If A.all >= B.all → A is a descendant of B → safe to overwrite
Key Trade-offs
| Decision | Pro | Con | When to Use |
|---|---|---|---|
| Sync replication | No data loss on failover | Writes blocked if replica slow | Critical data |
| Async replication | High write throughput | Potential data loss on failover | High volume, tolerable loss |
| Multi-leader | Low write latency, DC redundancy | Write conflicts, complex resolution | Multiple DCs, offline clients |
| Leaderless | High availability, no failover needed | Weak consistency, quorum cost | Availability > consistency |
| LWW conflict res | Simple | Data loss | Low-value, cache-like data |
| CRDT conflict res | No data loss, auto-merge | Complex data structures | Collaborative editing, counters |
Red Flags
❌ Assuming replication is instant (lag can be seconds or minutes under load)
❌ No split-brain protection (STONITH/fencing)
❌ Using LWW where data loss is unacceptable
❌ No read-after-write consistency for user-facing writes
❌ Multi-leader without a conflict resolution plan
Green Flags
✅ Semi-synchronous replication (one sync follower + rest async)
✅ Monotonic reads: route same user to same replica
✅ Explicit conflict detection + resolution strategy in place
✅ Quorum settings tuned for workload (more reads: lower w, higher r)
✅ Replication lag monitoring with alerting
Modern Additions (2026)
Raft consensus algorithm:
├─ Used in etcd, TiKV, CockroachDB, Consul
├─ Clean leader election + log replication in one protocol
└─ Replaced bespoke Paxos implementations
CRDTs in production:
├─ Riak, Redis Cluster, Figma, Linear, Notion
└─ Enable real-time collaborative editing without conflicts
Cloud-native replication:
├─ Aurora: shared storage, 6-way replication across 3 AZs
├─ AlloyDB: pooled write buffers, instant read replica catch-up
└─ PlanetScale: Vitess-based MySQL with online schema changes
Interview Response Templates
When Asked About Replication Strategy
“I’d start with single-leader replication for simplicity — one node accepts writes, read replicas handle reads. For a global application, I’d add multi-leader (one leader per region) to reduce write latency, with CRDTs or application-level conflict resolution. If availability is paramount over consistency, leaderless with quorum reads/writes (like Cassandra’s n=3, w=2, r=2) gives high availability at the cost of weaker consistency guarantees.”
When Asked About Handling Replication Lag
“Replication lag creates three classic problems: reading your own writes (fix: route recent writes to leader), monotonic reads (fix: sticky routing to same replica), and consistent prefix reads (fix: causally related writes to same partition). The right fix depends on what consistency guarantee the feature needs — user-facing features usually need at least read-your-writes consistency.”
Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13