Chapter 5 Cheat Sheet - Replication

One-Line Summaries

ConceptOne-Liner
Single-leaderOne node accepts writes; followers replicate and serve reads
Multi-leaderMultiple nodes accept writes; sync async; risk of write conflicts
LeaderlessAll nodes accept writes; quorum determines consistency
Replication lagDelay between leader write and follower update; causes stale reads
Eventual consistencyReplicas converge if writes stop; no guarantee during lag
Quorum (w+r>n)Overlapping read/write sets guarantee at least one up-to-date response
Split brainTwo nodes both think they’re leader → data corruption
Read-your-writesGuarantee user sees their own writes, even from stale replicas
LWWLast Write Wins — highest timestamp wins; risks data loss
CRDTConflict-free Replicated Data Type — merges automatically

Replication Architecture Comparison

Single-LeaderMulti-LeaderLeaderless
Write targetOne leaderAny leaderAny replica (quorum)
Write conflictsNonePossiblePossible
AvailabilityLeader failure = writes downTolerates per-DC failureHigh (no single point)
ConsistencyStrongestWeakestTunable (w+r)
LatencyLow local, high cross-DCLow (local DC)Low (quorum)
ExamplesMySQL, PostgreSQL, MongoDBMySQL multi-source, CouchDBCassandra, DynamoDB, Riak

Failover Steps (Single-Leader)

1. Detect leader failure
   └─ Timeout-based: no heartbeat for 30s → presumed dead

2. Elect new leader
   └─ Most up-to-date follower (by replication offset)
   └─ Consensus required (avoid split-brain)

3. Reconfigure system
   ├─ Clients redirect writes to new leader
   └─ Other followers follow new leader

PROBLEMS:
├─ Split brain: old leader comes back → two leaders → data corruption
│  └─ Solution: STONITH (fencing) — force old leader offline
├─ Replication lag: new leader may have lost writes
│  └─ Solution: Accept the loss (compromise) or synchronous replication
└─ Wrong timeout: too short = false failover; too long = slow recovery

Replication Lag Consistency Guarantees

Problem 1: Reading Your Own Writes
─────────────────────────────────
User posts comment → reads own comment from stale follower → comment gone!

Fix:
├─ Read own recent writes from LEADER
├─ Track last write timestamp; don't read replica lagging behind that time
└─ Route same user's reads to same replica

Problem 2: Monotonic Reads
──────────────────────────
User sees messages in order A→B→C from replica 1
User refreshes → hits replica 2 (more stale) → sees only A
"Time goes backward"

Fix:
└─ Route same user to same replica (sticky sessions)

Problem 3: Consistent Prefix Reads
────────────────────────────────────
User A posts question, User B posts answer
Observer sees answer before question (different partitions, different lag)

Fix:
└─ Causally related writes to same partition

Quorum Reads/Writes

n = total replicas
w = nodes that must confirm write
r = nodes queried for read

Rule: w + r > n → at least 1 overlap → guaranteed up-to-date read

Example: n=5, w=3, r=3
  Write to 3: ●●●○○
  Read from 3: ●●●○○  (or any 3)
  Overlap: at least 1 of the 3 read nodes was in the 3 write nodes

Common configs:
  n=3, w=2, r=2   → Typical balanced config
  n=3, w=3, r=1   → Strong write, fast read
  n=3, w=1, r=3   → Fast write, slow read

Write Conflict Resolution Strategies

Strategy          | How it works              | Data loss? | Use case
─────────────────────────────────────────────────────────────────────
Last Write Wins   | Highest timestamp wins    | YES        | Caching, low-value data
(LWW)             |                           |            |
                  |                           |            |
Merge             | Combine both values       | NO         | Sets, counters (CRDTs)
(CRDT)            | (union, sum, etc.)        |            |
                  |                           |            |
Explicit tracking | Store all versions;       | NO         | User-facing edits
                  | show conflict to user/app |            |
                  |                           |            |
Causal ordering   | Use version vectors to    | NO         | Any causal workflow
(vector clocks)   | determine "happened-before"|            |

Multi-Leader Topologies

All-to-all (most resilient):     Star (centralized):
  DC1 ←──→ DC2                     DC1 → DC3 ← DC2
   ↑  ↗  ↙  ↑                              ↑
   DC3                                      DC3

Circular (ordered):
  DC1 → DC2 → DC3 → DC1

Risk in circular/star: if one node is slow, it creates replication lag for the whole topology

Concurrent Write Detection

Version vectors (vector clocks):
  Replica A: {A:3, B:2}  = "I've seen 3 writes on A, 2 writes on B"
  Replica B: {A:2, B:4}  = "I've seen 2 writes on A, 4 writes on B"

Comparison:
  A.A=3 > B.A=2  (A is ahead on its own writes)
  A.B=2 < B.B=4  (B is ahead on its own writes)
  Neither dominates → CONCURRENT WRITE → conflict!

If A.all >= B.all → A is a descendant of B → safe to overwrite

Key Trade-offs

DecisionProConWhen to Use
Sync replicationNo data loss on failoverWrites blocked if replica slowCritical data
Async replicationHigh write throughputPotential data loss on failoverHigh volume, tolerable loss
Multi-leaderLow write latency, DC redundancyWrite conflicts, complex resolutionMultiple DCs, offline clients
LeaderlessHigh availability, no failover neededWeak consistency, quorum costAvailability > consistency
LWW conflict resSimpleData lossLow-value, cache-like data
CRDT conflict resNo data loss, auto-mergeComplex data structuresCollaborative editing, counters

Red Flags

❌ Assuming replication is instant (lag can be seconds or minutes under load)
❌ No split-brain protection (STONITH/fencing)
❌ Using LWW where data loss is unacceptable
❌ No read-after-write consistency for user-facing writes
❌ Multi-leader without a conflict resolution plan

Green Flags

✅ Semi-synchronous replication (one sync follower + rest async)
✅ Monotonic reads: route same user to same replica
✅ Explicit conflict detection + resolution strategy in place
✅ Quorum settings tuned for workload (more reads: lower w, higher r)
✅ Replication lag monitoring with alerting

Modern Additions (2026)

Raft consensus algorithm:
├─ Used in etcd, TiKV, CockroachDB, Consul
├─ Clean leader election + log replication in one protocol
└─ Replaced bespoke Paxos implementations

CRDTs in production:
├─ Riak, Redis Cluster, Figma, Linear, Notion
└─ Enable real-time collaborative editing without conflicts

Cloud-native replication:
├─ Aurora: shared storage, 6-way replication across 3 AZs
├─ AlloyDB: pooled write buffers, instant read replica catch-up
└─ PlanetScale: Vitess-based MySQL with online schema changes

Interview Response Templates

When Asked About Replication Strategy

“I’d start with single-leader replication for simplicity — one node accepts writes, read replicas handle reads. For a global application, I’d add multi-leader (one leader per region) to reduce write latency, with CRDTs or application-level conflict resolution. If availability is paramount over consistency, leaderless with quorum reads/writes (like Cassandra’s n=3, w=2, r=2) gives high availability at the cost of weaker consistency guarantees.”

When Asked About Handling Replication Lag

“Replication lag creates three classic problems: reading your own writes (fix: route recent writes to leader), monotonic reads (fix: sticky routing to same replica), and consistent prefix reads (fix: causally related writes to same partition). The right fix depends on what consistency guarantee the feature needs — user-facing features usually need at least read-your-writes consistency.”


Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13