Chapter 5 Cheat Sheet - Replication

One-Line Summaries

Concept	One-Liner
Single-leader	One node accepts writes; followers replicate and serve reads
Multi-leader	Multiple nodes accept writes; leaders sync with each other asynchronously; risk of write conflicts
Leaderless	All nodes accept writes; quorum determines consistency
Replication lag	Delay between leader write and follower update; causes stale reads
Eventual consistency	Replicas converge if writes stop; no guarantee during lag
Quorum (w+r>n)	Overlapping read/write sets guarantee at least one up-to-date response
Split brain	Two nodes both think they’re leader → data corruption
Read-your-writes	Guarantee that users always see their own writes, even when some replicas are stale
LWW	Last Write Wins — highest timestamp wins; risks data loss
CRDT	Conflict-free Replicated Data Type — merges automatically

Replication Architecture Comparison

Aspect	Single-Leader	Multi-Leader	Leaderless
Write target	One leader	Any leader	Any replica (quorum)
Write conflicts	None	Possible	Possible
Availability	Leader failure = writes down	Tolerates per-DC failure	High (no single point)
Consistency	Strongest	Weakest	Tunable (w+r)
Latency	Low local, high cross-DC	Low (local DC)	Low (quorum)
Examples	MySQL, PostgreSQL, MongoDB	MySQL multi-source, CouchDB	Cassandra, Riak, Voldemort (Dynamo-style; AWS DynamoDB itself is not leaderless)

Failover Steps (Single-Leader)

1. Detect leader failure
   └─ Timeout-based: no heartbeat for 30s → presumed dead

2. Elect new leader
   └─ Most up-to-date follower (by replication offset)
   └─ Consensus required (avoid split-brain)

3. Reconfigure system
   ├─ Clients redirect writes to new leader
   └─ Other followers follow new leader

PROBLEMS:
├─ Split brain: old leader comes back → two leaders → data corruption
│  └─ Solution: STONITH (fencing) — force old leader offline
├─ Replication lag: new leader may be missing writes the old leader had acknowledged
│  └─ Solution: accept the loss (a durability compromise) or use synchronous replication
└─ Wrong timeout: too short = false failover; too long = slow recovery
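
The detection and election steps above can be sketched in a few lines. This is a minimal illustration, not a real failover implementation: the Follower class, the function names, and the sample offsets are all hypothetical, and the consensus step is deliberately left out.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 30.0  # assumption: the 30s timeout from step 1


@dataclass
class Follower:
    name: str
    replication_offset: int  # position reached in the leader's log


def leader_presumed_dead(last_heartbeat_at: float, now: float,
                         timeout_s: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Step 1: timeout-based detection. A slow leader looks the same as a dead one."""
    return now - last_heartbeat_at > timeout_s


def pick_new_leader(followers: list[Follower]) -> Follower:
    """Step 2: choose the follower with the highest replication offset.

    A real system must agree on this choice through consensus;
    this function only shows the selection criterion.
    """
    if not followers:
        raise ValueError("no followers available for promotion")
    return max(followers, key=lambda f: f.replication_offset)


def lost_writes(old_leader_offset: int, new_leader: Follower) -> int:
    """Writes the old leader accepted that the new leader never received."""
    return max(0, old_leader_offset - new_leader.replication_offset)


followers = [Follower("f1", 1040), Follower("f2", 1057), Follower("f3", 998)]
if leader_presumed_dead(last_heartbeat_at=100.0, now=131.5):
    new_leader = pick_new_leader(followers)
    print(new_leader.name, lost_writes(1060, new_leader))  # f2 3
```

The three lost writes in the example are the "replication lag" problem from the list above: with asynchronous replication, even the most up-to-date follower can trail the old leader.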

Replication Lag Consistency Guarantees

Problem 1: Reading Your Own Writes
─────────────────────────────────
User posts comment → reads own comment from stale follower → comment gone!

Fix:
├─ Read anything the user may have modified from the LEADER
├─ Track the user's last write timestamp; don't read from a replica that lags behind it
└─ Route same user's reads to same replica (only helps once that replica has caught up to the write)
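
The timestamp-tracking fix can be sketched as a small read router. Everything here is an assumption for illustration: the ReadRouter class, its method names, and the idea that each replica reports the timestamp it has applied up to. A logical timestamp such as a log sequence number would be safer than wall-clock time in practice.

```python
import time


class ReadRouter:
    """Hypothetical router that remembers each user's last write time in memory."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_write_at: dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        """Call this whenever the user's write is accepted by the leader."""
        self._last_write_at[user_id] = self._clock()

    def choose_target(self, user_id: str, replica_applied_up_to: float) -> str:
        """Return "replica" only if the replica already contains the user's last write."""
        last_write = self._last_write_at.get(user_id)
        if last_write is None or replica_applied_up_to >= last_write:
            return "replica"
        return "leader"


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
router = ReadRouter(clock=lambda: fake_now[0])
fake_now[0] = 10.0
router.record_write("alice")
print(router.choose_target("alice", replica_applied_up_to=8.0))   # leader
print(router.choose_target("alice", replica_applied_up_to=10.5))  # replica
print(router.choose_target("bob", replica_applied_up_to=0.0))     # replica
```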

Problem 2: Monotonic Reads
──────────────────────────
User sees messages in order A→B→C from replica 1
User refreshes → hits replica 2 (more stale) → sees only A
"Time goes backward"

Fix:
└─ Route same user to same replica (sticky sessions)
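
Sticky routing is often done by hashing the user ID. The sketch below assumes a fixed list of replicas with made-up names; it is one possible approach, not a prescribed one.

```python
import hashlib

REPLICAS = ["replica-1", "replica-2", "replica-3"]  # hypothetical replica names


def replica_for_user(user_id: str, replicas: list[str] = REPLICAS) -> str:
    """Map a user to one replica deterministically.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would break stickiness across restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


# The same user always lands on the same replica.
print(replica_for_user("alice") == replica_for_user("alice"))  # True
```

If the chosen replica fails, the user has to be rerouted, and the new replica may be more stale than the old one, so this reduces "time goes backward" rather than eliminating it.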

Problem 3: Consistent Prefix Reads
────────────────────────────────────
User A posts question, User B posts answer
Observer sees answer before question (different partitions, different lag)

Fix:
└─ Keep causally related writes in the same partition so they are applied in order

Quorum Reads/Writes

n = total replicas
w = nodes that must confirm write
r = nodes queried for read

Rule: w + r > n → read and write sets overlap in at least 1 node → a read sees the latest acknowledged write (edge cases remain: sloppy quorums, concurrent writes)

Example: n=5, w=3, r=3
  Write to 3:  ●●●○○
  Read from 3: ○○●●●  (worst case; any 3 nodes work)
  Overlap: at least 1 of the 3 nodes read was among the 3 nodes written

Common configs:
  n=3, w=2, r=2   → Typical balanced config
  n=3, w=3, r=1   → Fast read; writes need every replica up
  n=3, w=1, r=3   → Fast write; reads need every replica up
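
The overlap rule can be checked by brute force for small clusters. The helper names below are illustrative only; the sketch enumerates every possible write set and read set and reports the smallest overlap.

```python
from itertools import combinations


def is_strict_quorum(n: int, w: int, r: int) -> bool:
    """The rule from above: w + r > n."""
    return w + r > n


def min_overlap(n: int, w: int, r: int) -> int:
    """Smallest overlap between any write set of size w and any read set of size r."""
    nodes = range(n)
    return min(
        len(set(write_set) & set(read_set))
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def tolerated_failures(n: int, w: int, r: int) -> tuple[int, int]:
    """(nodes that may be down while writes still succeed, same for reads)."""
    return n - w, n - r


for n, w, r in [(5, 3, 3), (3, 2, 2), (3, 3, 1), (3, 1, 3), (3, 1, 1)]:
    print(n, w, r, is_strict_quorum(n, w, r),
          min_overlap(n, w, r), tolerated_failures(n, w, r))
```

The last configuration (n=3, w=1, r=1) fails the rule and has a minimum overlap of zero, so a read can miss the latest write entirely.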

Write Conflict Resolution Strategies

Strategy          | How it works              | Data loss? | Use case
─────────────────────────────────────────────────────────────────────
Last Write Wins   | Highest timestamp wins    | YES        | Caching, low-value data
(LWW)             |                           |            |
                  |                           |            |
Merge             | Combine both values       | NO         | Sets, counters (CRDTs)
(CRDT)            | (union, sum, etc.)        |            |
                  |                           |            |
Explicit tracking | Store all versions;       | NO         | User-facing edits
                  | show conflict to user/app |            |
                  |                           |            |
Causal ordering   | Use version vectors to    | NO         | Any causal workflow
(vector clocks)   | detect "happened-before"  |            |
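
To make the data-loss column concrete, here is a small sketch contrasting LWW with a grow-only counter (G-Counter), one of the simplest CRDTs. The function names and the "likes" scenario are invented for illustration.

```python
def lww_merge(a: tuple[float, int], b: tuple[float, int]) -> tuple[float, int]:
    """Last Write Wins: keep the (timestamp, value) pair with the higher timestamp."""
    return a if a[0] >= b[0] else b


def gcounter_merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """G-Counter: one count per replica, merged with an element-wise max."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def gcounter_value(counter: dict[str, int]) -> int:
    """The counter's value is the sum of all per-replica counts."""
    return sum(counter.values())


# A post has 10 likes. Replicas A and B each record one new like concurrently.

# LWW: both replicas write "11"; the later timestamp wins and one like is lost.
print(lww_merge((100.0, 11), (100.2, 11)))  # (100.2, 11)

# G-Counter: each replica only increments its own slot, so the merge keeps both.
on_a = {"A": 7, "B": 4}  # A counted one more like
on_b = {"A": 6, "B": 5}  # B counted one more like
print(gcounter_value(gcounter_merge(on_a, on_b)))  # 12
```

The merge is commutative, associative, and idempotent, which is why replicas can exchange state in any order and still converge.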

Multi-Leader Topologies

All-to-all (most resilient):     Star (centralized, hub = DC3):
  DC1 ←──→ DC2                     DC1 ←──→ DC3 ←──→ DC2
  DC1 ←──→ DC3
  DC2 ←──→ DC3

Circular (ordered):
  DC1 → DC2 → DC3 → DC1

Risk in circular/star: writes pass through intermediate nodes, so one failed or slow node interrupts or delays replication for every node behind it
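
A toy simulation shows why a single failure hurts a circular topology more than all-to-all. The function names are hypothetical, and the model ignores retries and manual reconfiguration.

```python
def reached_in_ring(ring: list[str], origin: str, failed: set[str]) -> list[str]:
    """Nodes that receive a write from origin when each node forwards to the next.

    Forwarding stops at the first failed node, so everything behind it is cut off.
    """
    start = ring.index(origin)
    reached = []
    for step in range(1, len(ring)):
        node = ring[(start + step) % len(ring)]
        if node in failed:
            break
        reached.append(node)
    return reached


def reached_all_to_all(nodes: list[str], origin: str, failed: set[str]) -> list[str]:
    """In all-to-all, origin sends directly to every other healthy node."""
    return [n for n in nodes if n != origin and n not in failed]


dcs = ["DC1", "DC2", "DC3"]
print(reached_in_ring(dcs, "DC1", failed={"DC2"}))     # []
print(reached_all_to_all(dcs, "DC1", failed={"DC2"}))  # ['DC3']
```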

Concurrent Write Detection

Version vectors (vector clocks):
  Replica A: {A:3, B:2}  = "I've seen 3 writes on A, 2 writes on B"
  Replica B: {A:2, B:4}  = "I've seen 2 writes on A, 4 writes on B"

Comparison:
  A.A=3 > B.A=2  (A is ahead on its own writes)
  A.B=2 < B.B=4  (B is ahead on its own writes)
  Neither dominates → CONCURRENT WRITE → conflict!

If every entry of A >= the matching entry of B → A is a descendant of B → A's value can safely overwrite B's
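
The comparison rule can be written as a short function. This is a sketch with made-up names and return labels; missing entries are treated as zero.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two version vectors entry by entry."""
    nodes = a.keys() | b.keys()
    a_covers_b = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_covers_a = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_descends_from_b"  # a has seen everything b has: a may overwrite b
    if b_covers_a:
        return "b_descends_from_a"
    return "concurrent"  # neither dominates: keep both and resolve the conflict


# The example from above: neither vector dominates.
print(compare({"A": 3, "B": 2}, {"A": 2, "B": 4}))  # concurrent
print(compare({"A": 3, "B": 4}, {"A": 2, "B": 4}))  # a_descends_from_b
```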

Key Trade-offs

Decision	Pro	Con	When to Use
Sync replication	No data loss on failover	Writes blocked if replica slow	Critical data
Async replication	High write throughput	Potential data loss on failover	High volume, tolerable loss
Multi-leader	Low write latency, DC redundancy	Write conflicts, complex resolution	Multiple DCs, offline clients
Leaderless	High availability, no failover needed	Weak consistency, quorum cost	Availability > consistency
LWW conflict res	Simple	Data loss	Low-value, cache-like data
CRDT conflict res	No data loss, auto-merge	Complex data structures	Collaborative editing, counters

Red Flags

❌ Assuming replication is instant (lag can be seconds or minutes under load)
❌ No split-brain protection (STONITH/fencing)
❌ Using LWW where data loss is unacceptable
❌ No read-after-write consistency for user-facing writes
❌ Multi-leader without a conflict resolution plan

Green Flags

✅ Semi-synchronous replication (one sync follower + rest async)
✅ Monotonic reads: route same user to same replica
✅ Explicit conflict detection + resolution strategy in place
✅ Quorum settings tuned for workload (read-heavy: lower r, higher w; write-heavy: lower w, higher r)
✅ Replication lag monitoring with alerting

Modern Additions (2026)

Raft consensus algorithm:
├─ Used in etcd, TiKV, CockroachDB, Consul
├─ Clean leader election + log replication in one protocol
└─ Has largely displaced bespoke Paxos implementations in newer systems
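
Both halves of Raft rest on the same majority rule, which the sketch below illustrates. It is heavily simplified and the function names are invented: real Raft also checks log freshness before granting votes and only commits an entry by counting replicas if it belongs to the leader's current term.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def wins_election(votes_granted: int, cluster_size: int) -> bool:
    """A candidate becomes leader once a majority has voted for it."""
    return votes_granted >= majority(cluster_size)


def highest_replicated_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of nodes.

    match_index has one entry per node, including the leader's own last index.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[majority(len(match_index)) - 1]


print(wins_election(votes_granted=3, cluster_size=5))  # True
print(highest_replicated_index([10, 9, 9, 7, 4]))      # 9
```

Because any two majorities share at least one node, two leaders cannot both win the same term, which is how the protocol avoids split brain.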

CRDTs in production:
├─ Riak, Redis Enterprise Active-Active (not open-source Redis Cluster), Figma, Linear, Notion
└─ Enable real-time collaborative editing without manual conflict resolution

Cloud-native replication:
├─ Aurora: shared storage, 6-way replication across 3 AZs
├─ AlloyDB: pooled write buffers, instant read replica catch-up
└─ PlanetScale: Vitess-based MySQL with online schema changes

Interview Response Templates

When Asked About Replication Strategy

“I’d start with single-leader replication for simplicity — one node accepts writes, read replicas handle reads. For a global application, I’d add multi-leader (one leader per region) to reduce write latency, with CRDTs or application-level conflict resolution. If availability is paramount over consistency, leaderless with quorum reads/writes (like Cassandra’s n=3, w=2, r=2) gives high availability at the cost of weaker consistency guarantees.”

When Asked About Handling Replication Lag

“Replication lag creates three classic problems: reading your own writes (fix: read recently written data from the leader), monotonic reads (fix: sticky routing to the same replica), and consistent prefix reads (fix: keep causally related writes in the same partition). The right fix depends on what consistency guarantee the feature needs. User-facing features usually need at least read-your-writes consistency.”

Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13

Study Notes by Niladri & AI
