Chapter 6 Cheat Sheet — Replication

One-Line Summaries

Concept	One-Liner
Replication	Keeping copies of data on multiple nodes for availability, fault tolerance, and read scale
Single-leader	All writes to one node; followers receive changes; reads from any replica
Synchronous replication	Leader waits for follower confirmation; strong durability, reduced availability
Asynchronous replication	Leader acknowledges immediately; low latency, risk of data loss on failover
Replication lag	Followers behind leader; reads may return stale data
Read-after-write	User always sees their own writes; route their reads to leader or track write position
Multi-leader	Multiple nodes accept writes; enables multi-datacenter; creates write conflicts
Leaderless	Any replica accepts writes; quorum ensures consistency (w+r>n)
LWW	Last Write Wins — resolve conflicts by timestamp; risks data loss from clock skew
CRDT	Data type with mathematically provable merge semantics; conflict-free by design
CDC	Change Data Capture — replicate DB changes as event stream for downstream consumers

Replication Topology Diagrams

SINGLE-LEADER:
                    Writes ──→ ┌──────────┐
                               │  LEADER  │
                               └────┬─────┘
                        replication │ log
                    ┌───────────────┴───────────────┐
                    ↓                               ↓
             ┌──────────┐                   ┌──────────┐
             │ Follower │                   │ Follower │
             │ (sync)   │                   │ (async)  │
             └──────────┘                   └──────────┘
             Reads ↑                        Reads ↑ (may be stale)

MULTI-LEADER (multi-datacenter):
  DC1: Leader A ←──────────────────────→ Leader B :DC2
         │   async cross-DC replication   │
         │ sync                           │ sync
      Follower A1                      Follower B1

LEADERLESS:
  Client writes to w=3 of n=5:        Client reads from r=3 of n=5:
        ┌───┐ ✓                              ┌───┐ ✓ (latest)
        │ A │                               │ A │
        │ B │ ✓                             │ B │ ✓ (latest)
        │ C │ ✓                             │ C │ ✓ (latest)
        │ D │ ×                             │ D │ ×
        │ E │ ×                             │ E │ ×
        └───┘                               └───┘
  w+r=6 > n=5 → overlap guaranteed → at least 1 replica has latest write

CIRCULAR TOPOLOGY (multi-leader):        STAR TOPOLOGY:          ALL-TO-ALL:
   A → B → C → A                             B                   A ↔ B ↔ C
   (ring; single failure breaks ring)       /|\                    \   /
                                           A-C-D                    ↕
                                             |
                                             E

Synchronous vs Asynchronous vs Semi-Synchronous

	Synchronous	Semi-Synchronous	Asynchronous
Leader waits for	All configured followers	One follower	No followers
Write latency	Highest	Medium	Lowest
Durability	Strongest	Strong (2 copies)	Weak (lost on leader crash)
Write availability	Blocked if any sync replica down	Only blocked if that one follower down	Never blocked by replicas
Use case	Safety-critical (banking)	PostgreSQL default	High-throughput writes
Read consistency	All sync replicas current	One replica current	Followers may be stale

Practical recommendation: Semi-synchronous (one synchronous follower + async for rest). PostgreSQL: synchronous_standby_names = 'follower1'.

Replication Log Methods

Method	How It Works	Deterministic?	Storage-Coupled?	CDC-Friendly?
Statement-based	Log SQL statements; followers re-execute	No (NOW(), RAND())	No	No
WAL shipping	Ship raw WAL bytes; exact physical copy	Yes	Tight (version-sensitive)	No
Logical/row-based	Log row-level changes decoupled from storage	Yes	Loose (can differ versions)	Yes
Trigger-based	DB triggers write to replication table	App-controlled	None	Yes (custom)

Best for CDC: Logical/row-based — readable by Debezium, Kafka Connect, etc.

Failover Failure Modes

PROBLEM 1: LOST WRITES
  Leader has writes W1, W2, W3
  Follower only received W1, W2
  Leader fails → Follower becomes new leader
  W3 is LOST (client got "success" but data is gone)
  
PROBLEM 2: SPLIT BRAIN
  Network partition → both old and new leader think they're primary
  Both accept writes → diverging state → data corruption
  Solution: STONITH (fencing) — force old leader offline

PROBLEM 3: FALSE FAILOVER
  Leader appears down (network hiccup) but is actually healthy
  New leader elected → old leader comes back → split brain
  Solution: Careful timeout tuning; fencing tokens

PROBLEM 4: CASCADING FAILURE
  Failover triggers massive replication catch-up on new leader
  New leader overloaded → appears slow → triggers another failover
  Solution: Gradual catch-up; circuit breakers

Quorum Formula Reference

n = total replicas
w = replicas that must confirm a write  
r = replicas that must respond to a read

Consistency guarantee:  w + r > n
Fault tolerance:        can tolerate floor((n-1)/2) failures while maintaining quorum

Common configurations:
┌───────────────────┬────────────────────────────────────────────────┐
│ n=3, w=2, r=2     │ Tolerates 1 failure; reads/writes both need 2 │
│ n=5, w=3, r=3     │ Tolerates 2 failures; balanced                │
│ n=5, w=5, r=1     │ Very durable writes; fast reads; writes blocked│
│                   │ if any node down                               │
│ n=5, w=1, r=5     │ Fast writes; reads see all; write availability │
│                   │ max; reads expensive                           │
│ n=5, w=3, r=1     │ w+r=4 < n=5 → NO consistency guarantee       │
│                   │ (eventual consistency only)                    │
└───────────────────┴────────────────────────────────────────────────┘

Sloppy quorum: write accepted by ANY w nodes (not necessarily home replicas)
Hinted handoff: temporary node forwards data to home node when it recovers
→ Higher availability, weaker durability (temporary node may fail first)

Consistency Levels (Weakest to Strongest)

EVENTUAL CONSISTENCY  ──────────────────────────────────────────────→ LINEARIZABILITY
    │                                                                       │
    │   Monotonic Reads      Consistent Prefix       Read-After-Write       │
    │   (no time reversal)   (causality preserved)   (see own writes)       │
    ↓                                                                       ↓
Most available                                                    Most consistent
Lowest latency                                                   Highest latency

Level	Guarantee	How to Achieve	Cost
Eventual	Replicas converge if writes stop	Default in async replication	Lowest
Consistent prefix	If A before B, readers see A before B	Logical timestamps; causal writes to same partition	Low
Monotonic reads	User never sees time go backwards	Sticky replica routing per session	Low
Read-after-write	User sees their own writes	Route user reads to leader; or track write LSN	Medium
Linearizable (strong)	Single-copy semantics; reads always see latest write	Synchronous quorum; consensus protocol	Highest

Conflict Resolution Strategies

Strategy	Mechanism	Pros	Cons	Best For
Conflict avoidance	Route all writes for a key to one leader	No conflicts; simple	Fails during leader change	Most cases
LWW (Last Write Wins)	Timestamp per write; highest wins	Simple; eventually convergent	Data loss; clock skew risk	Append-only, immutable data
Highest replica ID	Numeric IDs; higher ID wins	Simple	Semantically meaningless	Simple datasets
Merge values	Store all conflicting values; app merges	No data loss	App must implement merge	Shopping carts (union)
Custom logic on write	Application callback on conflict	Full semantic control	Developer burden	Complex business logic
CRDT	Mathematically defined merge operation	Automatic; provably correct	Only works for specific types	Counters, sets, text

CRDTs: Key Types and Uses

CRDT Type	Data Structure	Use Case
G-Counter	Grow-only counter	Page view count, like count
PN-Counter	+/- counter (two G-Counters)	Inventory adjustments
G-Set	Grow-only set (add only)	Tag sets, follow lists
OR-Set	Observed-remove set (add + remove)	Shopping cart, todo list
LWW-Register	Last-write-wins register	User preferences
RGA	Replicated Growable Array	Collaborative text editing
MV-Register	Multi-value register	Returns all concurrent values

CDC Architecture

Source DB                 CDC Tool               Message Broker      Consumers
┌──────────┐  WAL/binlog  ┌───────────┐  events  ┌──────────┐  ───→ Elasticsearch
│PostgreSQL│─────────────→│  Debezium  │─────────→│  Kafka   │  ───→ Data Warehouse
│ (source) │              │(connector) │          │  topics  │  ───→ Redis cache
└──────────┘              └───────────┘          └──────────┘  ───→ Microservice B

CDC guarantees:
  - At-least-once delivery of every change
  - Before/after state for UPDATE operations
  - Transaction boundaries preserved (BEGIN/COMMIT)
  - Schema change events (DDL)

Use cases:
  ✓ Search index sync (DB → Elasticsearch)
  ✓ Cache invalidation (DB → Redis)
  ✓ Analytics pipeline (OLTP → Data Warehouse)
  ✓ Microservice data sync (avoid distributed transactions)
  ✓ Audit logging (every change captured)
  ✓ Event sourcing on top of relational DB

Replication Decision Tree

Do you need multi-region writes (users in multiple regions write concurrently)?
├─ YES → Multi-Leader or Leaderless
│  ├─ Can you tolerate conflict resolution complexity? 
│  │  ├─ YES, need geographic performance → Multi-Leader
│  │  └─ YES, need high availability → Leaderless (Cassandra, DynamoDB)
│  └─ NO → Consider single-leader with a primary region + async replicas
└─ NO → Single-Leader
   ├─ Need strong durability? → Semi-synchronous (one sync follower)
   ├─ Need maximum throughput? → Asynchronous
   └─ Need strong consistency? → Synchronous (accept higher latency)

Key Numbers and Rules of Thumb

Replication lag in healthy datacenter: <100ms typical; <1s for same region
Replication lag cross-region: 50-150ms (proportional to speed of light distance)
Failover timeout typical: 30-60 seconds (leader considered dead after this)
Semi-synchronous cost: 1-3ms per write (waiting for one nearby follower)
Cassandra quorum most common: n=3, w=2, r=2 (LOCAL_QUORUM)
Rule: w + r > n for consistency; w + r ≤ n for availability over consistency

Red Flags

Using asynchronous replication for data where “success” means “durable”
Ignoring split-brain risk in leader failover (no fencing mechanism)
Using LWW for data that should never be lost (e.g., financial transactions, inventory)
Silently discarding concurrent writes without conflict resolution strategy
Using statement-based replication with non-deterministic functions (NOW(), RAND())
Setting w+r ≤ n and claiming strong consistency
No CDC for keeping derived data systems in sync (polling alternative is slower and less reliable)

Green Flags

Semi-synchronous replication for critical data with acceptable write latency
Fencing/STONITH in failover automation to prevent split-brain
Logical/row-based replication for zero-downtime upgrades + CDC capability
CRDT usage for counters and sets that naturally need distributed update
Version vectors to detect and resolve concurrent writes correctly
CDC pipeline (Debezium → Kafka) for search, analytics, and cache sync
Monotonic reads via sticky session routing per user

Quick Revision Time: 5 minutes
Interview Prep: 20 minutes
Last Updated: 2026-05-29

Study Notes by Niladri & AI

Explorer

ch06-cheatsheet