Chapter 6 Cheat Sheet — Replication

One-Line Summaries

ConceptOne-Liner
ReplicationKeeping copies of data on multiple nodes for availability, fault tolerance, and read scale
Single-leaderAll writes to one node; followers receive changes; reads from any replica
Synchronous replicationLeader waits for follower confirmation; strong durability, reduced availability
Asynchronous replicationLeader acknowledges immediately; low latency, risk of data loss on failover
Replication lagFollowers behind leader; reads may return stale data
Read-after-writeUser always sees their own writes; route their reads to leader or track write position
Multi-leaderMultiple nodes accept writes; enables multi-datacenter; creates write conflicts
LeaderlessAny replica accepts writes; quorum ensures consistency (w+r>n)
LWWLast Write Wins — resolve conflicts by timestamp; risks data loss from clock skew
CRDTData type with mathematically provable merge semantics; conflict-free by design
CDCChange Data Capture — replicate DB changes as event stream for downstream consumers

Replication Topology Diagrams

SINGLE-LEADER:
                    Writes ──→ ┌──────────┐
                               │  LEADER  │
                               └────┬─────┘
                        replication │ log
                    ┌───────────────┴───────────────┐
                    ↓                               ↓
             ┌──────────┐                   ┌──────────┐
             │ Follower │                   │ Follower │
             │ (sync)   │                   │ (async)  │
             └──────────┘                   └──────────┘
             Reads ↑                        Reads ↑ (may be stale)

MULTI-LEADER (multi-datacenter):
  DC1: Leader A ←──────────────────────→ Leader B :DC2
         │   async cross-DC replication   │
         │ sync                           │ sync
      Follower A1                      Follower B1

LEADERLESS:
  Client writes to w=3 of n=5:        Client reads from r=3 of n=5:
        ┌───┐ ✓                              ┌───┐ ✓ (latest)
        │ A │                               │ A │
        │ B │ ✓                             │ B │ ✓ (latest)
        │ C │ ✓                             │ C │ ✓ (latest)
        │ D │ ×                             │ D │ ×
        │ E │ ×                             │ E │ ×
        └───┘                               └───┘
  w+r=6 > n=5 → overlap guaranteed → at least 1 replica has latest write

CIRCULAR TOPOLOGY (multi-leader):        STAR TOPOLOGY:          ALL-TO-ALL:
   A → B → C → A                             B                   A ↔ B ↔ C
   (ring; single failure breaks ring)       /|\                    \   /
                                           A-C-D                    ↕
                                             |
                                             E

Synchronous vs Asynchronous vs Semi-Synchronous

SynchronousSemi-SynchronousAsynchronous
Leader waits forAll configured followersOne followerNo followers
Write latencyHighestMediumLowest
DurabilityStrongestStrong (2 copies)Weak (lost on leader crash)
Write availabilityBlocked if any sync replica downOnly blocked if that one follower downNever blocked by replicas
Use caseSafety-critical (banking)PostgreSQL defaultHigh-throughput writes
Read consistencyAll sync replicas currentOne replica currentFollowers may be stale

Practical recommendation: Semi-synchronous (one synchronous follower + async for rest). PostgreSQL: synchronous_standby_names = 'follower1'.


Replication Log Methods

MethodHow It WorksDeterministic?Storage-Coupled?CDC-Friendly?
Statement-basedLog SQL statements; followers re-executeNo (NOW(), RAND())NoNo
WAL shippingShip raw WAL bytes; exact physical copyYesTight (version-sensitive)No
Logical/row-basedLog row-level changes decoupled from storageYesLoose (can differ versions)Yes
Trigger-basedDB triggers write to replication tableApp-controlledNoneYes (custom)

Best for CDC: Logical/row-based — readable by Debezium, Kafka Connect, etc.


Failover Failure Modes

PROBLEM 1: LOST WRITES
  Leader has writes W1, W2, W3
  Follower only received W1, W2
  Leader fails → Follower becomes new leader
  W3 is LOST (client got "success" but data is gone)
  
PROBLEM 2: SPLIT BRAIN
  Network partition → both old and new leader think they're primary
  Both accept writes → diverging state → data corruption
  Solution: STONITH (fencing) — force old leader offline

PROBLEM 3: FALSE FAILOVER
  Leader appears down (network hiccup) but is actually healthy
  New leader elected → old leader comes back → split brain
  Solution: Careful timeout tuning; fencing tokens

PROBLEM 4: CASCADING FAILURE
  Failover triggers massive replication catch-up on new leader
  New leader overloaded → appears slow → triggers another failover
  Solution: Gradual catch-up; circuit breakers

Quorum Formula Reference

n = total replicas
w = replicas that must confirm a write  
r = replicas that must respond to a read

Consistency guarantee:  w + r > n
Fault tolerance:        can tolerate floor((n-1)/2) failures while maintaining quorum

Common configurations:
┌───────────────────┬────────────────────────────────────────────────┐
│ n=3, w=2, r=2     │ Tolerates 1 failure; reads/writes both need 2 │
│ n=5, w=3, r=3     │ Tolerates 2 failures; balanced                │
│ n=5, w=5, r=1     │ Very durable writes; fast reads; writes blocked│
│                   │ if any node down                               │
│ n=5, w=1, r=5     │ Fast writes; reads see all; write availability │
│                   │ max; reads expensive                           │
│ n=5, w=3, r=1     │ w+r=4 < n=5 → NO consistency guarantee       │
│                   │ (eventual consistency only)                    │
└───────────────────┴────────────────────────────────────────────────┘

Sloppy quorum: write accepted by ANY w nodes (not necessarily home replicas)
Hinted handoff: temporary node forwards data to home node when it recovers
→ Higher availability, weaker durability (temporary node may fail first)

Consistency Levels (Weakest to Strongest)

EVENTUAL CONSISTENCY  ──────────────────────────────────────────────→ LINEARIZABILITY
    │                                                                       │
    │   Monotonic Reads      Consistent Prefix       Read-After-Write       │
    │   (no time reversal)   (causality preserved)   (see own writes)       │
    ↓                                                                       ↓
Most available                                                    Most consistent
Lowest latency                                                   Highest latency
LevelGuaranteeHow to AchieveCost
EventualReplicas converge if writes stopDefault in async replicationLowest
Consistent prefixIf A before B, readers see A before BLogical timestamps; causal writes to same partitionLow
Monotonic readsUser never sees time go backwardsSticky replica routing per sessionLow
Read-after-writeUser sees their own writesRoute user reads to leader; or track write LSNMedium
Linearizable (strong)Single-copy semantics; reads always see latest writeSynchronous quorum; consensus protocolHighest

Conflict Resolution Strategies

StrategyMechanismProsConsBest For
Conflict avoidanceRoute all writes for a key to one leaderNo conflicts; simpleFails during leader changeMost cases
LWW (Last Write Wins)Timestamp per write; highest winsSimple; eventually convergentData loss; clock skew riskAppend-only, immutable data
Highest replica IDNumeric IDs; higher ID winsSimpleSemantically meaninglessSimple datasets
Merge valuesStore all conflicting values; app mergesNo data lossApp must implement mergeShopping carts (union)
Custom logic on writeApplication callback on conflictFull semantic controlDeveloper burdenComplex business logic
CRDTMathematically defined merge operationAutomatic; provably correctOnly works for specific typesCounters, sets, text

CRDTs: Key Types and Uses

CRDT TypeData StructureUse Case
G-CounterGrow-only counterPage view count, like count
PN-Counter+/- counter (two G-Counters)Inventory adjustments
G-SetGrow-only set (add only)Tag sets, follow lists
OR-SetObserved-remove set (add + remove)Shopping cart, todo list
LWW-RegisterLast-write-wins registerUser preferences
RGAReplicated Growable ArrayCollaborative text editing
MV-RegisterMulti-value registerReturns all concurrent values

CDC Architecture

Source DB                 CDC Tool               Message Broker      Consumers
┌──────────┐  WAL/binlog  ┌───────────┐  events  ┌──────────┐  ───→ Elasticsearch
│PostgreSQL│─────────────→│  Debezium  │─────────→│  Kafka   │  ───→ Data Warehouse
│ (source) │              │(connector) │          │  topics  │  ───→ Redis cache
└──────────┘              └───────────┘          └──────────┘  ───→ Microservice B

CDC guarantees:
  - At-least-once delivery of every change
  - Before/after state for UPDATE operations
  - Transaction boundaries preserved (BEGIN/COMMIT)
  - Schema change events (DDL)

Use cases:
  ✓ Search index sync (DB → Elasticsearch)
  ✓ Cache invalidation (DB → Redis)
  ✓ Analytics pipeline (OLTP → Data Warehouse)
  ✓ Microservice data sync (avoid distributed transactions)
  ✓ Audit logging (every change captured)
  ✓ Event sourcing on top of relational DB

Replication Decision Tree

Do you need multi-region writes (users in multiple regions write concurrently)?
├─ YES → Multi-Leader or Leaderless
│  ├─ Can you tolerate conflict resolution complexity? 
│  │  ├─ YES, need geographic performance → Multi-Leader
│  │  └─ YES, need high availability → Leaderless (Cassandra, DynamoDB)
│  └─ NO → Consider single-leader with a primary region + async replicas
└─ NO → Single-Leader
   ├─ Need strong durability? → Semi-synchronous (one sync follower)
   ├─ Need maximum throughput? → Asynchronous
   └─ Need strong consistency? → Synchronous (accept higher latency)

Key Numbers and Rules of Thumb

  • Replication lag in healthy datacenter: <100ms typical; <1s for same region
  • Replication lag cross-region: 50-150ms (proportional to speed of light distance)
  • Failover timeout typical: 30-60 seconds (leader considered dead after this)
  • Semi-synchronous cost: 1-3ms per write (waiting for one nearby follower)
  • Cassandra quorum most common: n=3, w=2, r=2 (LOCAL_QUORUM)
  • Rule: w + r > n for consistency; w + r ≤ n for availability over consistency

Red Flags

  • Using asynchronous replication for data where “success” means “durable”
  • Ignoring split-brain risk in leader failover (no fencing mechanism)
  • Using LWW for data that should never be lost (e.g., financial transactions, inventory)
  • Silently discarding concurrent writes without conflict resolution strategy
  • Using statement-based replication with non-deterministic functions (NOW(), RAND())
  • Setting w+r ≤ n and claiming strong consistency
  • No CDC for keeping derived data systems in sync (polling alternative is slower and less reliable)

Green Flags

  • Semi-synchronous replication for critical data with acceptable write latency
  • Fencing/STONITH in failover automation to prevent split-brain
  • Logical/row-based replication for zero-downtime upgrades + CDC capability
  • CRDT usage for counters and sets that naturally need distributed update
  • Version vectors to detect and resolve concurrent writes correctly
  • CDC pipeline (Debezium → Kafka) for search, analytics, and cache sync
  • Monotonic reads via sticky session routing per user

Quick Revision Time: 5 minutes
Interview Prep: 20 minutes
Last Updated: 2026-05-29