Best for CDC: Logical/row-based — readable by Debezium, Kafka Connect, etc.
Failover Failure Modes
PROBLEM 1: LOST WRITES
Leader has writes W1, W2, W3
Follower only received W1, W2
Leader fails → Follower becomes new leader
W3 is LOST (client got "success" but data is gone)
PROBLEM 2: SPLIT BRAIN
Network partition → both old and new leader think they're primary
Both accept writes → diverging state → data corruption
Solution: STONITH (fencing) — force old leader offline
PROBLEM 3: FALSE FAILOVER
Leader appears down (network hiccup) but is actually healthy
New leader elected → old leader comes back → split brain
Solution: Careful timeout tuning; fencing tokens
PROBLEM 4: CASCADING FAILURE
Failover triggers massive replication catch-up on new leader
New leader overloaded → appears slow → triggers another failover
Solution: Gradual catch-up; circuit breakers
Quorum Formula Reference
n = total replicas
w = replicas that must confirm a write
r = replicas that must respond to a read
Consistency guarantee: w + r > n
Fault tolerance: can tolerate floor((n-1)/2) failures while maintaining quorum
Common configurations:
┌───────────────────┬────────────────────────────────────────────────┐
│ n=3, w=2, r=2 │ Tolerates 1 failure; reads/writes both need 2 │
│ n=5, w=3, r=3 │ Tolerates 2 failures; balanced │
│ n=5, w=5, r=1 │ Very durable writes; fast reads; writes blocked│
│ │ if any node down │
│ n=5, w=1, r=5 │ Fast writes; reads see all; write availability │
│ │ max; reads expensive │
│ n=5, w=3, r=1 │ w+r=4 < n=5 → NO consistency guarantee │
│ │ (eventual consistency only) │
└───────────────────┴────────────────────────────────────────────────┘
Sloppy quorum: write accepted by ANY w nodes (not necessarily home replicas)
Hinted handoff: temporary node forwards data to home node when it recovers
→ Higher availability, weaker durability (temporary node may fail first)
Consistency Levels (Weakest to Strongest)
EVENTUAL CONSISTENCY ──────────────────────────────────────────────→ LINEARIZABILITY
│ │
│ Monotonic Reads Consistent Prefix Read-After-Write │
│ (no time reversal) (causality preserved) (see own writes) │
↓ ↓
Most available Most consistent
Lowest latency Highest latency
Level
Guarantee
How to Achieve
Cost
Eventual
Replicas converge if writes stop
Default in async replication
Lowest
Consistent prefix
If A before B, readers see A before B
Logical timestamps; causal writes to same partition
Low
Monotonic reads
User never sees time go backwards
Sticky replica routing per session
Low
Read-after-write
User sees their own writes
Route user reads to leader; or track write LSN
Medium
Linearizable (strong)
Single-copy semantics; reads always see latest write
Synchronous quorum; consensus protocol
Highest
Conflict Resolution Strategies
Strategy
Mechanism
Pros
Cons
Best For
Conflict avoidance
Route all writes for a key to one leader
No conflicts; simple
Fails during leader change
Most cases
LWW (Last Write Wins)
Timestamp per write; highest wins
Simple; eventually convergent
Data loss; clock skew risk
Append-only, immutable data
Highest replica ID
Numeric IDs; higher ID wins
Simple
Semantically meaningless
Simple datasets
Merge values
Store all conflicting values; app merges
No data loss
App must implement merge
Shopping carts (union)
Custom logic on write
Application callback on conflict
Full semantic control
Developer burden
Complex business logic
CRDT
Mathematically defined merge operation
Automatic; provably correct
Only works for specific types
Counters, sets, text
CRDTs: Key Types and Uses
CRDT Type
Data Structure
Use Case
G-Counter
Grow-only counter
Page view count, like count
PN-Counter
+/- counter (two G-Counters)
Inventory adjustments
G-Set
Grow-only set (add only)
Tag sets, follow lists
OR-Set
Observed-remove set (add + remove)
Shopping cart, todo list
LWW-Register
Last-write-wins register
User preferences
RGA
Replicated Growable Array
Collaborative text editing
MV-Register
Multi-value register
Returns all concurrent values
CDC Architecture
Source DB CDC Tool Message Broker Consumers
┌──────────┐ WAL/binlog ┌───────────┐ events ┌──────────┐ ───→ Elasticsearch
│PostgreSQL│─────────────→│ Debezium │─────────→│ Kafka │ ───→ Data Warehouse
│ (source) │ │(connector) │ │ topics │ ───→ Redis cache
└──────────┘ └───────────┘ └──────────┘ ───→ Microservice B
CDC guarantees:
- At-least-once delivery of every change
- Before/after state for UPDATE operations
- Transaction boundaries preserved (BEGIN/COMMIT)
- Schema change events (DDL)
Use cases:
✓ Search index sync (DB → Elasticsearch)
✓ Cache invalidation (DB → Redis)
✓ Analytics pipeline (OLTP → Data Warehouse)
✓ Microservice data sync (avoid distributed transactions)
✓ Audit logging (every change captured)
✓ Event sourcing on top of relational DB
Replication Decision Tree
Do you need multi-region writes (users in multiple regions write concurrently)?
├─ YES → Multi-Leader or Leaderless
│ ├─ Can you tolerate conflict resolution complexity?
│ │ ├─ YES, need geographic performance → Multi-Leader
│ │ └─ YES, need high availability → Leaderless (Cassandra, DynamoDB)
│ └─ NO → Consider single-leader with a primary region + async replicas
└─ NO → Single-Leader
├─ Need strong durability? → Semi-synchronous (one sync follower)
├─ Need maximum throughput? → Asynchronous
└─ Need strong consistency? → Synchronous (accept higher latency)
Key Numbers and Rules of Thumb
Replication lag in healthy datacenter: <100ms typical; <1s for same region
Replication lag cross-region: 50-150ms (proportional to speed of light distance)
Failover timeout typical: 30-60 seconds (leader considered dead after this)
Semi-synchronous cost: 1-3ms per write (waiting for one nearby follower)
Cassandra quorum most common: n=3, w=2, r=2 (LOCAL_QUORUM)
Rule: w + r > n for consistency; w + r ≤ n for availability over consistency
Red Flags
Using asynchronous replication for data where “success” means “durable”
Ignoring split-brain risk in leader failover (no fencing mechanism)
Using LWW for data that should never be lost (e.g., financial transactions, inventory)
Silently discarding concurrent writes without conflict resolution strategy
Using statement-based replication with non-deterministic functions (NOW(), RAND())
Setting w+r ≤ n and claiming strong consistency
No CDC for keeping derived data systems in sync (polling alternative is slower and less reliable)
Green Flags
Semi-synchronous replication for critical data with acceptable write latency
Fencing/STONITH in failover automation to prevent split-brain
Logical/row-based replication for zero-downtime upgrades + CDC capability
CRDT usage for counters and sets that naturally need distributed update
Version vectors to detect and resolve concurrent writes correctly
CDC pipeline (Debezium → Kafka) for search, analytics, and cache sync
Monotonic reads via sticky session routing per user