Chapter 6 Cheat Sheet - Partitioning

One-Line Summaries

ConceptOne-Liner
PartitioningSplitting data so each node holds a subset (sharding)
Key-range partitioningAdjacent keys stored together; enables range queries but risks hot spots
Hash partitioningHash key for uniform distribution; range queries become expensive
Hot spotOne partition receives disproportionate reads/writes
Consistent hashingRing-based; adding nodes only moves adjacent keys
Scatter/gatherQuery all partitions in parallel (for local secondary indexes)
RebalancingMoving partitions between nodes when topology changes
Request routingHow clients discover which node holds a given key

Partitioning Strategy Decision Tree

What are your access patterns?
│
├─ Need range queries (e.g., all events from Jan-Feb)?
│  ├─ Use key-range partitioning
│  └─ Watch out for: hot spots from sequential keys (e.g., timestamps)
│     Fix: Prepend random/varied prefix to key
│
├─ Need uniform write/read distribution, no range needed?
│  └─ Use hash partitioning (consistent hashing)
│
├─ Need both range and even distribution?
│  └─ Cassandra compound key:
│     Partition key (hashed) + Clustering key (sorted within partition)
│
└─ Have hot keys (celebrities, trending items)?
   └─ Application-level workaround:
      Append random suffix to key (splits writes, merges reads)

Key-Range vs Hash Partitioning

Key-Range Partitioning          Hash Partitioning
───────────────────────         ──────────────────────
[A-D] → Node 1                  hash(key) % N → Node
[E-M] → Node 2
[N-Z] → Node 3

✅ Range queries efficient       ✅ Uniform distribution
✅ Sorted data within partition  ✅ No hot spots from key patterns
❌ Hot spots from sequential keys ❌ Range queries require scatter/gather
❌ Manual split boundaries        ❌ Can't know adjacent keys

Secondary Index Partitioning

Document-Partitioned (Local):           Term-Partitioned (Global):
──────────────────────────────          ───────────────────────────
Partition 1:                            Index for "red":
  data: {car: red, ...}                   → partition 1, row 5
  local_idx: red → [row_5]               → partition 3, row 12
Partition 2:                            Index for "blue":
  data: {car: red, ...}                   → partition 2, row 7
  local_idx: red → [row_12]

WRITE: Update local partition only      WRITE: Update global index (cross-partition)
READ: Scatter to ALL partitions          READ: Lookup index partition only
      Gather results                     (but index may be slightly stale = eventual)

Used by: MongoDB, Elasticsearch         Used by: DynamoDB GSI, Riak Search

Rebalancing Strategies

1. Fixed number of partitions (Riak, Elasticsearch, Couchbase)
   Create 1000 partitions for 10 nodes (100/node)
   Add node: steal ~50 partitions from each existing node
   Con: partition count set at creation, hard to change

2. Dynamic partitioning (HBase, RethinkDB, MongoDB)
   Split: partition > 10GB → split into two 5GB partitions
   Merge: partition < threshold → merge with neighbor
   Pro: Automatic adaptation to data volume
   Con: Single node handles all data until first split

3. Proportional to nodes (Cassandra, Ketama)
   Each node gets fixed # of partitions (e.g., 256)
   Add node: new node randomly steals from existing nodes
   Pro: Scales with cluster size naturally

Consistent Hashing Ring

Hash space: 0 ─────────────────────── 2^64

Nodes placed at random positions:

              key_B (hash=25)
                ↓
    0    Node A (hash=10)   Node B (hash=45)   Node C (hash=80)    2^64
    ──────●─────────────────────●──────────────────●─────────────────●
    
key_B (hash=25) → goes to Node B (first node clockwise from 25)

Add Node D at position 35:
    key_B (hash=25) → now goes to Node D (nearest clockwise)
    Only keys between 10-35 are affected!
    All other keys unchanged → minimal rebalancing

Request Routing Approaches

Approach 1: Any node (gossip)         Approach 2: Routing tier
─────────────────────────────         ────────────────────────
Client → Node A                       Client → Router (ZooKeeper/etcd)
         ↓ (if not owner)                       ↓
         Node A → Node B              Router → Correct Node
         ↓                            
         Node A ← answer ← Node B    
         ↓
         Client ← response

Approach 3: Smart client
─────────────────────────
Client has partition map (from ZooKeeper subscription)
Client → Correct Node directly
Used by: Kafka consumers with topic metadata

Key Trade-offs

DecisionProConWhen to Use
Key-range partitionRange queries, sorted dataHot spots from sequential keysTime-series with composite key
Hash partitionUniform distributionNo range queriesRandom keys, uniform load
Local secondary indexSingle-partition writesScatter/gather readsWrite-heavy + range searches
Global secondary indexEfficient readsCross-partition writes, eventualRead-heavy secondary lookups
Dynamic partitioningAuto-adapts to data sizeSingle node starts all dataUnpredictable data growth
Manual rebalancingSafe, controlledOperational burdenStability-critical systems

Hot Spot Mitigation

Hot Key: user_id = "celebrity123" gets 1M reads/sec

Fix 1: Random suffix
  Write: key = "celebrity123_0" through "celebrity123_99"
  Read: query all 100 keys and combine → extra complexity

Fix 2: Caching layer
  Put Redis/Memcached in front
  Celebrity's data served from cache, not DB partition

Fix 3: Read replicas for hot partition
  Increase replication factor for the hot partition's node
  Reads served from multiple replicas

Fix 4: CDN for public data
  Celebrity's public posts → CDN → no DB reads at all for most traffic

Red Flags

❌ Using timestamps directly as partition keys (all writes go to today’s partition)
❌ No scatter/gather awareness in queries (undetected performance cliff)
❌ Automatic rebalancing during a failure incident (amplifies the outage)
❌ Ignoring hot spots — assuming hash partitioning magically solves all problems
❌ No partition-aware routing (sending all requests to random node = extra hops)

Green Flags

✅ Test your partitioning scheme with realistic access patterns
✅ Monitor partition sizes and per-partition throughput
✅ Use compound keys (hash + range) when you need both
✅ Global secondary indexes for read-heavy secondary access patterns
✅ Plan for rebalancing before you need it

Modern Additions (2026)

NewSQL auto-sharding:
├─ CockroachDB, Spanner, YugabyteDB
├─ Automatic range splitting and rebalancing
└─ Transparent to application — no shard key design

Kafka KRaft:
├─ Partition metadata in Kafka itself (no ZooKeeper)
├─ KRaft protocol = Raft-based metadata management
└─ Simpler operations, faster partition reassignment

Serverless partitioning:
├─ DynamoDB on-demand: auto-partition scaling with traffic
├─ Neon PostgreSQL: partition at storage layer
└─ No manual sharding decisions needed

Interview Response Templates

When Asked About Sharding Strategy

“I’d first ask what the access patterns are. For sequential writes (like logs or time-series), key-range partitioning risks hot spots — I’d add a prefix (sensor ID, user ID) to distribute across ranges. For random access without range needs, hash partitioning with consistent hashing gives even distribution. If I need both, a compound key (Cassandra-style) gives hash distribution with range within each partition.”

When Asked About Secondary Index Partitioning Trade-off

“Local secondary indexes mean writes touch only one partition (fast) but reads require scatter/gather across all partitions (slow, high tail latency). Global secondary indexes make reads efficient but writes cross partitions and the index may be slightly stale. For a write-heavy system with occasional secondary searches, I’d use local indexes. For a read-heavy system where secondary lookups are critical, global indexes are worth the write cost.”


Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13