Chapter 7 Cheat Sheet — Sharding

Quick Revision Time: 5 minutes | Interview Prep: 15 minutes

One-Line Summaries

Concept	One-Liner
Sharding	Split data across nodes; each node holds a subset
Key-range sharding	Adjacent keys co-located; enables range queries; risks sequential hot spots
Hash sharding	Hash key for uniform spread; range queries require scatter/gather
Consistent hashing	Ring-based; adding nodes only moves adjacent keys (~1/N of data)
Hot spot	One shard gets disproportionate load (celebrity posts, viral content)
Local secondary index	Per-shard index; fast writes, scatter/gather reads
Global secondary index	Single term-partitioned index; fast reads, async/eventual updates
Scatter/gather	Fan out query to ALL shards; tail latency = slowest shard
Rebalancing	Moving shards between nodes as topology changes
Multitenancy sharding	Shard key includes tenant_id; row/schema/database-level isolation

Sharding Strategy Decision Tree

What are your access patterns?
│
├─ Need range queries (all events Jan-Feb, all sensors in region X)?
│  ├─ Use key-range sharding
│  └─ Watch for: sequential key hot spots (timestamps → prefix with entity ID)
│
├─ Need uniform distribution, no range queries?
│  └─ Use hash sharding (consistent hashing for cluster changes)
│
├─ Need BOTH range and uniform distribution?
│  └─ Compound key: hash(partition_key) + sorted clustering_key
│     Example: (user_id hashed) + (timestamp sorted within shard)
│
├─ SaaS / multi-tenant platform?
│  ├─ Small tenants → row-level isolation, tenant_id as shard key prefix
│  ├─ Large tenants → dedicated shards or dedicated DB
│  └─ Mix → tiered: shared pool + reserved shards for top N tenants
│
└─ Celebrity / viral content hot spot?
   ├─ Write hot spot → random suffix (100 variants per key)
   ├─ Read hot spot → cache layer (Redis/CDN in front)
   └─ Both → write suffix + read-aside cache

Key-Range vs Hash Sharding

Key-Range Sharding                     Hash Sharding
───────────────────────────────        ─────────────────────────────
[key_001─key_500] → Shard 0            hash(key) → bucket → Shard N
[key_501─key_999] → Shard 1
[key_1000─key_1999] → Shard 2

✅ Range queries span one shard         ✅ Uniform distribution
✅ Sorted data → prefix scans work      ✅ No hot spots from sequential keys
✅ Co-locate related data (same user)   ✅ Simple routing logic
❌ Sequential key hot spots             ❌ Range queries = scatter/gather
❌ Uneven shard sizes possible          ❌ Cannot infer adjacent keys
❌ Manual boundary management           ❌ Skewed access still causes hot spots

Consistent Hashing Ring

Hash space: 0 ──────────────────────────── 2^64

Nodes at random ring positions:

      key_X (hash=25)
           ↓
0  [A@10]  •──────  [B@45]  ────────  [C@80]  ──── 2^64
   
key_X (hash=25) → Node B (first clockwise from 25)

Add Node D at position 35:
0  [A@10]  [D@35]key_X [B@45]       [C@80]  ──── 2^64

key_X now routes to D — only keys in (10,35] need to move!

Multitenancy Isolation Models

┌─────────────────────┬───────────────────────┬────────────────────────┐
│ Row-Level           │ Schema-Level           │ Database-Level         │
├─────────────────────┼───────────────────────┼────────────────────────┤
│ All tenants share   │ Each tenant has its    │ Each tenant has a      │
│ tables; tenant_id   │ own schema in shared   │ dedicated database     │
│ column distinguishes│ database instance      │ (or cluster)           │
├─────────────────────┼───────────────────────┼────────────────────────┤
│ ✅ Low overhead      │ ✅ Moderate isolation  │ ✅ Full isolation       │
│ ✅ Simple schema     │ ✅ Per-tenant backups   │ ✅ Easy compliance      │
│ ❌ Noisy neighbors   │ ❌ Schema count limits  │ ❌ High ops overhead    │
│ ❌ Bug = data leak   │ ❌ Harder aggregation   │ ❌ Hard to aggregate    │
├─────────────────────┼───────────────────────┼────────────────────────┤
│ Shared SaaS apps    │ Shopify, mid-tier SaaS │ Enterprise contracts   │
└─────────────────────┴───────────────────────┴────────────────────────┘

Local vs Global Secondary Indexes

LOCAL (Document-Partitioned):              GLOBAL (Term-Partitioned):
────────────────────────────────           ──────────────────────────────────
Shard 0: data + idx for its rows           Index shard "red": ptrs to all shards
Shard 1: data + idx for its rows           Index shard "blue": ptrs to all shards
Shard 2: data + idx for its rows

WRITE: update 1 shard → fast               WRITE: update 2 shards → slower
READ:  scatter to ALL shards → slow        READ: lookup 1 index shard → fast
       tail latency problem                      (may be slightly stale)

Best: write-heavy + tolerates read cost    Best: read-heavy secondary access
Used: MongoDB, Elasticsearch, Cassandra    Used: DynamoDB GSI, Riak Search

Rebalancing Strategies

1. Fixed shard count (Elasticsearch, Riak, Couchbase)
   ───────────────────────────────────────────────────
   Create 1000 shards for 10 nodes → 100/node
   Add node: steal ~91 shards, no splitting needed
   Con: shard count set at creation; hard to change later

2. Dynamic splitting (HBase, MongoDB, RethinkDB)
   ───────────────────────────────────────────────────
   Shard > 10 GB → split into two 5 GB shards
   Shard < 1 GB  → merge with neighbor
   Pro: auto-adapts to data volume
   Con: cold-start problem (single shard handles everything at launch)

3. Vnodes / proportional (Cassandra)
   ───────────────────────────────────────────────────
   Each node owns N virtual nodes (default: 256)
   Add node: randomly steal ~256/(total_nodes+1) vnodes each from existing nodes
   Pro: natural scaling; finer granularity with more nodes
   Con: gossip overhead grows with vnode count

Request Routing Approaches

Approach 1: Any-node + forward         Approach 2: Routing tier
──────────────────────────────         ──────────────────────────────
Client → Node A (random)               Client → ZooKeeper-backed Router
  Node A: "I don't own K"                          ↓ (knows shard map)
  Node A → Node B (owns K)              Correct Shard Node
  Node B → Node A → Client                         ↓
                                        Result → Client
Extra hop in data path                  Extra hop before data path
Used: Cassandra (gossip)               Used: Kafka, some MongoDB

Approach 3: Smart client
────────────────────────────────────────
Client subscribes to ZooKeeper/etcd for shard map
Client → Correct Node directly (no extra hops)
Lowest latency. Used: Kafka producer/consumer API, Cassandra drivers

Hot Spot Relief Reference

Scenario	Problem	Fix	Trade-off
Timestamp shard key	All writes to “now” shard	Prefix with entity ID (sensor_id, user_id)	Key design change required
Celebrity post reads	Millions of reads to one shard	Redis/CDN cache in front	Cache invalidation complexity
Viral write burst	Write spike to single shard	Random suffix (100 variants)	Reads must aggregate 100 keys
Large tenant in shared pool	One tenant saturates shared shard	Dedicated shard for top-N tenants	Tenant routing layer needed

Auto vs Manual Rebalancing

DANGER SCENARIO — Auto rebalancing during failure:
──────────────────────────────────────────────────
1. Node C slows down (GC pause, network blip)
2. Auto-rebalancer: "Node C seems down, move its shards!"
3. Heavy data transfer begins on already-stressed Node A and Node B
4. A and B, now overloaded, also slow down
5. Rebalancer: "A and B are slow too!" → triggers more moves
6. Cascade failure: cluster completely down

PREVENTION:
- Add delay threshold before triggering rebalancing
- Require quorum of nodes to agree a node is truly down
- Require human operator approval for large rebalancing operations
- Separate rebalancing monitoring from regular load alerts

Key Trade-offs Table

Decision	Pros	Cons	Use When
Key-range sharding	Range queries, co-location	Sequential hot spots	Time-series + compound prefix key
Hash sharding	Uniform distribution, no sequential hot spots	No range queries	Random-access workloads
Local secondary index	Fast writes (single shard)	Scatter/gather reads	Write-heavy + occasional secondary search
Global secondary index	Fast reads (single index shard)	Eventual consistency, write cost	Read-heavy secondary access
Auto rebalancing	No operator intervention	Dangerous during failures	Dev/test, managed cloud services
Manual rebalancing	Safe, controlled timing	Operator burden	Production SLA-critical clusters
Row-level tenancy	Low overhead, simple	Noisy neighbor, leak risk	Small-scale SaaS, homogeneous tenants
DB-level tenancy	Full isolation, compliance easy	High overhead	Enterprise contracts, regulated data

Modern Additions (2026)

Serverless auto-sharding:
├─ DynamoDB on-demand: shards scale automatically; no shard key tuning
├─ PlanetScale (Vitess): MySQL sharding transparent via VTGate
└─ Neon PostgreSQL: storage-layer sharding; app sees single DB

NewSQL transparent sharding:
├─ CockroachDB: auto range splits + rebalancing + Raft per range
├─ Google Spanner: TrueTime + auto shards + global ACID transactions
└─ YugabyteDB: PostgreSQL-compatible, auto Raft-based sharding

Kafka KRaft (no ZooKeeper):
├─ Partition metadata stored in Kafka's own Raft log
├─ Simpler operations, faster partition reassignment
└─ One fewer system to operate and monitor

Vector DB sharding:
├─ Pinecone, Weaviate, Milvus: shard by cluster assignment (k-means)
├─ Approximate nearest neighbor requires replicated indexes per shard
└─ Not key-range or hash; purpose-built for high-dimensional space

Interview Response Templates

“I’d first analyze access patterns: user profile reads are by user_id (hash shard on user_id for uniform distribution), but feed queries are time-range-based (compound key: hash user_id, range by timestamp within shard). For the celebrity hot-spot problem, I’d use a caching layer (Redis) for public profiles and posts of high-follower accounts, backed by CDN for truly public content. For secondary indexes like ‘all posts with hashtag X’, I’d use global term-partitioned indexes for read efficiency, accepting slight eventual consistency."

"Local vs global secondary index — when do you use each?”

“Local secondary indexes are write-cheap (single shard) but read-expensive (scatter/gather = tail latency amplification). I’d use them when writes are frequent and secondary reads are rare or latency-tolerant. Global secondary indexes reverse this: writes touch two shards and the index is eventually consistent, but secondary reads hit one partition directly. I’d use them for read-heavy secondary access patterns where index freshness can be slightly stale — like product catalog searches, not financial ledger reads.”

Last Updated: 2026-05-29

Study Notes by Niladri & AI

Explorer

ch07-cheatsheet

Chapter 7 Cheat Sheet — Sharding

One-Line Summaries

Sharding Strategy Decision Tree

Key-Range vs Hash Sharding

Consistent Hashing Ring

Multitenancy Isolation Models

Local vs Global Secondary Indexes

Rebalancing Strategies

Request Routing Approaches

Hot Spot Relief Reference

Auto vs Manual Rebalancing

Key Trade-offs Table

Modern Additions (2026)

Interview Response Templates

"Local vs global secondary index — when do you use each?”

Graph View

Table of Contents

Study Notes by Niladri & AI

Explorer

ch07-cheatsheet

Chapter 7 Cheat Sheet — Sharding

One-Line Summaries

Sharding Strategy Decision Tree

Key-Range vs Hash Sharding

Consistent Hashing Ring

Multitenancy Isolation Models

Local vs Global Secondary Indexes

Rebalancing Strategies

Request Routing Approaches

Hot Spot Relief Reference

Auto vs Manual Rebalancing

Key Trade-offs Table

Modern Additions (2026)

Interview Response Templates

”How would you shard a social media platform?”

"Local vs global secondary index — when do you use each?”

Graph View

Table of Contents