Scalability Patterns: Cross-Chapter Comparison Reference

Tags: comparisons, scalability, patterns, architecture

Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch01, Ch04, Ch05, Ch08, Ch11, Ch12, Ch14, Ch15 · Vol 2 Ch01, Ch05, Ch06, Ch09, Ch10, Ch13
Last Updated: 2026-04-13


1. Horizontal vs Vertical Scaling

Aspect | Vertical (Scale Up) | Horizontal (Scale Out)
Mechanism | Replace with a bigger machine (more CPU, RAM, disk) | Add more machines to the pool
Limit | Hard ceiling (the largest instance a provider sells; on AWS that is hundreds of vCPUs and terabytes of RAM) | Virtually unlimited (add nodes)
Failure mode | Single point of failure: if the one machine goes down, the system is down | Can lose individual nodes without an outage
Consistency | Simple: no distributed state problem | Complex: must handle distributed state and eventual consistency
Cost | Expensive per unit at large scale | Cheaper per unit, pay-as-you-grow
Speed of scaling | Fast (resize existing machine) | Slow (provision, configure, load balance new nodes)
When to use first | Always: exhaust vertical before horizontal | When the vertical ceiling is hit or cost is prohibitive

SDI progression (Vol 1 Ch01 — Scale from Zero to Millions):

  • 0 → 1K users: Single server, vertical scale
  • 1K → 100K users: Add replica DB, CDN, basic load balancer
  • 100K → 1M users: Multiple app servers (horizontal), sharded DB
  • 1M+ users: Full microservices, consistent hashing, multi-region

Real examples from SDI chapters:

  • URL shortener (Vol 1 Ch08): Start with a single MySQL instance; at 100M URLs/day, let a Redis cache absorb reads, scale the DB vertically, then add replicas
  • Chat system (Vol 1 Ch12): WebSocket servers scale horizontally; each server handles ~N concurrent connections; load balancer distributes
  • Metrics monitoring (Vol 2 Ch05): Ingestion layer scales horizontally (stateless collectors); time-series DB scales via sharding

2. Load Balancing Strategies

Algorithm | How it works | Pros | Cons | When to use | SDI examples
Round Robin | Requests distributed sequentially across servers | Simple, fair for uniform requests | Ignores server load; requests pile up on slow servers | Stateless API servers with similar request sizes | Default for most app server pools
Weighted Round Robin | Servers get proportional traffic based on weight | Handles heterogeneous hardware | Weights need manual tuning | Mixed server capacities | When some servers are more powerful
Least Connections | Send to server with fewest active connections | Better for long-lived connections | Overhead tracking connection count | WebSocket servers, long-poll connections | Chat system (Ch12): persistent connections
Least Response Time | Send to server with lowest current latency | Best real-time performance | Complex measurement | Latency-sensitive services | Stock exchange market data (V2 Ch13)
IP Hash | Hash client IP to a consistent server | Sticky sessions (same client → same server) | Uneven if clients cluster by IP; clients remap and lose sessions if a server is removed | Session affinity without shared session store | Legacy apps with server-side sessions
Consistent Hashing | Hash ring; client/key → next server clockwise on the ring | Minimal remapping when servers added/removed | Need virtual nodes for balance | Distributed caches, sharded data services | Cassandra routing, Dynamo-style stores (Redis Cluster uses fixed hash slots rather than a ring)
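
The first and third rows of the table can be sketched in a few lines. This is a minimal in-memory illustration, not any particular load balancer's implementation; the class and method names (RoundRobinBalancer, LeastConnectionsBalancer, acquire, release) are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Tracks open connections and picks the server with the fewest."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first (dict preserves insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes so the count stays accurate.
        self.active[server] -= 1
```

The per-connection bookkeeping in acquire/release is the "overhead tracking connection count" listed under Cons; round robin needs none of it.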

Layer 4 vs Layer 7 load balancers:

  • L4 (TCP/UDP): Routes by IP + port only. Faster, lower overhead. Can’t inspect HTTP content. Good for raw throughput.
  • L7 (HTTP): Routes by URL path, headers, cookies. Can do content-based routing (/api → API servers, /static → CDN). Good for microservices.
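
Content-based routing at L7 amounts to a prefix match on the request path. A minimal sketch, assuming hypothetical pool names (api-pool, cdn-origin, web-pool) and a first-match-wins rule:

```python
# Ordered (prefix, pool) rules; the first matching prefix wins.
# Pool names are illustrative placeholders, not real hosts.
ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "cdn-origin"),
]
DEFAULT_POOL = "web-pool"


def route(path: str) -> str:
    """Return the backend pool for an HTTP request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer cannot make this decision because it never parses the HTTP request line.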

3. Data Partitioning Patterns

Strategy | Logic | Pros | Cons | SDI chapters using it
Range-based | Assign key ranges to shards (A–F → shard 1) | Easy range queries; natural ordering | Hotspots when data is skewed | Web crawler (Ch09): crawl by URL prefix; time-series by time range (V2 Ch05)
Hash-based | shard = hash(key) % N | Even distribution; simple | No range queries; nearly full reshard when N changes | URL shortener (Ch08): hash of shortCode; chat messages by conversation ID
Consistent hashing | Hash ring; each node owns an arc; add/remove moves minimal data | Minimal resharding disruption; handles node churn gracefully | Need virtual nodes to avoid uneven distribution; more complex routing | Key-value store (Ch06); Cassandra; distributed cache (Vol 2 Ch04). Redis Cluster solves the same problem with 16,384 fixed hash slots rather than a ring
Geographic / directory | Route based on user location or explicit lookup table | Low latency; data residency compliance | Cross-region queries complex; uneven shard sizes | Google Maps (V2 Ch03): tile shards by lat/lon quadrant; Hotel reservation (V2 Ch07): by region

Resharding problem: The main reason consistent hashing exists. With simple hash modulo, going from N to N+1 shards remaps almost every key (about N/(N+1) of them), which costs enormous I/O. With consistent hashing, only about 1/(N+1) of keys move, and all of them move to the new shard.
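
The difference can be measured directly. The sketch below is an illustration under assumptions: MD5 is used only as a stable hash, 100 virtual nodes per shard is an arbitrary choice, and the names (HashRing, moved_fraction, shard-0 …) are hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, n: int) -> int:
    return stable_hash(key) % n


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._points[idx][1]


def moved_fraction(before, after, keys) -> float:
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
nodes = [f"shard-{i}" for i in range(4)]

# Grow from 4 to 5 shards under each scheme.
modulo_moved = moved_fraction(
    lambda k: modulo_shard(k, 4), lambda k: modulo_shard(k, 5), keys
)
ring4, ring5 = HashRing(nodes), HashRing(nodes + ["shard-4"])
ring_moved = moved_fraction(ring4.lookup, ring5.lookup, keys)

print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%}")
```

With modulo the moved fraction should land near 4/5; with the ring it should land near 1/5, varying somewhat with virtual-node placement.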

Celebrity problem (hotspot): A single partition receives disproportionate traffic (one celebrity’s posts, one trending URL). Solutions:

  • Add suffix/prefix to shard key to spread across shards (user:celebrity:shard_0, user:celebrity:shard_1)
  • Cache hot keys in every app server’s local memory (1–5s TTL)
  • Read replicas for the hot partition
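
The key-suffix approach can be sketched as follows for a read-heavy hot key: every write goes to all suffixed copies, and each read picks one copy at random so the load spreads across shards. The class name, the default of 4 copies, and the plain dict standing in for a sharded store are all assumptions for illustration.

```python
import random


class SplitHotKeyStore:
    """Spreads one hot key over several suffixed copies (key:shard_0, key:shard_1, ...)."""

    def __init__(self, copies=4, rng=None):
        self.copies = copies
        self.rng = rng or random.Random()
        self.store = {}  # stand-in for a sharded cache or database

    def _split_keys(self, key):
        return [f"{key}:shard_{i}" for i in range(self.copies)]

    def put(self, key, value):
        # Write amplification: one logical write becomes `copies` physical writes.
        for split_key in self._split_keys(key):
            self.store[split_key] = value

    def get(self, key):
        # Any copy is valid, so a random pick spreads reads evenly.
        return self.store.get(self.rng.choice(self._split_keys(key)))
```

The trade-off is extra write cost and a brief window where copies can disagree, which is usually acceptable for a celebrity's read-mostly data.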

4. Replication Patterns

Pattern | Writes | Reads | Consistency | Failure handling | SDI chapters
Leader-Follower (Primary-Replica) | Primary only | Primary + any replica | Eventual on replica reads (replication lag) | Promote replica on primary failure | Most SQL DBs (MySQL replicas in URL shortener, news feed, notifications)
Multi-Leader (Active-Active) | Any node | Any node | Conflict resolution required (last-write-wins, CRDTs, custom) | Any node can take over; write conflicts possible | Google Docs-style collaborative editing; multi-region active-active (Google Drive conflicts)
Leaderless (Quorum) | All nodes accept writes; quorum W of N confirm | Quorum R of N nodes respond; merge conflicts | Tunable: W + R > N = strong; W + R ≤ N = eventual | No leader election needed; sloppy quorum during partitions | Cassandra (chat Ch12, metrics V2 Ch05); DynamoDB; Key-value store (Ch06, Dynamo-style)

Trade-offs table:

Pattern | Availability | Consistency | Complexity | Best for
Leader-Follower | Medium (failover needed) | Strong (sync) or Eventual (async) | Low | Most OLTP systems
Multi-Leader | High (no single failure point) | Low (conflict-prone) | High | Multi-region writes, collaboration
Leaderless | High (no leader election) | Tunable via quorum | Medium | High write availability, Cassandra-style

Quorum formula (from Vol 1 Ch06 — Key-Value Store):

  • N = replication factor, W = write quorum, R = read quorum
  • W + R > N → strong consistency (reads always see latest write)
  • Common: N=3, W=2, R=2 (majority quorum)
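  • Write-latency optimized: W=1, R=3 (fast writes; every read must contact all replicas)
  • Durability / read-latency optimized: W=3, R=1 (all nodes confirm each write, so reads are fast, but one unavailable replica blocks writes)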
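
The quorum rule is simple arithmetic, so a small helper makes the configurations above easy to compare. This is a sketch; the function name and the returned field names are hypothetical.

```python
def quorum_profile(n: int, w: int, r: int) -> dict:
    """Summarise a quorum configuration with N replicas, write quorum W, read quorum R."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return {
        # W + R > N means every read quorum overlaps every write quorum
        # in at least one replica, so a read sees the latest acknowledged write.
        "overlap": w + r > n,
        # How many replicas can be down while the operation still succeeds.
        "write_tolerates_failures": n - w,
        "read_tolerates_failures": n - r,
    }
```

For example, quorum_profile(3, 2, 2) reports overlap with one tolerated failure on each path, while quorum_profile(3, 3, 1) reports overlap but zero tolerated failures for writes.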

5. Fanout Patterns

Fanout = distributing a single event to multiple destinations. The core trade-off is: do work at write time or read time?

Pattern | When work is done | Latency for writer | Latency for reader | Storage cost | SDI chapter
Fanout on Write (Push model) | At post/event creation time | High (must write to all N followers) | Low (pre-built feed) | High (N copies per post) | Vol 1 Ch11 News Feed: write post → push to all followers’ feed caches
Fanout on Read (Pull model) | At feed retrieval time | Low (just store post) | High (must query all N followees) | Low (one copy) | Vol 1 Ch11: older Twitter approach; simpler but slow for users who follow many accounts
Hybrid | Write for regular users; read for celebrities | Medium | Medium | Medium | Vol 1 Ch11: preferred production approach; push to followers of non-celebrities, pull celebrity posts on read

When to use which:

  • Write fanout: Best when followers are few (<500), feed access is frequent, read latency must be low
  • Read fanout: Best when user has millions of followers (celebrities), or feed is rarely accessed
  • Hybrid: Real production choice at scale — write fanout for normal users, read fanout (or separate celebrity handling) for high-follower accounts
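
A minimal in-memory sketch of the hybrid approach, assuming a follower-count threshold decides who counts as a celebrity. The class name, the threshold default of 500 (borrowed from the guideline above), and the logical clock used for ordering are illustrative assumptions.

```python
from collections import defaultdict


class HybridFeed:
    """Push posts from ordinary authors at write time; pull celebrity posts at read time."""

    def __init__(self, celebrity_threshold=500):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.feed_cache = defaultdict(list)  # user -> pushed (seq, author, text) entries
        self.posts = defaultdict(list)       # author -> their own posts
        self._seq = 0                        # logical clock standing in for timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, text):
        self._seq += 1
        entry = (self._seq, author, text)
        self.posts[author].append(entry)
        if not self.is_celebrity(author):
            # Fanout on write: one copy per follower.
            for user in self.followers[author]:
                self.feed_cache[user].append(entry)

    def read_feed(self, user, limit=10):
        # A set deduplicates posts that were pushed before an author
        # crossed the threshold and are now also pulled.
        entries = set(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fanout on read: merge celebrity posts at request time.
                entries.update(self.posts[author])
        return sorted(entries, reverse=True)[:limit]
```

Write cost stays bounded by the threshold, while read cost grows only with the number of celebrities a user follows.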

Generalized fanout beyond news feed:

  • Notification system (Vol 1 Ch10): Fanout on write to per-channel queues (one queue per notification type)
  • YouTube upload (Vol 1 Ch14): Fanout on write to transcoding jobs (one Kafka event → N parallel workers)
  • Google Drive sync (Vol 1 Ch15): Fanout to all of user’s devices on file change

6. Rate Limiting Patterns

Summary reference — full detail in Vol 1 Ch04.

Algorithm | Memory | Accuracy | Allows burst | Complexity | Production use
Token Bucket | Low (bucket state per user) | Good | Yes (consume all tokens at once) | Low | AWS API Gateway, Stripe: industry default
Leaky Bucket | Low (queue size per user) | Good | No (constant drain rate) | Medium | Traffic shaping, smooth downstream load
Fixed Window Counter | Very low (one counter per window) | Poor (burst at boundary) | Partial | Very low | Prototypes only
Sliding Window Log | High (all timestamps) | Excellent | No | Medium | Low-volume, accuracy-critical
Sliding Window Counter | Low (two counters) | Very good (weighted approximation) | No | Medium | Cloudflare, Azure: best production balance
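
As an illustration of the first row, here is a minimal single-process token bucket. It is a sketch, not the implementation any of the named vendors use; the injectable clock parameter is an assumption added to make the behaviour easy to test.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then sustains `refill_rate` requests per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Lazy refill: credit tokens for the time elapsed since the last call.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Per-user state is just two numbers (token count and last refill time), which is why the memory column reads "Low".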

Where rate limiting lives in the architecture:

  • API Gateway: Per-user, per-IP, per-endpoint limits
  • Redis: Stores counters (atomic INCR, automatic TTL expiry)
  • Distributed: Redis Cluster shards counters by user ID; race conditions solved with Lua scripts
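
The sliding window counter from the table keeps only two counters per user. The sketch below keeps them in process memory for clarity; in the architecture described above they would live in Redis. The class name and the choice to pass the current time in as an argument are assumptions.

```python
class SlidingWindowCounter:
    """Approximates a rolling window by weighting the previous fixed window's count."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_index = 0   # which fixed window `current_count` belongs to
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        index = int(now // self.window)
        if index != self.current_index:
            # Roll over; if more than one full window passed, the old count is stale.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window) / self.window
        estimated = self.previous_count * (1 - elapsed) + self.current_count
        if estimated + 1 <= self.limit:
            self.current_count += 1
            return True
        return False
```

The weighting assumes requests in the previous window were evenly spread, which is the "weighted approximation" noted in the accuracy column.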

7. CDN Patterns

Aspect | Push CDN | Pull CDN
How content gets to edge | You upload/push content to CDN nodes in advance | CDN fetches from origin on first cache miss, then caches
First request | Fast (already at edge) | Slow (origin fetch + cache)
Content freshness | Manual invalidation or re-push | TTL-based auto-expiry
Best for | Rarely-changing content you know will be popular (software releases, JS bundles) | Frequently-changing or unpredictable content
Storage cost | High (pre-populate all edges) | Lower (only cache what’s requested)
Complexity | Higher (you manage distribution) | Lower (CDN manages it)
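
The pull column can be sketched as a cache that fetches from origin on a miss and expires entries by TTL. This models a single edge node in memory; the class name, the fetch callback, and the injectable clock are hypothetical.

```python
import time


class PullEdgeCache:
    """One edge node: serve from cache while fresh, otherwise fetch from origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> body
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}       # path -> (body, expires_at)
        self.origin_fetches = 0  # how often the origin was actually hit

    def get(self, path):
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no origin traffic
        # Miss or expired: the first request pays the origin round trip.
        body = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self._entries[path] = (body, now + self.ttl)
        return body
```

A push CDN would instead populate the entries ahead of time and skip the miss path, at the cost of storing content nobody may request.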

What to put on CDN:

  • Static assets: JS, CSS, fonts, images (always)
  • Video segments: HLS/DASH chunks (YouTube Ch14, Netflix-style)
  • Map tiles: Pre-rendered PNG/vector tiles (Google Maps V2 Ch03)
  • File downloads: User files from Google Drive (Vol 1 Ch15)
  • API responses: Only if safe to cache (no user-specific data without cache-key on user ID)

What NOT to put on CDN:

  • Authenticated API responses (unless you include auth token in cache key)
  • Real-time data (order book, live prices)
  • User-specific personalized content (news feed, notifications)

SDI chapters using CDN heavily:

  • YouTube (Vol 1 Ch14): CDN is mandatory — video at 5M DAU without CDN = origin server melts down. CDN serves ~95% of video traffic.
  • Google Drive (Vol 1 Ch15): Pull CDN for file downloads; cached at edge per file version.
  • Google Maps (Vol 2 Ch03): Map tiles are static; Push CDN for popular zoom levels/regions.
  • URL shortener (Vol 1 Ch08): Can cache redirect responses at CDN edge (301 cached by browser + CDN).

8. Microservices vs Monolith

Factor | Monolith | Microservices
Team size | < 10–20 engineers | 50+ engineers
Deployment | One deployable unit | Independent per service
Scaling | Scale the whole app | Scale specific bottleneck services
Technology | Single stack | Polyglot (best tool per service)
Data | Shared single DB | DB per service (no shared DB!)
Latency | In-process calls (sub-ms) | Network calls (1–10ms per hop)
Debugging | Simple stack traces | Requires distributed tracing
When to start | Always (premature microservices = pain) | When monolith becomes deployment bottleneck

API Gateway pattern (mandatory for microservices):

  • Single entry point for all client requests
  • Handles auth, rate limiting, request routing, protocol translation
  • Without it, clients must know about every service
  • SDI reference: Every chapter with microservices uses an API Gateway

Service mesh (optional, for large-scale microservices):

  • Sidecars (Envoy) on every pod handle: retries, circuit breaking, mTLS, observability
  • Use when > ~20 services or when network policy / tracing is critical

9. Scalability Decision Framework

Given a set of constraints, which patterns apply?

Constraint / Symptom | Pattern to apply | SDI example
Read traffic >> write traffic (>5:1) | Read replicas + caching (Redis) | URL shortener, news feed, leaderboard
Write throughput > single DB capacity | Sharding (consistent hash or range) | Chat messages, metrics ingestion
Single server is SPOF | Horizontal scale + load balancer | All production systems
Global users with latency concerns | CDN + GeoDNS + multi-region | YouTube, Google Maps, Google Drive
Traffic is spiky / bursty | Message queue as buffer | Notification system, video transcoding
Real-time fan-out to many subscribers | Pub/Sub or Kafka topics | News feed fanout, notification fan-out
Hotspot / celebrity problem | Hybrid fanout + local cache + key splitting | News feed celebrity handling
Need to replay events | Kafka with retention | Ad click aggregation, metrics backfill
Near-millisecond response time | In-memory store (Redis) + connection pooling | Leaderboard, rate limiter, order book
Ultra-low latency (<1ms) | Co-location, LMAX Disruptor, in-process queue | Stock exchange matching engine
Data consistency across services | Outbox pattern + Saga | Payment system, hotel reservation
Schema flexibility needed | NoSQL (document DB or wide-column) | User profiles, activity events

10. Numbers Every Engineer Should Know for Scalability

Latency Table (approximate, 2024)

Operation | Latency
L1 cache read | 0.5 ns
L2 cache read | 7 ns
Main memory (RAM) read | 100 ns
Redis GET (local network) | ~0.5–1 ms
SSD random read | ~0.1 ms
Network round-trip (same datacenter) | ~0.5 ms
Network round-trip (cross-region, US East ↔ West) | ~70 ms
Network round-trip (US ↔ Europe) | ~100–120 ms
HDD seek | ~10 ms
MySQL query (simple, indexed) | ~1–5 ms
MySQL query (complex join) | ~10–100 ms
Cassandra read (quorum) | ~1–5 ms
S3 GET request | ~10–50 ms
DNS lookup | ~1–100 ms

Throughput Table

Component | Typical throughput
Single web server (nginx) | ~50K–100K req/sec
Single app server (Java Spring) | ~1K–10K req/sec (CPU bound)
Redis (single node) | ~100K–1M ops/sec
MySQL (reads, indexed) | ~10K–100K QPS per server
MySQL (writes) | ~1K–10K writes/sec
Cassandra (writes) | ~50K–100K writes/sec per node
Kafka (single partition) | ~10 MB/sec or ~100K msgs/sec
CDN (global, Cloudflare) | Terabits/sec aggregate
AWS S3 | Thousands of req/sec per prefix

Storage Scale Reference

Scale | Users | Storage need / year | Typical architecture
Startup | < 10K | < 100 GB | Single server + single DB
Growing | 10K–1M | 100 GB – 10 TB | Load balancer + DB replicas + CDN
Large | 1M–100M | 10 TB – 1 PB | Sharded DB + distributed cache + object storage
FAANG | 100M+ | 1 PB+ | Custom storage systems + multi-region

Useful multiples:

  • 1 million seconds ≈ 11.5 days
  • 2.5 million seconds ≈ 1 month
  • 1 billion seconds ≈ 31.7 years
  • QPS = DAU × actions_per_day / 86,400
  • Peak QPS ≈ 2–3× average QPS
  • Storage = daily_writes × size_per_entry × retention_days × replication_factor
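
The last three formulas translate directly into code. The worked numbers (10M DAU, 10 actions per user per day, 1 KB entries, one year of retention, 3 replicas, peak factor 2) are made-up inputs for illustration, not figures from any chapter.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """QPS = DAU x actions per user per day / 86,400."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak QPS is commonly estimated at 2-3x the average."""
    return avg_qps * peak_factor


def storage_bytes(daily_writes: int, size_per_entry: int,
                  retention_days: int, replication_factor: int) -> int:
    """Storage = daily writes x entry size x retention days x replication factor."""
    return daily_writes * size_per_entry * retention_days * replication_factor


# Hypothetical workload used only to exercise the formulas.
avg = average_qps(dau=10_000_000, actions_per_user_per_day=10)
peak = peak_qps(avg)
total = storage_bytes(daily_writes=100_000_000, size_per_entry=1_000,
                      retention_days=365, replication_factor=3)
print(f"avg ≈ {avg:,.0f} QPS, peak ≈ {peak:,.0f} QPS, storage ≈ {total / 1e12:.1f} TB")
```

For these inputs the average comes to roughly 1,157 QPS and the storage to about 109.5 TB per year, which falls in the "Large" row of the storage scale table.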

See also: