Scalability Patterns: Cross-Chapter Comparison Reference
comparisons scalability patterns architecture
Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch01, Ch04, Ch05, Ch08, Ch11, Ch12, Ch14, Ch15 · Vol 2 Ch01, Ch05, Ch06, Ch09, Ch10, Ch13
Last Updated: 2026-04-13
1. Horizontal vs Vertical Scaling
| Aspect | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Mechanism | Replace with a bigger machine (more CPU, RAM, disk) | Add more machines to the pool |
| Limit | Hard ceiling (largest instance: ~96 CPU, ~384 GB RAM on AWS) | Virtually unlimited (add nodes) |
| Failure mode | Single point of failure — one machine goes down, system is down | Can lose individual nodes without outage |
| Consistency | Simple — no distributed state problem | Complex — must handle distributed state, eventual consistency |
| Cost | Expensive per unit at large scale | Cheaper per unit, pay-as-you-grow |
| Speed of scaling | Fast (resize existing machine) | Slow (provision, configure, load balance new nodes) |
| When to use first | Always — exhaust vertical before horizontal | When vertical ceiling hit or cost prohibitive |
SDI progression (Vol 1 Ch01 — Scale from Zero to Millions):
- 0 → 1K users: Single server, vertical scale
- 1K → 100K users: Add replica DB, CDN, basic load balancer
- 100K → 1M users: Multiple app servers (horizontal), sharded DB
- 1M+ users: Full microservices, consistent hashing, multi-region
Real examples from SDI chapters:
- URL shortener (Vol 1 Ch08): Start with single MySQL; at 100M URLs/day → Redis cache absorbs reads, vertical scale DB, add replicas
- Chat system (Vol 1 Ch12): WebSocket servers scale horizontally; each server handles ~N concurrent connections; load balancer distributes
- Metrics monitoring (Vol 2 Ch05): Ingestion layer scales horizontally (stateless collectors); time-series DB scales via sharding
2. Load Balancing Strategies
| Algorithm | How it works | Pros | Cons | When to use | SDI examples |
|---|---|---|---|---|---|
| Round Robin | Requests distributed sequentially across servers | Simple, fair for uniform requests | Ignores server load; slow servers pile up | Stateless API servers with similar request sizes | Default for most app server pools |
| Weighted Round Robin | Servers get proportional traffic based on weight | Handles heterogeneous hardware | Weights need manual tuning | Mixed server capacities | When some servers are more powerful |
| Least Connections | Send to server with fewest active connections | Better for long-lived connections | Overhead tracking connection count | WebSocket servers, long-poll connections | Chat system (Ch12) — persistent connections |
| Least Response Time | Send to server with lowest current latency | Best real-time performance | Complex measurement | Latency-sensitive services | Stock exchange market data (V2 Ch13) |
| IP Hash | Hash client IP to a consistent server | Sticky sessions (same client → same server) | Uneven if clients cluster by IP; fails if server removed | Session affinity without shared session store | Legacy apps with server-side sessions |
| Consistent Hashing | Hash ring; client/key → nearest server on ring | Minimal remapping when servers added/removed | Need virtual nodes for balance | Distributed caches, sharded data services | Redis Cluster, Cassandra routing |
Layer 4 vs Layer 7 load balancers:
- L4 (TCP/UDP): Routes by IP + port only. Faster, lower overhead. Can’t inspect HTTP content. Good for raw throughput.
- L7 (HTTP): Routes by URL path, headers, cookies. Can do content-based routing (
/api→ API servers,/static→ CDN). Good for microservices.
3. Data Partitioning Patterns
| Strategy | Logic | Pros | Cons | SDI chapters using it |
|---|---|---|---|---|
| Range-based | Assign key ranges to shards (A–F → shard 1) | Easy range queries; natural ordering | Hotspots when data is skewed | Web crawler (Ch09) — crawl by URL prefix; Time-series by time range (V2 Ch05) |
| Hash-based | shard = hash(key) % N | Even distribution; simple | No range queries; full reshard when N changes | URL shortener (Ch08) — hash of shortCode; chat messages by conversation ID |
| Consistent hashing | Hash ring; each node owns an arc; add/remove moves minimal data | Minimal resharding disruption; handles node churn gracefully | Need virtual nodes to avoid uneven distribution; more complex routing | Key-value store (Ch06); Redis Cluster; Cassandra; distributed cache (Vol 2 Ch04) |
| Geographic / directory | Route based on user location or explicit lookup table | Low latency; data residency compliance | Cross-region queries complex; uneven shard sizes | Google Maps (V2 Ch03) — tile shards by lat/lon quadrant; Hotel reservation (V2 Ch07) — by region |
Resharding problem: The main reason consistent hashing exists. When you add shard N+1 with simple hash modulo, all keys remapping costs enormous I/O. With consistent hashing, only 1/N of keys move.
Celebrity problem (hotspot): A single partition receives disproportionate traffic (one celebrity’s posts, one trending URL). Solutions:
- Add suffix/prefix to shard key to spread across shards (
user:celebrity:shard_0,user:celebrity:shard_1) - Cache hot keys in every app server’s local memory (1–5s TTL)
- Read replicas for the hot partition
4. Replication Patterns
| Pattern | Writes | Reads | Consistency | Failure handling | SDI chapters |
|---|---|---|---|---|---|
| Leader-Follower (Primary-Replica) | Primary only | Primary + any replica | Eventual (replication lag) | Promote replica on primary failure | Most SQL DBs (MySQL replicas in URL shortener, news feed, notifications) |
| Multi-Leader (Active-Active) | Any node | Any node | Conflict resolution required (last-write-wins, CRDTs, custom) | Any node can take over; write conflicts possible | Google Docs-style collaborative editing; Multi-region active-active (Google Drive conflicts) |
| Leaderless (Quorum) | All nodes accept writes; quorum W of N confirm | Quorum R of N nodes respond; merge conflicts | Tunable: W + R > N = strong; W + R ≤ N = eventual | No leader election needed; Sloppy quorum during partitions | Cassandra (chat Ch12, metrics V2 Ch05); DynamoDB; Key-value store (Ch06 — Dynamo-style) |
Trade-offs table:
| Availability | Consistency | Complexity | Best for | |
|---|---|---|---|---|
| Leader-Follower | Medium (failover needed) | Strong (sync) or Eventual (async) | Low | Most OLTP systems |
| Multi-Leader | High (no single failure point) | Low (conflict-prone) | High | Multi-region writes, collaboration |
| Leaderless | High (no leader election) | Tunable via quorum | Medium | High write availability, Cassandra-style |
Quorum formula (from Vol 1 Ch06 — Key-Value Store):
- N = replication factor, W = write quorum, R = read quorum
W + R > N→ strong consistency (reads always see latest write)- Common: N=3, W=2, R=2 (majority quorum)
- Latency optimized: W=1, R=3 (fast writes, slow reads)
- Durability optimized: W=3, R=1 (all nodes confirm write)
5. Fanout Patterns
Fanout = distributing a single event to multiple destinations. The core trade-off is: do work at write time or read time?
| Pattern | When work is done | Latency for writer | Latency for reader | Storage cost | SDI chapter |
|---|---|---|---|---|---|
| Fanout on Write (Push model) | At post/event creation time | High (must write to all N followers) | Low (pre-built feed) | High (N copies per post) | Vol 1 Ch11 News Feed — write post → push to all followers’ feed caches |
| Fanout on Read (Pull model) | At feed retrieval time | Low (just store post) | High (must query all N followees) | Low (one copy) | Vol 1 Ch11 — older Twitter approach; simpler but slow for high-fan users |
| Hybrid | Write for regular users; read for celebrities | Medium | Medium | Medium | Vol 1 Ch11 — preferred production approach; push to non-celebrity followers, pull for celebrities on read |
When to use which:
- Write fanout: Best when followers are few (<500), feed access is frequent, read latency must be low
- Read fanout: Best when user has millions of followers (celebrities), or feed is rarely accessed
- Hybrid: Real production choice at scale — write fanout for normal users, read fanout (or separate celebrity handling) for high-follower accounts
Generalized fanout beyond news feed:
- Notification system (Vol 1 Ch10): Fanout on write to per-channel queues (one queue per notification type)
- YouTube upload (Vol 1 Ch14): Fanout on write to transcoding jobs (one Kafka event → N parallel workers)
- Google Drive sync (Vol 1 Ch15): Fanout to all of user’s devices on file change
6. Rate Limiting Patterns
Summary reference — full detail in Vol 1 Ch04.
| Algorithm | Memory | Accuracy | Allows burst | Complexity | Production use |
|---|---|---|---|---|---|
| Token Bucket | Low (bucket state per user) | Good | Yes (consume all tokens at once) | Low | AWS API Gateway, Stripe — industry default |
| Leaky Bucket | Low (queue size per user) | Good | No (constant drain rate) | Medium | Traffic shaping, smooth downstream load |
| Fixed Window Counter | Very low (one counter per window) | Poor (burst at boundary) | Partial | Very low | Prototypes only |
| Sliding Window Log | High (all timestamps) | Excellent | No | Medium | Low-volume, accuracy-critical |
| Sliding Window Counter | Low (two counters) | Very good (weighted approximation) | No | Medium | Cloudflare, Azure — best production balance |
Where rate limiting lives in the architecture:
- API Gateway: Per-user, per-IP, per-endpoint limits
- Redis: Stores counters (atomic INCR, automatic TTL expiry)
- Distributed: Redis Cluster shards counters by user ID; race conditions solved with Lua scripts
7. CDN Patterns
| Aspect | Push CDN | Pull CDN |
|---|---|---|
| How content gets to edge | You upload/push content to CDN nodes in advance | CDN fetches from origin on first cache miss, then caches |
| First request | Fast (already at edge) | Slow (origin fetch + cache) |
| Content freshness | Manual invalidation or re-push | TTL-based auto-expiry |
| Best for | Rarely-changing content you know will be popular (software releases, JS bundles) | Frequently-changing or unpredictable content |
| Storage cost | High (pre-populate all edges) | Lower (only cache what’s requested) |
| Complexity | Higher (you manage distribution) | Lower (CDN manages it) |
What to put on CDN:
- Static assets: JS, CSS, fonts, images (always)
- Video segments: HLS/DASH chunks (YouTube Ch14, Netflix-style)
- Map tiles: Pre-rendered PNG/vector tiles (Google Maps V2 Ch03)
- File downloads: User files from Google Drive (Vol 1 Ch15)
- API responses: Only if safe to cache (no user-specific data without cache-key on user ID)
What NOT to put on CDN:
- Authenticated API responses (unless you include auth token in cache key)
- Real-time data (order book, live prices)
- User-specific personalized content (news feed, notifications)
SDI chapters using CDN heavily:
- YouTube (Vol 1 Ch14): CDN is mandatory — video at 5M DAU without CDN = origin server melts down. CDN serves ~95% of video traffic.
- Google Drive (Vol 1 Ch15): Pull CDN for file downloads; cached at edge per file version.
- Google Maps (Vol 2 Ch03): Map tiles are static; Push CDN for popular zoom levels/regions.
- URL shortener (Vol 1 Ch08): Can cache redirect responses at CDN edge (301 cached by browser + CDN).
8. Microservices vs Monolith
| Factor | Monolith | Microservices |
|---|---|---|
| Team size | < 10–20 engineers | 50+ engineers |
| Deployment | One deployable unit | Independent per service |
| Scaling | Scale the whole app | Scale specific bottleneck services |
| Technology | Single stack | Polyglot (best tool per service) |
| Data | Shared single DB | DB per service (no shared DB!) |
| Latency | In-process calls (sub-ms) | Network calls (1–10ms per hop) |
| Debugging | Simple stack traces | Requires distributed tracing |
| When to start | Always (premature microservices = pain) | When monolith becomes deployment bottleneck |
API Gateway pattern (mandatory for microservices):
- Single entry point for all client requests
- Handles auth, rate limiting, request routing, protocol translation
- Without it, clients must know about every service
- SDI reference: Every chapter with microservices uses an API Gateway
Service mesh (optional, for large-scale microservices):
- Sidecars (Envoy) on every pod handle: retries, circuit breaking, mTLS, observability
- Use when > ~20 services or when network policy / tracing is critical
9. Scalability Decision Framework
Given a set of constraints, which patterns apply?
| Constraint / Symptom | Pattern to apply | SDI example |
|---|---|---|
| Read traffic >> write traffic (>5:1) | Read replicas + caching (Redis) | URL shortener, news feed, leaderboard |
| Write throughput > single DB capacity | Sharding (consistent hash or range) | Chat messages, metrics ingestion |
| Single server is SPOF | Horizontal scale + load balancer | All production systems |
| Global users with latency concerns | CDN + GeoDNS + multi-region | YouTube, Google Maps, Google Drive |
| Traffic is spiky / bursty | Message queue as buffer | Notification system, video transcoding |
| Real-time fan-out to many subscribers | Pub/Sub or Kafka topics | News feed fanout, notification fan-out |
| Hotspot / celebrity problem | Hybrid fanout + local cache + key splitting | News feed celebrity handling |
| Need to replay events | Kafka with retention | Ad click aggregation, metrics backfill |
| Near-millisecond response time | In-memory store (Redis) + connection pooling | Leaderboard, rate limiter, order book |
| Ultra-low latency (<1ms) | Co-location, LMAX Disruptor, in-process queue | Stock exchange matching engine |
| Data consistency across services | Outbox pattern + Saga | Payment system, hotel reservation |
| Schema flexibility needed | NoSQL (document DB or wide-column) | User profiles, activity events |
10. Numbers Every Engineer Should Know for Scalability
Latency Table (approximate, 2024)
| Operation | Latency |
|---|---|
| L1 cache read | 0.5 ns |
| L2 cache read | 7 ns |
| Main memory (RAM) read | 100 ns |
| Redis GET (local network) | ~0.5–1 ms |
| SSD random read | ~0.1 ms |
| Network round-trip (same datacenter) | ~0.5 ms |
| Network round-trip (cross-region, US East ↔ West) | ~70 ms |
| Network round-trip (US ↔ Europe) | ~100–120 ms |
| HDD seek | ~10 ms |
| MySQL query (simple, indexed) | ~1–5 ms |
| MySQL query (complex join) | ~10–100 ms |
| Cassandra read (quorum) | ~1–5 ms |
| S3 GET request | ~10–50 ms |
| DNS lookup | ~1–100 ms |
Throughput Table
| Component | Typical throughput |
|---|---|
| Single web server (nginx) | ~50K–100K req/sec |
| Single app server (Java Spring) | ~1K–10K req/sec (CPU bound) |
| Redis (single node) | ~100K–1M ops/sec |
| MySQL (reads, indexed) | ~10K–100K QPS per server |
| MySQL (writes) | ~1K–10K writes/sec |
| Cassandra (writes) | ~50K–100K writes/sec per node |
| Kafka (single partition) | ~10 MB/sec or ~100K msgs/sec |
| CDN (global, Cloudflare) | Terabits/sec aggregate |
| AWS S3 | Thousands of req/sec per prefix |
Storage Scale Reference
| Scale | Users | Storage need / year | Typical architecture |
|---|---|---|---|
| Startup | < 10K | < 100 GB | Single server + single DB |
| Growing | 10K–1M | 100 GB – 10 TB | Load balancer + DB replicas + CDN |
| Large | 1M–100M | 10 TB – 1 PB | Sharded DB + distributed cache + object storage |
| FAANG | 100M+ | 1 PB+ | Custom storage systems + multi-region |
Useful multiples:
- 1 million seconds ≈ 11.5 days
- 2.5 million seconds ≈ 1 month
- 1 billion seconds ≈ 31.7 years
QPS = DAU × actions_per_day / 86,400Peak QPS ≈ 2–3× average QPSStorage = daily_writes × size_per_entry × retention_days × replication_factor
See also:
- key-patterns > 2. Database Sharding — Sharding strategies
- key-patterns > 3. Consistent Hashing — Hash ring in detail
- key-patterns > 4. Load Balancing Algorithms — Algorithm comparison
- key-patterns > 5. Fanout Patterns — Fanout on write vs read
- distributed-system-components — Component-by-component reference
- estimation-cheatsheet — Full numbers reference
- ch01-scale-from-zero-to-millions — Progression of adding components
- ch04-rate-limiter — Rate limiting algorithms in full
- ch05-consistent-hashing — Consistent hashing deep dive
- ch11-news-feed — Fanout patterns in production
- cache-strategies — Caching across chapters
- storage-systems — Storage system selection