Key System Design Patterns

Reusable patterns that appear across multiple system designs. Know when and why to use each.

🏗️ Architectural Patterns

1. Client-Server Architecture

When to use: Almost every system
Components: Clients (web/mobile), Servers (API), Database

[Clients] ←→ [Load Balancer] ←→ [Servers] ←→ [Database]

Key points:

Stateless servers (easier to scale)
Session data in cache/database
Load balancer for distribution

2. Microservices vs Monolith

Monolith:

✅ Simple deployment, easier to develop initially
❌ Hard to scale specific components, all-or-nothing deployment

Microservices:

✅ Independent scaling, technology flexibility, fault isolation
❌ Complex deployment, network latency, data consistency challenges

When to use microservices:

Large team (>50 engineers)
Need independent scaling
Different components have different SLAs
Long-term project (years)

When to use monolith:

Small team (<10 engineers)
MVP/prototype
Simple domain
Short time to market

3. Event-Driven Architecture

When to use: Async operations, real-time updates, decoupling

[Service A] → [Message Queue] → [Service B]
                     ↓
              [Service C]

Benefits:

Loose coupling
Better fault tolerance
Peak load handling (queue acts as buffer)

Trade-offs:

More complex debugging
Eventual consistency
Message ordering challenges

Examples: Order processing, notifications, analytics

💾 Data Storage Patterns

1. Database Replication

Primary-Replica (Leader-Follower):

        [Primary]
           ↓
    ┌──────┼──────┐
    ↓      ↓      ↓
[Replica1] [Replica2] [Replica3]

Writes → Primary only
Reads → Any replica

Benefits:

Improved read performance
High availability (failover to replica)
Reduced load on primary

Trade-offs:

Replication lag (eventual consistency)
Complexity in handling failover

When to use: Read-heavy workloads (10:1 or higher read:write ratio)

2. Database Sharding (Partitioning)

Horizontal partitioning across multiple databases.

Users 1 to 1,000,000          → [Shard 1]
Users 1,000,001 to 2,000,000  → [Shard 2]
Users 2,000,001 to 3,000,000  → [Shard 3]

Sharding strategies:

Hash-based:

shard = hash(userId) % number_of_shards

✅ Even distribution
❌ Hard to add shards (resharding)
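To make the resharding cost concrete, here is a minimal Python sketch. The names `shard_for` and `moved_fraction` are illustrative, and MD5 is only an assumed stand-in for any stable hash. It counts how many keys change shard when the shard count goes from 4 to 5.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard using a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(user_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for u in user_ids
        if shard_for(u, old_shards) != shard_for(u, new_shards)
    )
    return moved / len(user_ids)

users = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(users, 4, 5)
print(f"{fraction:.0%} of keys move when going from 4 to 5 shards")
```

With modulo placement, a key stays put only when its hash gives the same remainder for both shard counts, so most keys move. That is the resharding problem consistent hashing is meant to reduce.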

Range-based:

Shard 1: A-F
Shard 2: G-M
Shard 3: N-Z

✅ Easy range queries
❌ Uneven distribution (hotspots)

Geographic:

US users → US shard
EU users → EU shard

✅ Low latency, data residency compliance
❌ Uneven distribution

When to use: Database is bottleneck, data > single server capacity

Challenges:

Cross-shard queries
Resharding
Celebrity problem (hotspots)

3. Consistent Hashing

Problem: Simple hashing breaks when adding/removing servers

Solution: Hash ring

       Server1(120°)
         ↗      ↖
    Key A(80°)  Key B(300°)
         ↘      ↙
       Server2(240°)

Key goes to the next server clockwise (Key A at 80° → Server1 at 120°; Key B at 300° wraps past 0° → Server1 at 120°)

Benefits:

Adding/removing a server remaps only the keys between that server and its predecessor on the ring
Better load distribution with virtual nodes
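The ring lookup and virtual nodes can be sketched as follows. This is a minimal illustration, not a production implementation: the `HashRing` class, the `vnodes` default of 100, and the MD5-based `_hash` helper are all assumptions made for the example.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, servers=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server is placed at many positions (virtual nodes) to even out load.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["server1", "server2", "server3"])
keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("server4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding server4")
```

Every key that moves lands on the new server, and the rest stay where they were. Compare this with modulo hashing, where most keys move.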

When to use:

Distributed caches (e.g., Memcached client-side sharding; note that Redis Cluster uses fixed hash slots instead)
CDN routing
Load balancing
Any distributed hash table

Example use cases: Memcached, DynamoDB, Cassandra

4. SQL vs NoSQL

Aspect	SQL	NoSQL
Data Model	Structured, relational	Flexible, denormalized
Schema	Fixed schema	Schema-less
Scaling	Vertical (mostly)	Horizontal
ACID	Strong ACID	Eventual consistency (usually)
Joins	Easy, efficient	Difficult, application-level
Use Case	Structured data, complex queries	Flexible schema, high scalability

Choose SQL when:

Structured data with relationships
Need ACID transactions
Complex queries and joins
Data consistency critical
Example: Banking, e-commerce orders

Choose NoSQL when:

Flexible/evolving schema
Need horizontal scaling
Simple queries (key-value lookups)
High write throughput
Eventual consistency OK
Example: User profiles, logs, sessions

🚀 Performance Patterns

1. Caching Strategies

Cache-Aside (Lazy Loading):

1. Check cache
2. If miss → Query DB → Write to cache
3. If hit → Return cached data

✅ Only cache what’s needed
❌ Cache miss penalty
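The cache-aside read path above can be sketched in a few lines of Python. `CacheAside`, `load_from_db`, and the in-memory dict standing in for the cache are hypothetical names chosen for illustration. A real deployment would typically use an external cache with TTLs.

```python
class CacheAside:
    def __init__(self, load_from_db):
        self._cache = {}                  # stand-in for Redis/Memcached
        self._load_from_db = load_from_db
        self.db_reads = 0                 # counter to make misses visible

    def get(self, key):
        # 1. Check cache; 3. on a hit, return the cached data.
        if key in self._cache:
            return self._cache[key]
        # 2. On a miss, query the DB and write the result to the cache.
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. after the underlying row is updated."""
        self._cache.pop(key, None)

database = {"user:1": {"name": "Ada"}}
cache = CacheAside(database.__getitem__)
first = cache.get("user:1")   # miss: goes to the DB
second = cache.get("user:1")  # hit: served from the cache
```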

Write-Through:

1. Write to cache
2. Cache writes to DB
3. Return success

✅ Cache always consistent
❌ Write latency (2 writes)

Write-Behind (Write-Back):

1. Write to cache
2. Return success immediately
3. Cache writes to DB async

✅ Low write latency
❌ Risk of data loss

Write-Around:

1. Write directly to DB
2. Bypass cache
3. Next read loads to cache

✅ Avoid cache pollution from writes
❌ Read miss after write

When to use what:

Cache-aside: Most common, general purpose
Write-through: Strong consistency needed
Write-behind: High write throughput
Write-around: Large writes, rarely re-read

2. Cache Invalidation

Time-To-Live (TTL):

Set expiration time on cache entries
✅ Simple, automatic cleanup
❌ May serve stale data until expiry

Write Invalidation:

Delete cache entry on write
✅ Always fresh on read
❌ Overhead on writes, cache misses

Cache Stampede Prevention:

Problem: Cache expires → 1000 requests hit DB simultaneously

Solution 1: Lock
- First request gets lock, others wait
- Only one DB query

Solution 2: Probabilistic early expiration
- Refresh cache before TTL expires
- Based on load and randomness
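Solution 1 can be sketched with a per-key lock inside a single process. This is only an assumption-laden illustration: `SingleFlightCache` and its `loader` callback are made-up names, and across multiple servers the lock would have to be a distributed one.

```python
import threading

class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader          # hypothetical function that queries the DB
        self._cache = {}
        self._locks = {}
        self._guard = threading.Lock() # protects the lock table itself
        self.loads = 0                 # number of DB queries actually issued

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # First request takes the lock; the others wait here.
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            if key in self._cache:
                return self._cache[key]
            self.loads += 1
            value = self._loader(key)
            self._cache[key] = value
            return value
```

When many threads call `get` for the same missing key at once, only the first one runs the loader. The rest read the freshly cached value after the lock is released.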

3. CDN (Content Delivery Network)

Push CDN:

Upload content to CDN manually
✅ Full control, good for rarely changing content
❌ Manual management

Pull CDN:

CDN fetches from origin on cache miss
✅ Automatic, good for frequently changing content
❌ First request is slow (cache miss)

When to use:

Static content (images, videos, JS, CSS)
Geographically distributed users
Reduce origin server load

4. Load Balancing Algorithms

Round Robin:

Distribute requests sequentially
✅ Simple, fair distribution
❌ Ignores server load

Least Connections:

Send to server with fewest active connections
✅ Better for long-lived connections
❌ Overhead tracking connections

Least Response Time:

Send to server with lowest latency
✅ Adapts to real server performance
❌ Complex to implement

IP Hash:

Hash client IP to select server
✅ Sticky sessions (same client → same server)
❌ Uneven distribution
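Three of these selection strategies fit in a short Python sketch. The class and function names are illustrative assumptions, and health checks and weighting are left out.

```python
import hashlib
import itertools

class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        """Hand out servers in a fixed rotation."""
        return next(self._cycle)

class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open connections

    def acquire(self):
        """Pick the server with the fewest active connections (ties: first listed)."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

def ip_hash(client_ip: str, servers):
    """Same client IP always maps to the same server while the server list is unchanged."""
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```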

Layer 4 vs Layer 7:

L4 (TCP): Faster, can’t read HTTP content
L7 (HTTP): Slower, can route based on URL/headers

🔐 Reliability Patterns

1. Rate Limiting

Algorithms:

Token Bucket:

- Bucket has tokens (capacity = 100)
- Tokens refill at rate (10/sec)
- Request consumes 1 token
- If no tokens → reject (429 Too Many Requests)

✅ Allows burst traffic
✅ Simple to implement
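A minimal token bucket sketch in Python, using the capacity (100) and refill rate (10/sec) from the list above. The `TokenBucket` class and the injectable `clock` parameter are assumptions added so the behavior can be tested without sleeping.

```python
import time

class TokenBucket:
    def __init__(self, capacity: float = 100, refill_rate: float = 10, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)   # start full, which permits an initial burst
        self._last = clock()

    def allow(self, cost: float = 1) -> bool:
        """Return True if the request may proceed; False means respond with 429."""
        now = self._clock()
        # Refill lazily based on elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per user or per IP and return HTTP 429 whenever `allow()` is False.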

Leaky Bucket:

- Requests enter bucket
- Process at constant rate
- If bucket full → reject

✅ Smooth traffic
❌ No burst allowance

Fixed Window:

- Count requests per time window (1 minute)
- Reset counter at window boundary

✅ Very simple
❌ Burst at window boundaries

Sliding Window:

- Count requests in rolling time window
- More accurate than fixed window

✅ No boundary burst
❌ More complex
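One way to implement a sliding window is to keep a log of request timestamps. The sketch below assumes that variant. A sliding window counter, which approximates the count from two fixed windows, is another option. `SlidingWindowLimiter` and its parameters are hypothetical names.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have fallen out of the rolling window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) < self.limit:
            self._hits.append(now)
            return True
        return False
```

The cost is memory: the log stores one timestamp per accepted request in the window, which is part of what makes this approach more complex than a fixed window.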

Where to apply:

API Gateway (per user/per IP)
Prevent abuse
Protect downstream services

2. Circuit Breaker

States: Closed → Open → Half-Open → Closed

Closed (normal):
- Requests pass through
- Count failures
- If failures > threshold → Open

Open (failing):
- Reject requests immediately (fail fast)
- After timeout → Half-Open

Half-Open (testing):
- Allow limited requests
- If success → Closed
- If failure → Open
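The state machine above can be sketched as a small wrapper around a call. The names (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`) are assumptions for illustration. This single-threaded sketch lets one trial call through in the half-open state rather than enforcing a concurrent request limit.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"  # timeout elapsed: allow a trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed trial, or failures exceeding the threshold, opens the circuit.
        if self.state == "half_open" or self._failures > self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()

    def _on_success(self):
        self._failures = 0
        self.state = "closed"
```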

When to use:

Calling external services
Prevent cascading failures
Fast failure instead of waiting

Example: If payment service is down, immediately return error instead of timing out.

3. Retry with Exponential Backoff

Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Max attempts: 5

Add jitter (randomness):

wait = base_delay * 2 ^ (attempt - 1) + random(0, 1000ms)

Prevents thundering herd
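A minimal Python sketch of the schedule above (1, 2, 4, 8 seconds, at most 5 attempts, up to 1 second of jitter). `TransientError`, `backoff_delay`, and `retry` are hypothetical names, and the `sleep` and `rng` parameters are injected only so the logic can be exercised without waiting.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or HTTP 503."""

def backoff_delay(attempt: int, base_delay: float = 1.0, max_jitter: float = 1.0,
                  rng=random.random) -> float:
    """Delay in seconds before the retry that follows the given attempt (1-based)."""
    return base_delay * 2 ** (attempt - 1) + rng() * max_jitter

def retry(func, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call func until it succeeds, retrying only on TransientError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, base_delay))
```

Only `TransientError` is retried here, so client errors and other exceptions propagate immediately.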

When to use:

Transient failures (network issues)
Service temporarily unavailable

Don’t retry:

Client errors (400, 401, 403, 404)
Non-idempotent operations without an idempotency key (risk: payment charged twice)

4. Idempotency

Problem: Network failure → client retries → duplicate operations

Solution: Idempotency key

POST /api/payment
Headers:
  Idempotency-Key: abc123

Server:
- Check if abc123 exists
- If yes → return cached response
- If no → process and cache result with key
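The server-side check can be sketched as follows. `PaymentService` and its in-memory dict are illustrative assumptions. A real service would persist the keys and make the check-and-store step atomic, for example with a unique constraint.

```python
class PaymentService:
    def __init__(self):
        self._responses = {}  # idempotency key -> cached response
        self.charges = []     # record of charges actually performed

    def handle_payment(self, idempotency_key: str, amount_cents: int) -> dict:
        # Key already seen: return the cached response without charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        # New key: process the payment and cache the result under the key.
        self.charges.append(amount_cents)
        response = {
            "status": "charged",
            "amount_cents": amount_cents,
            "charge_id": len(self.charges),
        }
        self._responses[idempotency_key] = response
        return response

service = PaymentService()
first = service.handle_payment("abc123", 5000)
retried = service.handle_payment("abc123", 5000)  # client retry after a network failure
```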

When to use:

Payments, orders, critical operations
Any non-idempotent operation

📨 Messaging Patterns

1. Message Queue vs Pub/Sub

Message Queue (Point-to-Point):

[Producer] → [Queue] → [Consumer]

One message, one consumer
Work distribution
Example: Job processing

Pub/Sub (Broadcast):

           [Topic]
             ↓
    ┌────────┼────────┐
    ↓        ↓        ↓
[Sub A]  [Sub B]  [Sub C]

One message, multiple consumers
Event broadcasting
Example: Notifications, analytics

2. Push vs Pull

Push (Server pushes to client):

WebSockets, Server-Sent Events (SSE)
✅ Low latency, real-time
❌ Server holds an open connection per client, so scaling takes more memory and connection-aware load balancing

Pull (Client polls server):

HTTP polling, long polling
✅ Simple, scalable
❌ Higher latency, wasted requests

Hybrid:

Push for critical updates
Pull for less time-sensitive data

3. At-Most-Once vs At-Least-Once vs Exactly-Once

At-Most-Once:

Send message, no retry
May lose messages
Use case: Metrics, logs (OK to lose some)

At-Least-Once:

Retry until ack received
May duplicate messages
Use case: Most systems (with idempotency)

Exactly-Once:

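Each message takes effect exactly once (no loss, no duplicates)
Complex, expensive; in practice usually built from at-least-once delivery plus idempotent or deduplicated processing
Use case: Financial transactions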

🔍 Search & Discovery Patterns

1. Geospatial Indexing

Geohash:

Lat/Lon → String encoding
Nearby locations usually share a prefix (points on opposite sides of a cell boundary may not)

Example:
- "u4pruyd" (57.64911, 10.40744, northern Denmark)
- "u4pruvq" (nearby)
Common prefix "u4pru" → same area
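For reference, the standard geohash encoding is short enough to sketch. It repeatedly halves the longitude and latitude ranges, interleaves the resulting bits starting with longitude, and emits one base-32 character per 5 bits. The function name `geohash_encode` is an illustrative choice.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # u4pruyd
```

Truncating a geohash gives the enclosing, larger cell, which is why a prefix query can find points in the same area.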

Quadtree:

Recursively divide map into 4 quadrants
Stop when region has < N items

When to use: Location-based search (Uber, Yelp, Google Maps)

2. Autocomplete/Typeahead

Trie (Prefix Tree):

       root
      /  |  \
     a   b   c
    / \   \
   p   r   o
  / \   \   \
 p   t   t   t

Optimizations:

Cache top N suggestions per prefix
Precompute popular queries
Limit to top 10 results
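A small trie with popularity-ranked suggestions can be sketched like this. `Autocomplete`, `add`, and `suggest` are hypothetical names, and this version walks the subtree on every query. Caching the top N results per prefix would replace that walk.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # search count; > 0 marks the end of a stored query

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query: str, count: int = 1) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix: str, limit: int = 10) -> list:
        # Walk down to the node for the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored query below that node.
        matches = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                matches.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Rank by popularity, then alphabetically, and keep the top results.
        matches.sort(key=lambda m: (-m[1], m[0]))
        return [text for text, _ in matches[:limit]]

ac = Autocomplete()
for word, searches in [("app", 5), ("apt", 2), ("art", 3), ("bot", 1)]:
    ac.add(word, searches)
```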

Ranking:

Popularity (search count)
Personalization (user history)
Recency (trending)

📊 Analytics Patterns

1. Lambda Architecture

Speed layer (real-time stream): [Kafka] → [Flink] → [Redis]
Batch layer:                    [Kafka] → [Spark] → [HDFS/S3]
Serving layer:                  combines the real-time and batch views for queries

When to use: Need both real-time and accurate batch processing

2. Time-Series Data

Downsampling:

Raw data: 1-second granularity (keep for 1 week)
1-minute rollup (keep for 1 month)
1-hour rollup (keep for 1 year)
1-day rollup (keep forever)

Aggregation:

Pre-compute common queries
Sum, avg, min, max, percentiles
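A rollup step can be sketched as grouping points into fixed-size time buckets and pre-computing aggregates per bucket. The `downsample` function and its output fields are assumptions for illustration. Percentiles are omitted because they cannot be derived from simple running aggregates.

```python
from collections import defaultdict

def downsample(points, bucket_seconds: int):
    """Roll up (timestamp_seconds, value) points into fixed-size time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # align to the bucket start
    return [
        {
            "start": start,
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    ]

# Two minutes of 1-second samples rolled up into 1-minute buckets.
raw = [(ts, float(ts)) for ts in range(120)]
rollup = downsample(raw, 60)
```

Storing `sum` and `count` rather than only `avg` lets coarser rollups (hourly, daily) be computed from finer ones without going back to the raw data.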

Storage: InfluxDB, TimescaleDB, Prometheus

🎯 Pattern Selection Guide

By Scale

Small (< 10K users):

Monolith
Single database
Simple cache

Medium (10K - 1M users):

Microservices (optional)
Database replication
CDN, cache layer

Large (> 1M users):

Microservices
Database sharding
Distributed cache
Message queues
Multi-region

By Consistency Requirements

Strong consistency:

SQL database
Synchronous replication
Distributed transactions

Eventual consistency:

NoSQL database
Async replication
Event sourcing

By Latency Requirements

< 100ms (real-time):

In-memory cache
WebSockets
Edge computing

< 1s (interactive):

Database with caching
CDN

> 1s (batch OK):

Message queues
Background jobs

📚 Pattern Combinations

Common combos in interviews:

Social Network (Twitter, Instagram):
- Microservices + Sharding + Cache + CDN + Message Queue
E-commerce (Amazon):
- Monolith/Microservices + SQL + Cache + Payment idempotency
Ride-sharing (Uber):
- Geohash + WebSockets + Sharding + Message Queue
Video Streaming (YouTube):
- CDN + Object Storage + Adaptive bitrate + Analytics
Search (Google):
- Inverted index + Distributed crawling + Sharding + Caching

Remember: No pattern is perfect. Always discuss trade-offs!

Last Updated: 2026-04-08

Study Notes by Niladri & AI
