Key System Design Patterns
Reusable patterns that appear across multiple system designs. Know when and why to use each.
Architectural Patterns
1. Client-Server Architecture
When to use: Almost every system
Components: Clients (web/mobile), Servers (API), Database
[Clients] ←→ [Load Balancer] ←→ [Servers] ←→ [Database]
Key points:
- Stateless servers (easier to scale)
- Session data in cache/database
- Load balancer for distribution
2. Microservices vs Monolith
Monolith:
- ✅ Simple deployment, easier to develop initially
- ❌ Hard to scale specific components, all-or-nothing deployment
Microservices:
- ✅ Independent scaling, technology flexibility, fault isolation
- ❌ Complex deployment, network latency, data consistency challenges
When to use microservices:
- Large team (>50 engineers)
- Need independent scaling
- Different components have different SLAs
- Long-term project (years)
When to use monolith:
- Small team (<10 engineers)
- MVP/prototype
- Simple domain
- Short time to market
3. Event-Driven Architecture
When to use: Async operations, real-time updates, decoupling
[Service A] → [Message Queue] → [Service B]
                     ↓
                [Service C]
Benefits:
- Loose coupling
- Better fault tolerance
- Peak load handling (queue acts as buffer)
Trade-offs:
- More complex debugging
- Eventual consistency
- Message ordering challenges
Examples: Order processing, notifications, analytics
Data Storage Patterns
1. Database Replication
Primary-Replica (Leader-Follower):
           [Primary]
               ↓
        ┌──────┼──────┐
        ↓      ↓      ↓
[Replica1] [Replica2] [Replica3]
Writes → Primary only
Reads → Any replica
Benefits:
- Improved read performance
- High availability (failover to replica)
- Reduced load on primary
Trade-offs:
- Replication lag (eventual consistency)
- Complexity in handling failover
When to use: Read-heavy workloads (10:1 or higher read:write ratio)
2. Database Sharding (Partitioning)
Horizontal partitioning across multiple databases.
Users 1-1M → [Shard 1]
Users 1M-2M → [Shard 2]
Users 2M-3M → [Shard 3]
Sharding strategies:
Hash-based:
shard = hash(userId) % number_of_shards
- ✅ Even distribution
- ❌ Hard to add shards (resharding)
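Why resharding hurts can be shown with a small experiment. This is a minimal sketch, assuming string user IDs and an MD5-based stable hash; `shard_for` and `fraction_moved` are illustrative names, not part of any library.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used here to get the same shard on every run.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)

user_ids = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(user_ids, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With a uniform hash, about 80% of keys are expected to move in the 4-to-5 case, which is why modulo-based sharding makes adding capacity expensive.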
Range-based:
Shard 1: A-F
Shard 2: G-M
Shard 3: N-Z
- ✅ Easy range queries
- ❌ Uneven distribution (hotspots)
Geographic:
US users → US shard
EU users → EU shard
- ✅ Low latency, data residency compliance
- ❌ Uneven distribution
When to use: Database is bottleneck, data > single server capacity
Challenges:
- Cross-shard queries
- Resharding
- Celebrity problem (hotspots)
3. Consistent Hashing
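Problem: Simple hashing (hash(key) % N) remaps most keys when servers are added or removed
Solution: Hash ring
          Server1 (120°)
         /              \
  Key A (80°)        Key B (300°)
         \              /
          Server2 (240°)
Each key goes to the next server clockwise: Key A (80°) → Server1 (120°); Key B (300°) wraps past 0° → Server1 (120°)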
Benefits:
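- Adding or removing a server remaps only the keys in the arc next to it (roughly 1/N of all keys), not the whole key space
- Better load distribution with virtual nodes
When to use:
- Distributed caches (e.g. Memcached with client-side consistent hashing; Redis Cluster uses fixed hash slots instead)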
- CDN routing
- Load balancing
- Any distributed hash table
Example use cases: Memcached, DynamoDB, Cassandra
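A hash ring with virtual nodes can be sketched in a few lines. This is an illustrative implementation, not the one used by any of the systems above; the `HashRing` class, the MD5-based `_hash` helper, and the `cache-a` style node names are all assumptions made for the example.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per server, smooths the distribution
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position after the key's hash, wrapping around past the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-e")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fifth node")
```

Every key that moves lands on the new node, and the share that moves is close to 1/5 rather than the large majority that modulo hashing would remap.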
4. SQL vs NoSQL
| Aspect | SQL | NoSQL |
|---|---|---|
| Data Model | Structured, relational | Flexible, denormalized |
| Schema | Fixed schema | Schema-less |
| Scaling | Vertical (mostly) | Horizontal |
| ACID | Strong ACID | Eventual consistency (usually) |
| Joins | Easy, efficient | Difficult, application-level |
| Use Case | Structured data, complex queries | Flexible schema, high scalability |
Choose SQL when:
- Structured data with relationships
- Need ACID transactions
- Complex queries and joins
- Data consistency critical
- Example: Banking, e-commerce orders
Choose NoSQL when:
- Flexible/evolving schema
- Need horizontal scaling
- Simple queries (key-value lookups)
- High write throughput
- Eventual consistency OK
- Example: User profiles, logs, sessions
Performance Patterns
1. Caching Strategies
Cache-Aside (Lazy Loading):
1. Check cache
2. If miss → Query DB → Write to cache
3. If hit → Return cached data
- ✅ Only cache what's needed
- ❌ Cache miss penalty
Write-Through:
1. Write to cache
2. Cache writes to DB
3. Return success
- ✅ Cache always consistent
- ❌ Write latency (2 writes)
Write-Behind (Write-Back):
1. Write to cache
2. Return success immediately
3. Cache writes to DB async
- ✅ Low write latency
- ❌ Risk of data loss
Write-Around:
1. Write directly to DB
2. Bypass cache
3. Next read loads to cache
- ✅ Avoid cache pollution from writes
- ❌ Read miss after write
When to use what:
- Cache-aside: Most common, general purpose
- Write-through: Strong consistency needed
- Write-behind: High write throughput
- Write-around: Large writes, rarely re-read
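The cache-aside read path, paired with a write-around style update, might look like the sketch below. Plain dicts stand in for the database and the cache, and `CacheAsideStore` is a hypothetical name chosen for the example.

```python
class CacheAsideStore:
    def __init__(self, db: dict):
        self.db = db        # stand-in for the database
        self.cache = {}     # stand-in for Redis/Memcached
        self.db_reads = 0   # counts cache misses that reached the DB

    def get(self, key):
        if key in self.cache:        # hit: return cached data
            return self.cache[key]
        self.db_reads += 1           # miss: query the DB...
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # ...and populate the cache
        return value

    def put(self, key, value):
        self.db[key] = value         # write goes straight to the DB
        self.cache.pop(key, None)    # drop the stale entry; next read reloads it

store = CacheAsideStore({"user:1": "Ada"})
store.get("user:1")              # miss, loads from DB
store.get("user:1")              # hit, no DB read
store.put("user:1", "Grace")     # invalidates the cached entry
print(store.get("user:1"), store.db_reads)
```

Deleting the entry on write, rather than updating it, keeps the cache from holding a value the database never committed.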
2. Cache Invalidation
Time-To-Live (TTL):
- Set expiration time on cache entries
- ✅ Simple, automatic cleanup
- ❌ May serve stale data until expiry
Write Invalidation:
- Delete cache entry on write
- ✅ Always fresh on read
- ❌ Overhead on writes, cache misses
Cache Stampede Prevention:
Problem: Cache expires → 1000 requests hit DB simultaneously
Solution 1: Lock
- First request gets lock, others wait
- Only one DB query
Solution 2: Probabilistic early expiration
- Refresh cache before TTL expires
- Based on load and randomness
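The lock-based approach (Solution 1) can be sketched with one lock per key and a second cache check after the lock is acquired. This is an in-process illustration only: `SingleFlightCache` and `slow_query` are hypothetical names, TTL handling is left out, and a multi-server deployment would need a distributed lock instead.

```python
import threading
import time

class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader                  # function that queries the DB
        self._cache = {}
        self._locks = {}                       # one lock per key
        self._locks_guard = threading.Lock()   # protects the _locks dict

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled the cache while we waited.
            if key in self._cache:
                return self._cache[key]
            value = self._loader(key)
            self._cache[key] = value
            return value

db_calls = []

def slow_query(key):
    db_calls.append(key)
    time.sleep(0.05)   # simulate a slow database query
    return f"value-for-{key}"

cache = SingleFlightCache(slow_query)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_calls))   # 1: only the first thread queried the DB
```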
3. CDN (Content Delivery Network)
Push CDN:
- Upload content to CDN manually
- ✅ Full control, good for rarely changing content
- ❌ Manual management
Pull CDN:
- CDN fetches from origin on cache miss
- ✅ Automatic, good for frequently changing content
- ❌ First request is slow (cache miss)
When to use:
- Static content (images, videos, JS, CSS)
- Geographically distributed users
- Reduce origin server load
4. Load Balancing Algorithms
Round Robin:
- Distribute requests sequentially
- ✅ Simple, fair distribution
- ❌ Ignores server load
Least Connections:
- Send to server with fewest active connections
- ✅ Better for long-lived connections
- ❌ Overhead tracking connections
Least Response Time:
- Send to server with lowest latency
- ✅ Best performance
- ❌ Complex to implement
IP Hash:
- Hash client IP to select server
- ✅ Sticky sessions (same client → same server)
- ❌ Uneven distribution
Layer 4 vs Layer 7:
- L4 (TCP): Faster, can't read HTTP content
- L7 (HTTP): Slower, can route based on URL/headers
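Round robin and least connections differ mainly in the state they keep, which a short sketch makes concrete. Both classes below are illustrative; the server names and the `acquire`/`release` method names are assumptions, and a real balancer would also handle health checks and concurrency.

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Tracks active connections and picks the least busy server."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1

rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([rr.pick() for _ in range(4)])

lc = LeastConnectionsBalancer(["app-1", "app-2"])
first = lc.acquire()      # app-1
second = lc.acquire()     # app-2
lc.release(first)         # app-1 finishes its request
print(lc.acquire())       # app-1 again, since it now has fewer connections
```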
Reliability Patterns
1. Rate Limiting
Algorithms:
Token Bucket:
- Bucket has tokens (capacity = 100)
- Tokens refill at rate (10/sec)
- Request consumes 1 token
- If no tokens → reject (429 Too Many Requests)
- ✅ Allows burst traffic
- ✅ Simple to implement
Leaky Bucket:
- Requests enter bucket
- Process at constant rate
- If bucket full → reject
- ✅ Smooth traffic
- ❌ No burst allowance
Fixed Window:
- Count requests per time window (1 minute)
- Reset counter at window boundary
- ✅ Very simple
- ❌ Burst at window boundaries (up to 2x the limit can pass around a boundary)
Sliding Window:
- Count requests in rolling time window
- More accurate than fixed window
- ✅ No boundary burst
- ❌ More complex
Where to apply:
- API Gateway (per user/per IP)
- Prevent abuse
- Protect downstream services
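The token bucket described above can be sketched as follows. The `TokenBucket` class and its `clock` parameter are assumptions for the example; the injectable clock exists only so the behavior can be shown without real waiting.

```python
import time

class TokenBucket:
    def __init__(self, capacity: float = 100, refill_rate: float = 10,
                 clock=time.monotonic):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self._clock = clock
        self._tokens = capacity           # start full
        self._last = clock()

    def allow(self, cost: float = 1) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False   # caller would answer with 429 Too Many Requests

fake_now = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]   # [True, True, True, False]
fake_now[0] += 2.0                           # two seconds pass, two tokens refill
later = [bucket.allow() for _ in range(3)]   # [True, True, False]
print(burst, later)
```

In a real API gateway there would be one bucket per user or IP, typically stored in a shared store so all gateway instances see the same counts.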
2. Circuit Breaker
States: Closed → Open → Half-Open → Closed
Closed (normal):
- Requests pass through
- Count failures
- If failures > threshold → Open
Open (failing):
- Reject requests immediately (fail fast)
- After timeout → Half-Open
Half-Open (testing):
- Allow limited requests
- If success → Closed
- If failure → Open
When to use:
- Calling external services
- Prevent cascading failures
- Fast failure instead of waiting
Example: If payment service is down, immediately return error instead of timing out.
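The three states can be sketched as a small wrapper around a call. This is a single-threaded illustration under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, and the parameter names are made up for the example, failures are counted consecutively, and Half-Open allows one trial call at a time.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half-open"   # timeout elapsed: let a trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if self.state == "half-open" or self._failures > self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0             # any success resets the breaker
        self.state = "closed"
        return result
```

A caller would wrap each request to the payment service in `breaker.call(...)` and treat `CircuitOpenError` as an immediate failure, for example by returning a fallback response.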
3. Retry with Exponential Backoff
Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
Max attempts: 5
Add jitter (randomness):
wait = base_delay * 2^(attempt - 1) + random(0, 1000ms)
(attempt starts at 1, base_delay = 1 second)
- Prevents thundering herd (clients retrying in lockstep)
When to use:
- Transient failures (network issues)
- Service temporarily unavailable
Don't retry:
- Client errors (400, 401, 403, 404)
- Non-idempotent operations without an idempotency key (e.g. a payment could be charged twice)
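The schedule and jitter formula can be sketched as below. `TransientError` is a hypothetical marker for retryable failures, and the `sleep` and `rng` parameters are assumptions added so the delays can be inspected without waiting.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""

def backoff_delay(attempt: int, base_delay: float = 1.0,
                  max_jitter: float = 1.0, rng=random.random) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    return base_delay * 2 ** (attempt - 1) + rng() * max_jitter

def retry(func, max_attempts: int = 5, sleep=time.sleep, **delay_kwargs):
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise   # out of attempts: surface the error
            sleep(backoff_delay(attempt, **delay_kwargs))
        # Any other exception (e.g. a client error) propagates immediately.

calls = []
waits = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("try again")
    return "ok"

# Record the delays instead of sleeping; jitter disabled to show the base schedule.
result = retry(flaky, sleep=waits.append, max_jitter=0)
print(result, waits)
```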
4. Idempotency
Problem: Network failure → client retries → duplicate operations
Solution: Idempotency key
POST /api/payment
Headers:
  Idempotency-Key: abc123
Server:
- Check if abc123 exists
- If yes → return cached response
- If no → process and cache result with key
When to use:
- Payments, orders, critical operations
- Any non-idempotent operation
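The server-side check can be sketched with an in-memory map from key to response. `PaymentService` and its fields are hypothetical; a production version would need an atomic check-and-insert (for example a unique constraint on the key) and an expiry policy for stored keys.

```python
class PaymentService:
    def __init__(self):
        self._responses = {}   # idempotency key -> cached response
        self.charges = []      # side effects actually performed

    def create_payment(self, idempotency_key: str, amount_cents: int) -> dict:
        # Key seen before: return the cached response, do not charge again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        # New key: process the payment and remember the result.
        self.charges.append(amount_cents)
        response = {
            "status": "succeeded",
            "charge_id": len(self.charges),
            "amount_cents": amount_cents,
        }
        self._responses[idempotency_key] = response
        return response

service = PaymentService()
first = service.create_payment("abc123", 5000)
retry_response = service.create_payment("abc123", 5000)   # client retry after a timeout
print(first == retry_response, len(service.charges))
```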
Messaging Patterns
1. Message Queue vs Pub/Sub
Message Queue (Point-to-Point):
[Producer] → [Queue] → [Consumer]
- One message, one consumer
- Work distribution
- Example: Job processing
Pub/Sub (Broadcast):
         [Topic]
            ↓
   ┌────────┼────────┐
   ↓        ↓        ↓
[Sub A]  [Sub B]  [Sub C]
- One message, multiple consumers
- Event broadcasting
- Example: Notifications, analytics
2. Push vs Pull
Push (Server pushes to client):
- WebSockets, Server-Sent Events (SSE)
- ✅ Low latency, real-time
- ❌ Server must hold an open connection per client, which makes scaling harder
Pull (Client polls server):
- HTTP polling, long polling
- ✅ Simple, scalable
- ❌ Higher latency, wasted requests
Hybrid:
- Push for critical updates
- Pull for less time-sensitive data
3. At-Most-Once vs At-Least-Once vs Exactly-Once
At-Most-Once:
- Send message, no retry
- May lose messages
- Use case: Metrics, logs (OK to lose some)
At-Least-Once:
- Retry until ack received
- May duplicate messages
- Use case: Most systems (with idempotency)
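Exactly-Once:
- Each message takes effect once, with no loss and no duplicates
- Complex, expensive; usually built as at-least-once delivery plus idempotent or transactional processing
- Use case: Financial transactions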
Search & Discovery Patterns
1. Geospatial Indexing
Geohash:
Lat/Lon → String encoding
Nearby locations usually share a prefix (points on opposite sides of a cell boundary may not)
Example:
- "u4pruyd" (57.64911, 10.40744, in northern Denmark)
- "u4pruvq" (nearby)
Common prefix "u4pru" → same area
Quadtree:
Recursively divide map into 4 quadrants
Stop when region has < N items
When to use: Location-based search (Uber, Yelp, Google Maps)
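Geohash encoding works by repeatedly halving the longitude and latitude ranges and packing the resulting bits into base-32 characters. The function below is a minimal sketch of the standard algorithm; `geohash_encode` is an illustrative name, not a library call.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True   # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1   # upper half: emit 1, keep upper half
            rng[0] = mid
        else:
            bits <<= 1               # lower half: emit 0, keep lower half
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))      # u4pruyd
print(geohash_encode(57.64911, 10.40744, 5))   # u4pru
```

Because each extra character only refines the same cell, a proximity search can start as a prefix match on the stored geohash, then also check the neighboring cells to cover points near a boundary.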
2. Autocomplete/Typeahead
Trie (Prefix Tree):
        root
       / | \
      a  b  c
     / \     \
    p   r     o
   / \   \     \
  p   t   t     t
Optimizations:
- Cache top N suggestions per prefix
- Precompute popular queries
- Limit to top 10 results
Ranking:
- Popularity (search count)
- Personalization (user history)
- Recency (trending)
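A trie that ranks suggestions by popularity can be sketched as follows. `AutocompleteTrie` and the example search counts are assumptions; this version walks the subtree on every request, whereas the "cache top N per prefix" optimization would store the ranked list on each node instead.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.count = 0       # search count; > 0 marks the end of a stored query

class AutocompleteTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, query: str, count: int = 1) -> None:
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix: str, limit: int = 10) -> list:
        # Walk down to the node for the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored query below that node.
        matches = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                matches.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most popular first, ties broken alphabetically.
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [text for _, text in matches[:limit]]

trie = AutocompleteTrie()
for query, searches in [("app", 5), ("apt", 2), ("art", 7), ("cot", 1)]:
    trie.insert(query, searches)
print(trie.suggest("a"))    # ['art', 'app', 'apt']
print(trie.suggest("ap"))   # ['app', 'apt']
```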
Analytics Patterns
1. Lambda Architecture
Real-time layer (stream):
[Kafka] → [Flink] → [Redis]
                       ↓
Batch layer:
[Kafka] → [Spark] → [HDFS/S3]
                       ↓
[Serving layer combines both]
When to use: Need both real-time and accurate batch processing
2. Time-Series Data
Downsampling:
Raw data: 1-second granularity (kept for 1 week)
1-minute rollup (kept for 1 month)
1-hour rollup (kept for 1 year)
1-day rollup (kept forever)
Aggregation:
- Pre-compute common queries
- Sum, avg, min, max, percentiles
Storage: InfluxDB, TimescaleDB, Prometheus
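A rollup is a group-by on the truncated timestamp. The sketch below assumes integer Unix timestamps and in-memory data; `downsample` is an illustrative helper, not the API of any of the stores listed above.

```python
from collections import defaultdict

def downsample(points, bucket_seconds: int) -> dict:
    """Roll up (unix_ts, value) pairs into fixed-size time buckets.

    Returns {bucket_start: {"count", "sum", "min", "max", "avg"}}.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)   # truncate to bucket start
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }

raw = [(0, 1.0), (1, 3.0), (59, 5.0), (60, 10.0)]   # 1-second samples
per_minute = downsample(raw, 60)
print(per_minute[0]["avg"], per_minute[60]["count"])
```

Keeping `sum` and `count` alongside `avg` lets coarser rollups be derived from finer ones; percentiles cannot be combined that way and need the raw values or a mergeable sketch.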
Pattern Selection Guide
By Scale
Small (< 10K users):
- Monolith
- Single database
- Simple cache
Medium (10K - 1M users):
- Microservices (optional)
- Database replication
- CDN, cache layer
Large (> 1M users):
- Microservices
- Database sharding
- Distributed cache
- Message queues
- Multi-region
By Consistency Requirements
Strong consistency:
- SQL database
- Synchronous replication
- Distributed transactions
Eventual consistency:
- NoSQL database
- Async replication
- Event sourcing
By Latency Requirements
< 100ms (real-time):
- In-memory cache
- WebSockets
- Edge computing
< 1s (interactive):
- Database with caching
- CDN
> 1s (batch OK):
- Message queues
- Background jobs
Pattern Combinations
Common combos in interviews:
- Social Network (Twitter, Instagram): Microservices + Sharding + Cache + CDN + Message Queue
- E-commerce (Amazon): Monolith/Microservices + SQL + Cache + Payment idempotency
- Ride-sharing (Uber): Geohash + WebSockets + Sharding + Message Queue
- Video Streaming (YouTube): CDN + Object Storage + Adaptive bitrate + Analytics
- Search (Google): Inverted index + Distributed crawling + Sharding + Caching
Remember: No pattern is perfect. Always discuss trade-offs!
Last Updated: 2026-04-08