Key System Design Patterns

Reusable patterns that appear across multiple system designs. Know when and why to use each.

🏗️ Architectural Patterns

1. Client-Server Architecture

When to use: Almost every system
Components: Clients (web/mobile), Servers (API), Database

[Clients] ←→ [Load Balancer] ←→ [Servers] ←→ [Database]

Key points:

  • Stateless servers (easier to scale)
  • Session data in cache/database
  • Load balancer for distribution

2. Microservices vs Monolith

Monolith:

  • ✅ Simple deployment, easier to develop initially
  • ❌ Hard to scale specific components, all-or-nothing deployment

Microservices:

  • ✅ Independent scaling, technology flexibility, fault isolation
  • ❌ Complex deployment, network latency, data consistency challenges

When to use microservices:

  • Large team (>50 engineers)
  • Need independent scaling
  • Different components have different SLAs
  • Long-term project (years)

When to use monolith:

  • Small team (<10 engineers)
  • MVP/prototype
  • Simple domain
  • Short time to market

3. Event-Driven Architecture

When to use: Async operations, real-time updates, decoupling

[Service A] → [Message Queue] → [Service B]
                     ↓
              [Service C]

Benefits:

  • Loose coupling
  • Better fault tolerance
  • Peak load handling (queue acts as buffer)

Trade-offs:

  • More complex debugging
  • Eventual consistency
  • Message ordering challenges

Examples: Order processing, notifications, analytics


💾 Data Storage Patterns

1. Database Replication

Primary-Replica (Leader-Follower):

        [Primary]
           ↓
    ┌──────┼──────┐
    ↓      ↓      ↓
[Replica1] [Replica2] [Replica3]

Writes → Primary only
Reads → Any replica

Benefits:

  • Improved read performance
  • High availability (failover to replica)
  • Reduced load on primary

Trade-offs:

  • Replication lag (eventual consistency)
  • Complexity in handling failover

When to use: Read-heavy workloads (10:1 or higher read:write ratio)


2. Database Sharding (Partitioning)

Horizontal partitioning across multiple databases.

Users 1-1M    → [Shard 1]
Users 1M-2M   → [Shard 2]
Users 2M-3M   → [Shard 3]

Sharding strategies:

Hash-based:

shard = hash(userId) % number_of_shards
  • ✅ Even distribution
  • ❌ Hard to add shards: changing number_of_shards remaps most keys (resharding)

Range-based:

Shard 1: A-F
Shard 2: G-M
Shard 3: N-Z
  • ✅ Easy range queries
  • ❌ Uneven distribution (hotspots)

Geographic:

US users → US shard
EU users → EU shard
  • ✅ Low latency, data residency compliance
  • ❌ Uneven distribution

When to use: Database is bottleneck, data > single server capacity

Challenges:

  • Cross-shard queries
  • Resharding
  • Celebrity problem (hotspots)
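
To make the resharding challenge concrete, the sketch below measures how many keys change shards when a fourth shard is added under `hash(userId) % number_of_shards`. The helper names and the choice of MD5 as a stable hash are illustrative assumptions, not part of any particular database.

```python
import hashlib


def shard_for(user_id: int, number_of_shards: int) -> int:
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a digest is used here to keep the mapping reproducible.
    """
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    user_ids = list(user_ids)
    moved = sum(
        1 for u in user_ids
        if shard_for(u, old_count) != shard_for(u, new_count)
    )
    return moved / len(user_ids)


# Going from 3 to 4 shards: a key stays put only if h % 3 == h % 4,
# which holds for about a quarter of hash values.
fraction = moved_fraction(range(10_000), 3, 4)
print(f"{fraction:.0%} of keys moved")
```

Roughly three quarters of the keys move in this setup, which is the motivation for consistent hashing.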

3. Consistent Hashing

Problem: Simple hashing breaks when adding/removing servers

Solution: Hash ring

       Server1(120°)
         ↗      ↖
    Key A(80°)  Key B(300°)
         ↘      ↙
       Server2(240°)

Key goes to next server clockwise

Benefits:

  • Adding/removing a server remaps only the keys in that server's segment of the ring (roughly 1/N of all keys)
  • Better load distribution with virtual nodes

When to use:

  • Distributed caches (e.g., client-side sharding across Memcached nodes; Redis Cluster itself uses fixed hash slots)
  • CDN routing
  • Load balancing
  • Any distributed hash table

Example use cases: Memcached, DynamoDB, Cassandra
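
A minimal sketch of a hash ring with virtual nodes, assuming MD5 for ring positions and 100 virtual nodes per server. The class name `HashRing`, the `server#i` virtual-node naming, and the server names are hypothetical choices for illustration.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Stable 64-bit position on the ring (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server appears at many positions to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def get(self, key):
        if not self._ring:
            raise LookupError("hash ring has no servers")
        # First virtual node at or after the key's position ("next clockwise"),
        # wrapping around to the start of the ring.
        index = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[index % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in keys}
ring.add("cache-d")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to cache-d")
```

Every key that changes owner lands on the new server, and the rest keep their old assignment.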


4. SQL vs NoSQL

Aspect      | SQL                              | NoSQL
------------+----------------------------------+----------------------------------
Data Model  | Structured, relational           | Flexible, denormalized
Schema      | Fixed schema                     | Schema-less
Scaling     | Vertical (mostly)                | Horizontal
ACID        | Strong ACID                      | Eventual consistency (usually)
Joins       | Easy, efficient                  | Difficult, application-level
Use Case    | Structured data, complex queries | Flexible schema, high scalability

Choose SQL when:

  • Structured data with relationships
  • Need ACID transactions
  • Complex queries and joins
  • Data consistency critical
  • Example: Banking, e-commerce orders

Choose NoSQL when:

  • Flexible/evolving schema
  • Need horizontal scaling
  • Simple queries (key-value lookups)
  • High write throughput
  • Eventual consistency OK
  • Example: User profiles, logs, sessions

🚀 Performance Patterns

1. Caching Strategies

Cache-Aside (Lazy Loading):

1. Check cache
2. If miss → Query DB → Write to cache
3. If hit → Return cached data
  • ✅ Only cache what’s needed
  • ❌ Cache miss penalty

Write-Through:

1. Write to cache
2. Cache writes to DB
3. Return success
  • ✅ Cache always consistent
  • ❌ Write latency (2 writes)

Write-Behind (Write-Back):

1. Write to cache
2. Return success immediately
3. Cache writes to DB async
  • ✅ Low write latency
  • ❌ Risk of data loss

Write-Around:

1. Write directly to DB
2. Bypass cache
3. Next read loads to cache
  • ✅ Avoid cache pollution from writes
  • ❌ Read miss after write

When to use what:

  • Cache-aside: Most common, general purpose
  • Write-through: Strong consistency needed
  • Write-behind: High write throughput
  • Write-around: Large writes, rarely re-read
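
The cache-aside read path can be sketched as follows. This is an in-process illustration only: the `CacheAside` class, the `load_from_db` callback, and the injectable `clock` are assumptions made for the example, and a real deployment would typically put the entries in a shared cache such as Redis or Memcached.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical function: key -> value
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._entries = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]            # hit: return cached data
        value = self._load(key)        # miss: query the database...
        self._entries[key] = (self._clock() + self._ttl, value)  # ...then fill the cache
        return value

    def invalidate(self, key):
        """Call after writing to the database so the next read reloads."""
        self._entries.pop(key, None)


db_calls = []


def fake_db(key):
    db_calls.append(key)
    return key.upper()


cache = CacheAside(fake_db, ttl_seconds=10)
cache.get("a")   # miss, loads from fake_db
cache.get("a")   # hit, no database call
print(db_calls)
```

The application owns both the read-through logic and the invalidation, which is why this pattern works with any cache that only offers get/set/delete.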

2. Cache Invalidation

Time-To-Live (TTL):

  • Set expiration time on cache entries
  • ✅ Simple, automatic cleanup
  • ❌ May serve stale data until expiry

Write Invalidation:

  • Delete cache entry on write
  • ✅ Always fresh on read
  • ❌ Overhead on writes, cache misses

Cache Stampede Prevention:

Problem: Cache expires → 1000 requests hit DB simultaneously

Solution 1: Lock
- First request gets lock, others wait
- Only one DB query

Solution 2: Probabilistic early expiration
- Refresh cache before TTL expires
- Based on load and randomness
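
Solution 1 (lock) might look like the sketch below inside a single process: one lock per key, with a second cache check after acquiring it. The `SingleFlightCache` name and `load_from_db` callback are hypothetical, and a multi-server setup would need a distributed lock instead of `threading.Lock`.

```python
import threading
from collections import defaultdict


class SingleFlightCache:
    def __init__(self, load_from_db):
        self._load = load_from_db               # hypothetical function: key -> value
        self._values = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()    # protects the lock table itself

    def get(self, key):
        if key in self._values:
            return self._values[key]            # fast path: already cached
        with self._locks_guard:
            lock = self._locks[key]
        with lock:
            # Re-check: another thread may have filled the entry while we waited.
            if key in self._values:
                return self._values[key]
            value = self._load(key)             # only one thread per key gets here
            self._values[key] = value
            return value
```

With many threads requesting the same missing key, the first one queries the database and the others wait on the lock, then read the freshly cached value.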

3. CDN (Content Delivery Network)

Push CDN:

  • Upload content to CDN manually
  • ✅ Full control, good for rarely changing content
  • ❌ Manual management

Pull CDN:

  • CDN fetches from origin on cache miss
  • ✅ Automatic, good for frequently changing content
  • ❌ First request is slow (cache miss)

When to use:

  • Static content (images, videos, JS, CSS)
  • Geographically distributed users
  • Reduce origin server load

4. Load Balancing Algorithms

Round Robin:

  • Distribute requests sequentially
  • ✅ Simple, fair distribution
  • ❌ Ignores server load

Least Connections:

  • Send to server with fewest active connections
  • ✅ Better for long-lived connections
  • ❌ Overhead tracking connections

Least Response Time:

  • Send to server with lowest latency
  • ✅ Best performance
  • ❌ Complex to implement

IP Hash:

  • Hash client IP to select server
  • ✅ Sticky sessions (same client → same server)
  • ❌ Uneven distribution

Layer 4 vs Layer 7:

  • L4 (TCP): Faster, can’t read HTTP content
  • L7 (HTTP): Slower, can route based on URL/headers
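
As a rough illustration of the Round Robin and Least Connections algorithms above, the following sketch keeps all state in memory. The class names and the `acquire`/`release` methods are assumptions for the example; real load balancers track connections per backend themselves.

```python
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation, ignoring their load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first (dicts keep insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


rr = RoundRobin(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])

lc = LeastConnections(["s1", "s2", "s3"])
first = lc.acquire()
second = lc.acquire()
lc.release(first)          # s1 finishes, so it is the least loaded again
print(lc.acquire())
```

The extra bookkeeping in `LeastConnections` is the tracking overhead listed as its drawback.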

🔐 Reliability Patterns

1. Rate Limiting

Algorithms:

Token Bucket:

- Bucket has tokens (capacity = 100)
- Tokens refill at rate (10/sec)
- Request consumes 1 token
- If no tokens → reject (429 Too Many Requests)
  • ✅ Allows burst traffic
  • ✅ Simple to implement
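
A minimal token bucket sketch using the numbers above (capacity 100, refill 10 tokens/sec). Tokens are refilled lazily on each call rather than by a background timer, which is one common way to implement it; the `TokenBucket` class and the injectable `clock` are assumptions for the example.

```python
import time


class TokenBucket:
    def __init__(self, capacity=100, refill_rate=10.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate   # tokens added per second
        self._clock = clock              # injectable so tests can control time
        self._tokens = float(capacity)   # start full, so an initial burst is allowed
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True if the request may proceed; False means respond with 429."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=100, refill_rate=10.0)
accepted = sum(bucket.allow() for _ in range(150))
print(f"{accepted} of 150 burst requests accepted")
```

A per-user or per-IP limiter would keep one bucket per key, for example in a dictionary or in a shared store.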

Leaky Bucket:

- Requests enter bucket
- Process at constant rate
- If bucket full β†’ reject
  • ✅ Smooth traffic
  • ❌ No burst allowance

Fixed Window:

- Count requests per time window (1 minute)
- Reset counter at window boundary
  • ✅ Very simple
  • ❌ Burst at window boundaries

Sliding Window:

- Count requests in rolling time window
- More accurate than fixed window
  • ✅ No boundary burst
  • ❌ More complex

Where to apply:

  • API Gateway (per user/per IP)
  • Prevent abuse
  • Protect downstream services

2. Circuit Breaker

States: Closed → Open → Half-Open → Closed

Closed (normal):
- Requests pass through
- Count failures
- If failures > threshold → Open

Open (failing):
- Reject requests immediately (fail fast)
- After timeout → Half-Open

Half-Open (testing):
- Allow limited requests
- If success → Closed
- If failure → Open

When to use:

  • Calling external services
  • Prevent cascading failures
  • Fast failure instead of waiting

Example: If payment service is down, immediately return error instead of timing out.
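
The state machine above could be sketched like this. It is a single-threaded illustration under stated assumptions: the `CircuitBreaker` and `CircuitOpenError` names, the default threshold and timeout, and the rule that one successful trial call closes the circuit are all choices made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; downstream assumed unhealthy")
            self.state = "half_open"   # timeout elapsed: allow a trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._failures = 0             # success: back to normal operation
        self.state = "closed"
        return result

    def _record_failure(self):
        self._failures += 1
        # A failed trial request, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self._failures > self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

A caller would wrap each outbound request, for example `breaker.call(charge_card, order)` with a hypothetical `charge_card` function, and treat `CircuitOpenError` as an immediate failure.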


3. Retry with Exponential Backoff

After failed attempt 1: wait 1 second
After failed attempt 2: wait 2 seconds
After failed attempt 3: wait 4 seconds
After failed attempt 4: wait 8 seconds
Max attempts: 5

Add jitter (randomness):

wait = base_delay * 2^(attempt - 1) + random(0, 1000ms)
  • Prevents thundering herd (clients retrying in lockstep)

When to use:

  • Transient failures (network issues)
  • Service temporarily unavailable

Don’t retry:

  • Client errors (400, 401, 403, 404)
  • Non-idempotent operations without an idempotency key (e.g., a payment could be charged twice)
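
A small sketch of retry with exponential backoff and jitter, using a 1-based attempt counter so the first wait is `base_delay`. The `RetryableError` type, the function names, and the injectable `sleep` are assumptions for the example; real code would map transient failures (timeouts, 503s) to the retryable case and let client errors propagate.

```python
import random
import time


class RetryableError(Exception):
    """Stand-in for a transient failure such as a timeout."""


def backoff_delay(attempt, base_delay=1.0, max_jitter=1.0):
    """Delay in seconds after the given failed attempt (attempt starts at 1)."""
    return base_delay * 2 ** (attempt - 1) + random.random() * max_jitter


def retry(operation, max_attempts=5, base_delay=1.0, max_jitter=1.0, sleep=time.sleep):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError:
            if attempt == max_attempts:
                raise                   # out of attempts: surface the error
            sleep(backoff_delay(attempt, base_delay, max_jitter))
        # Any other exception type propagates immediately (not retried).
```

Passing `sleep` as a parameter keeps the function testable without real waiting.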

4. Idempotency

Problem: Network failure → client retries → duplicate operations

Solution: Idempotency key

POST /api/payment
Headers:
  Idempotency-Key: abc123

Server:
- Check if abc123 exists
- If yes → return cached response
- If no → process and cache result with key

When to use:

  • Payments, orders, critical operations
  • Any non-idempotent operation
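
The server-side check can be sketched as below. `PaymentService`, its in-memory dictionary, and the response shape are hypothetical; a production version would store keys in a database with a unique constraint so that the check and the insert happen atomically.

```python
class PaymentService:
    def __init__(self):
        self._responses = {}   # idempotency key -> response already returned
        self.charges = []      # side effects actually performed

    def pay(self, idempotency_key, amount_cents):
        # Seen this key before: return the stored response, do not charge again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        # First time: perform the operation and remember the result under the key.
        self.charges.append(amount_cents)
        response = {
            "status": "charged",
            "amount_cents": amount_cents,
            "charge_id": len(self.charges),
        }
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
first = service.pay("abc123", 5000)
retry_response = service.pay("abc123", 5000)   # client retry after a timeout
print(first == retry_response, len(service.charges))
```

The retry receives the same response as the original request and the card is charged only once.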

📨 Messaging Patterns

1. Message Queue vs Pub/Sub

Message Queue (Point-to-Point):

[Producer] → [Queue] → [Consumer]
  • One message, one consumer
  • Work distribution
  • Example: Job processing

Pub/Sub (Broadcast):

           [Topic]
             ↓
    ┌────────┼────────┐
    ↓        ↓        ↓
[Sub A]  [Sub B]  [Sub C]
  • One message, multiple consumers
  • Event broadcasting
  • Example: Notifications, analytics
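
The difference in delivery semantics can be shown with two toy in-memory classes. `MessageQueue` and `Topic` are illustrative names, not the API of any broker; real systems add persistence, acknowledgements, and network transport.

```python
from collections import deque


class MessageQueue:
    """Point-to-point: each message is taken by exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever consumer asks first gets the message; None when empty.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Pub/sub: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


queue = MessageQueue()
queue.send("job-1")
queue.send("job-2")
worker_a, worker_b = queue.receive(), queue.receive()   # work is split

topic = Topic()
inbox_a, inbox_b = [], []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("order-created")                          # both inboxes receive it
```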

2. Push vs Pull

Push (Server pushes to client):

  • WebSockets, Server-Sent Events (SSE)
  • ✅ Low latency, real-time
  • ❌ Server holds an open connection per client, which makes scaling harder (connection state, sticky load balancing)

Pull (Client polls server):

  • HTTP polling, long polling
  • ✅ Simple, scalable
  • ❌ Higher latency, wasted requests

Hybrid:

  • Push for critical updates
  • Pull for less time-sensitive data

3. At-Most-Once vs At-Least-Once vs Exactly-Once

At-Most-Once:

  • Send message, no retry
  • May lose messages
  • Use case: Metrics, logs (OK to lose some)

At-Least-Once:

  • Retry until ack received
  • May duplicate messages
  • Use case: Most systems (with idempotency)

Exactly-Once:

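  • Each message takes effect exactly once: no loss, no duplicates
  • Complex, expensive; usually built from at-least-once delivery plus deduplication or transactions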
  • Use case: Financial transactions

🔍 Search & Discovery Patterns

1. Geospatial Indexing

Geohash:

Lat/Lon → String encoding
Nearby locations usually share a common prefix

Example:
- "u4pruyd" (57.64911, 10.40744, northern Denmark)
- "u4pruvq" (nearby)
Common prefix "u4pru" → same area

Quadtree:

Recursively divide map into 4 quadrants
Stop when region has < N items

When to use: Location-based search (Uber, Yelp, Google Maps)
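
For reference, the standard geohash encoding can be sketched in a few lines: bits alternate between longitude and latitude, each bit halves the remaining range, and every 5 bits become one base-32 character. The function name `geohash_encode` is an assumption for the example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=7):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744))        # u4pruyd
print(geohash_encode(57.64911, 10.40744, 5))     # u4pru
```

A "find nearby" query then becomes a prefix lookup, usually extended to the neighboring cells because two close points can fall on opposite sides of a cell boundary.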


2. Autocomplete/Typeahead

Trie (Prefix Tree):

       root
      /  |  \
     a   b   c
    / \   \
   p   r   o
  / \   \   \
 p   t   t   t

Optimizations:

  • Cache top N suggestions per prefix
  • Precompute popular queries
  • Limit to top 10 results

Ranking:

  • Popularity (search count)
  • Personalization (user history)
  • Recency (trending)
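
A minimal trie-based autocomplete that ranks by popularity only. The `Autocomplete` class and its `record`/`suggest` methods are hypothetical names; this version walks the subtree on every lookup, whereas the "cache top N suggestions per prefix" optimization would store the ranked list on each node instead.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0   # how often a query ending at this node was searched


class Autocomplete:
    def __init__(self):
        self._root = TrieNode()

    def record(self, query, count=1):
        node = self._root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, limit=10):
        # Walk down to the node for the prefix.
        node = self._root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every recorded query below that node.
        matches = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count:
                matches.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda m: (-m[1], m[0]))
        return [text for text, _ in matches[:limit]]


ac = Autocomplete()
for query, searches in [("app", 5), ("apt", 2), ("art", 3), ("bot", 1)]:
    ac.record(query, searches)
print(ac.suggest("a"))   # ['app', 'art', 'apt']
```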

📊 Analytics Patterns

1. Lambda Architecture

Real-time layer (stream):
[Kafka] → [Flink] → [Redis]
  ↓
Batch layer:
[Kafka] → [Spark] → [HDFS/S3]
  ↓
[Serving layer combines both]

When to use: Need both real-time and accurate batch processing


2. Time-Series Data

Downsampling:

Raw data: 1-second granularity (retain 1 week)
1-minute rollup (retain 1 month)
1-hour rollup (retain 1 year)
1-day rollup (retain forever)

Aggregation:

  • Pre-compute common queries
  • Sum, avg, min, max, percentiles

Storage: InfluxDB, TimescaleDB, Prometheus
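
A rollup step can be sketched as grouping points into fixed-width buckets and pre-computing the aggregates. The `rollup` function and the `(timestamp, value)` tuple format are assumptions for the example; time-series databases typically provide this as a built-in feature.

```python
from collections import defaultdict


def rollup(points, bucket_seconds):
    """Aggregate (unix_timestamp, value) pairs into fixed-width time buckets.

    Returns {bucket_start: {"count", "sum", "min", "max", "avg"}}.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    }


raw = [(0, 1.0), (30, 3.0), (60, 5.0)]   # 1-second-resolution samples
per_minute = rollup(raw, 60)             # 1-minute rollup
print(per_minute)
```

Storing count and sum alongside the average lets coarser rollups (hourly, daily) be computed from the minute buckets without going back to raw data; percentiles need a different summary structure.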


🎯 Pattern Selection Guide

By Scale

Small (< 10K users):

  • Monolith
  • Single database
  • Simple cache

Medium (10K - 1M users):

  • Microservices (optional)
  • Database replication
  • CDN, cache layer

Large (> 1M users):

  • Microservices
  • Database sharding
  • Distributed cache
  • Message queues
  • Multi-region

By Consistency Requirements

Strong consistency:

  • SQL database
  • Synchronous replication
  • Distributed transactions

Eventual consistency:

  • NoSQL database
  • Async replication
  • Event sourcing

By Latency Requirements

< 100ms (real-time):

  • In-memory cache
  • WebSockets
  • Edge computing

< 1s (interactive):

  • Database with caching
  • CDN

> 1s (batch OK):

  • Message queues
  • Background jobs

📚 Pattern Combinations

Common combos in interviews:

  1. Social Network (Twitter, Instagram):

    • Microservices + Sharding + Cache + CDN + Message Queue
  2. E-commerce (Amazon):

    • Monolith/Microservices + SQL + Cache + Payment idempotency
  3. Ride-sharing (Uber):

    • Geohash + WebSockets + Sharding + Message Queue
  4. Video Streaming (YouTube):

    • CDN + Object Storage + Adaptive bitrate + Analytics
  5. Search (Google):

    • Inverted index + Distributed crawling + Sharding + Caching

Remember: No pattern is perfect. Always discuss trade-offs!


Last Updated: 2026-04-08