Chapter 3: A Framework For System Design Interviews

Tags: volume1, framework, interview-prep

Status: 🟩 Interview ready (This is THE most important chapter!)
Difficulty: Essential - Must master
Time to complete: 30 min read, practice daily


Overview

This chapter provides the 4-step framework for tackling any system design interview. Master this framework and apply it to every design problem.

Why this matters: Interviewers want to see your problem-solving process, not just the final solution. The framework shows structured thinking.

The 4-Step Framework

Step 1: Understand the Problem and Establish Design Scope (3-10 min)

Goal: Clarify requirements and agree on scope with interviewer.

Why it matters:

  • Shows you don’t make assumptions
  • Demonstrates communication skills
  • Aligns expectations with interviewer

What to do:

  1. Ask clarifying questions

    • Don’t jump straight to solutions
    • Think out loud
    • Engage with interviewer
  2. Define functional requirements

    • What features are needed?
    • What is the core functionality?
    • What can be omitted for MVP?
  3. Define non-functional requirements

    • Scale (users, data volume)
    • Performance (latency, throughput)
    • Availability (uptime SLA)
    • Consistency (strong vs eventual)
  4. Write down assumptions

    • “Let’s assume 100M DAU”
    • “I’ll design for 99.9% availability”
    • Get interviewer buy-in

Example Questions (URL Shortener):

  • “What’s the traffic volume? Millions of URLs per day?”
  • “How long should shortened URLs last? Forever or expire?”
  • “Can users customize short URLs?”
  • “Should we track analytics (click counts)?”
  • “What’s the availability requirement?”

Red flags:
❌ Jumping straight to design
❌ Staying silent
❌ Assuming too much without asking

Good signs:
✅ Asking thoughtful questions
✅ Writing down requirements
✅ Getting interviewer agreement

Step 2: Propose High-Level Design and Get Buy-In (10-15 min)

Goal: Draw initial architecture and validate approach with interviewer.

Why it matters:

  • Shows you can think at the right abstraction level
  • Allows course correction early
  • Builds rapport with interviewer

What to do:

  1. Start simple, then iterate

    • Don’t start with complex architecture
    • Begin with basic components
    • Add complexity based on requirements
  2. Draw diagrams

    • Use boxes and arrows
    • Label components clearly
    • Show data flow
  3. Do back-of-envelope calculations

    • Estimate QPS, storage, bandwidth
    • Shows quantitative thinking
    • Validates design decisions
  4. Design APIs

    • 2-3 core endpoints
    • REST or RPC style
    • Request/response formats
  5. Define data model

    • Key entities
    • Relationships
    • Storage choice (SQL vs NoSQL)
  6. Get agreement before proceeding

    • “Does this high-level design make sense?”
    • “Should I dive deeper into any component?”

Example (URL Shortener):

Architecture:
[Client] → [Load Balancer] → [Web Servers] → [Cache] → [Database]
                                                   ↓
                                            [URL Generation Service]
API:
POST /api/v1/shorten
  Request: { "longUrl": "https://example.com/very/long/url" }
  Response: { "shortUrl": "https://short.url/abc123" }

GET /api/v1/{shortCode}
  Response: 302 redirect to longUrl
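
A minimal in-memory sketch of these two endpoints, assuming plain Python functions in place of a web framework. The names `shorten`, `resolve`, `_store`, and the counter-based code are hypothetical and only illustrate the request/response shapes:

```python
import itertools

BASE_URL = "https://short.url/"   # host taken from the example response
_store = {}                       # short code -> long URL (stands in for the database)
_counter = itertools.count(1)     # hypothetical ID source, not a real generator


def shorten(request: dict) -> dict:
    """Handle POST /api/v1/shorten and return the response body."""
    code = f"c{next(_counter)}"
    _store[code] = request["longUrl"]
    return {"shortUrl": BASE_URL + code}


def resolve(short_code: str) -> tuple:
    """Handle the GET redirect endpoint: (status, headers)."""
    long_url = _store.get(short_code)
    if long_url is None:
        return 404, {}
    return 302, {"Location": long_url}
```

In an interview, naming the status codes and the `Location` header is usually enough; the storage and code generation are deep-dive topics.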
Data Model:
URL (table)
- id (PK)
- shortCode (indexed)
- longUrl
- createdAt
- expiresAt
- userId (FK, optional)
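
One way to make this model concrete is a small SQLite schema. This is a sketch under assumptions: the column types, the `create_db` and `lookup` helpers, and the use of SQLite are illustrative choices, and the foreign key on `userId` is left out because no user table is defined here:

```python
import sqlite3

SCHEMA = """
CREATE TABLE url (
    id         INTEGER PRIMARY KEY,
    shortCode  TEXT NOT NULL UNIQUE,   -- UNIQUE also creates the lookup index
    longUrl    TEXT NOT NULL,
    createdAt  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expiresAt  TEXT,                   -- NULL means the URL never expires
    userId     INTEGER                 -- optional owner; FK omitted in this sketch
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the URL table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def lookup(conn: sqlite3.Connection, short_code: str):
    """Return the long URL for a short code, or None if it is unknown."""
    row = conn.execute(
        "SELECT longUrl FROM url WHERE shortCode = ?", (short_code,)
    ).fetchone()
    return row[0] if row else None
```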
Estimation:
- Write: 100M URLs/day ≈ 1,200 QPS
- Read: 10x ratio ≈ 12,000 QPS
- Storage: 100M × 500 bytes = 50 GB/day
- 5 years: 50 GB × 365 × 5 ≈ 91 TB
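
The same back-of-envelope arithmetic can be written as a short function, which makes it easy to re-run with different assumptions. The function name `estimate` and its parameters are hypothetical; the inputs are the figures from the estimation above:

```python
SECONDS_PER_DAY = 86_400


def estimate(urls_per_day: int, read_write_ratio: int,
             bytes_per_record: int, years: int) -> dict:
    """Rough QPS and storage figures for a write-once, read-many service."""
    write_qps = urls_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_gb_per_day = urls_per_day * bytes_per_record / 1e9
    total_tb = storage_gb_per_day * 365 * years / 1e3
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_gb_per_day": storage_gb_per_day,
        "total_tb": total_tb,
    }


result = estimate(100_000_000, 10, 500, 5)
```

The exact write rate is about 1,157 QPS; rounding up to 1,200 is fine on a whiteboard because the goal is the order of magnitude.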

Red flags:
❌ Too much detail too early
❌ No diagram
❌ Skipping calculations
❌ Not checking in with interviewer

Good signs:
✅ Clear diagram
✅ Quantitative reasoning
✅ Iterative approach
✅ Seeking feedback

Step 3: Design Deep Dive (10-25 min)

Goal: Drill into specific components based on interviewer’s interest.

Why it matters:

  • Demonstrates technical depth
  • Shows understanding of trade-offs
  • Reveals experience with real systems

What to do:

  1. Ask what to focus on

    • “Which component should we discuss in detail?”
    • Interviewer will guide the direction
  2. Common deep dive topics:

    • Scalability: How to handle 10x, 100x growth?
    • Performance: How to optimize latency?
    • Reliability: How to handle failures?
    • Consistency: Strong vs eventual?
    • Security: Authentication, authorization, rate limiting
    • Data storage: SQL vs NoSQL, sharding, replication
  3. Discuss trade-offs

    • Every decision has pros and cons
    • Explain why you chose one approach
    • Mention alternatives and when they’d be better
  4. Use specific technologies

    • Redis for caching
    • Kafka for messaging
    • Cassandra for wide-column store
    • But explain WHY, not just name-dropping

Example Topics (URL Shortener):

Topic 1: URL Shortening Algorithm

Approach 1: Hash function (MD5, SHA-256)
Pros:
- Fast computation
- Distributed (no coordination)
Cons:
- Hash collision possible
- Fixed length (longer than needed)
- Need collision resolution
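
A minimal sketch of Approach 1 with collision resolution, assuming a truncated SHA-256 hex digest and a re-hash with an incrementing salt. The function `hash_code` and the `taken` dict (code to long URL, standing in for a database lookup) are hypothetical. Note that hex output uses only 16 symbols, so it packs fewer URLs per character than Base62:

```python
import hashlib


def hash_code(long_url: str, taken: dict, length: int = 7) -> str:
    """Return a short code for long_url, re-hashing with a salt on collision."""
    salt = 0
    while True:
        digest = hashlib.sha256(f"{long_url}#{salt}".encode()).hexdigest()
        code = digest[:length]
        existing = taken.get(code)
        # Free slot, or the same URL was shortened before: reuse the code.
        if existing is None or existing == long_url:
            taken[code] = long_url
            return code
        salt += 1  # collision with a different URL: try the next salt
```

Every write needs a lookup to detect collisions, which is the main cost of this approach compared with a counter.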

Approach 2: Base62 encoding with counter
Pros:
- Short codes (7 chars cover 62^7 ≈ 3.5 trillion URLs)
- No collisions
- Predictable length
Cons:
- Need distributed ID generation
- Possible guessing of sequential URLs

Approach 3: Pre-generate codes
Pros:
- Very fast (just read from pool)
- Short codes
Cons:
- Requires separate service
- Memory overhead

Choice: Approach 2 with distributed ID generator (Snowflake-style)
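Reason: Balances performance, collision-free codes, and reasonable length (a full 64-bit Snowflake ID encodes to as many as 11 Base62 characters, so 7-char codes need a smaller ID space)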
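
Base62 encoding of a numeric ID can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) and the function names `encode` and `decode` are assumptions; the ID source itself is out of scope here:

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the order is a convention chosen for this sketch)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int) -> str:
    """Convert a non-negative integer ID to a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Convert a Base62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

IDs below 62^7 fit in 7 characters. Because sequential IDs give guessable codes, a real design might shuffle or offset the ID before encoding.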

Topic 2: Handling High Read Traffic

Problem: 12K read QPS can overwhelm database

Solution 1: Cache
- Cache hot URLs in Redis/Memcached
- 80-90% hit rate reduces DB load from 12K to roughly 1.2K-2.4K QPS
- TTL for expired URLs
- Cache-aside pattern
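
The cache-aside pattern with a TTL can be sketched like this. A dict stands in for Redis or Memcached, and the class name `CacheAside`, the `load_from_db` callback, and the injectable `clock` are hypothetical names for illustration:

```python
import time


class CacheAside:
    """Read-through-on-miss cache: check the cache, fall back to the DB, then fill."""

    def __init__(self, load_from_db, ttl_seconds=3600, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # short code -> (long URL, expiry timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, short_code):
        now = self._clock()
        entry = self._cache.get(short_code)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load(short_code)
        if value is not None:
            self._cache[short_code] = (value, now + self._ttl)
        return value
```

The application owns the cache fill, so a cache outage degrades to direct database reads instead of failing requests.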

Solution 2: CDN
- Cache at edge locations
- Lowest latency for users
- Redirect 301 for permanent URLs

Solution 3: Read replicas
- Separate read and write traffic
- 3 replicas handle 4K QPS each

Combined approach: Cache + CDN + Replicas
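
To see how the layers combine, the residual database load can be estimated from the cache hit rate and replica count. This is plain arithmetic; the function name `db_qps_per_replica` is hypothetical and it assumes reads are spread evenly across replicas:

```python
def db_qps_per_replica(read_qps: float, cache_hit_rate: float, replicas: int) -> float:
    """Reads per second reaching each replica after the cache absorbs its share."""
    misses = read_qps * (1 - cache_hit_rate)
    return misses / replicas


# 12K read QPS, 3 replicas, at the two ends of the 80-90% hit-rate range
at_90 = db_qps_per_replica(12_000, 0.90, 3)
at_80 = db_qps_per_replica(12_000, 0.80, 3)
no_cache = db_qps_per_replica(12_000, 0.0, 3)
```

With a working cache each replica sees only a few hundred QPS, but the replicas should still be sized for the no-cache case so a cache outage does not take the database down.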

Topic 3: Database Choice

SQL (PostgreSQL):
Pros: ACID, structured data, easy queries
Cons: Harder to scale horizontally

NoSQL (Cassandra):
Pros: High write throughput, easy sharding
Cons: Eventual consistency, limited queries

Choice: SQL for MVP, NoSQL for massive scale
Reason: SQL simpler initially, NoSQL when sharding needed
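
When sharding does become necessary, a simple routing scheme hashes the lookup key to pick a shard. This is a sketch assuming the short code is the shard key and the shard count is fixed; `shard_for` is a hypothetical helper:

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(short_code.encode()).digest()
    # Use the first 8 bytes as an integer so the result is stable across processes.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys when the shard count changes, which is why consistent hashing is the usual follow-up topic in a deep dive.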

Red flags:
❌ Not discussing trade-offs
❌ Only one solution, no alternatives
❌ Name-dropping without explanation
❌ Going too broad instead of deep

Good signs:
✅ Multiple solutions considered
✅ Clear reasoning for choices
✅ Acknowledges limitations
✅ Uses concrete numbers

Step 4: Wrap Up (3-5 min)

Goal: Show you think beyond the immediate design.

Why it matters:

  • Demonstrates you understand production systems
  • Shows forward thinking
  • Leaves strong final impression

What to discuss:

1. Identify bottlenecks

  • “At 100K QPS, the database becomes a bottleneck”
  • “We’d need to shard by userId”

2. System failures

  • “If the cache goes down, fall back to the database behind a circuit breaker so it is not overwhelmed”
  • “Multi-region setup for disaster recovery”

3. Monitoring and metrics

  • Key metrics: QPS, latency (p50, p99), error rate, cache hit rate
  • Alerting: When QPS > 80% capacity, alert on-call
  • Dashboards: Real-time metrics for operations

4. Next steps and enhancements

  • “Add analytics for click tracking”
  • “Implement custom URLs feature”
  • “A/B testing framework”
  • “Machine learning for fraud detection”

5. Scale refinement and cost estimation

  • “The current design supports 50M URLs/day”
  • “To reach 1B URLs/day, we’d need database sharding”
  • “Let me estimate monthly costs…” (see example below)
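
For the system-failure point above, a circuit breaker can be sketched in a few lines. This is a simplified illustration, not a production implementation: the class name, the thresholds, and the injectable `clock` are assumptions, and the half-open state allows a single trial call after the timeout:

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a while and use a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()  # open: skip the dependency entirely
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            return fallback()
        # Success: close the circuit and reset the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Here `func` might be a cache read and `fallback` a rate-limited database read, so a cache outage does not turn into a flood of slow, failing calls.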

Cost Estimation Example (URL Shortener):

Assuming a smaller, launch-scale workload than the 100M URLs/day used in the Step 2 estimate:

  • 1M URLs shortened/day = 30M URLs/month
  • 10M redirects/day = 300M redirects/month
  • Read:Write ratio = 10:1
Monthly Infrastructure Costs:

Compute:
- 2 web servers (t3.medium): 2 × $29 = $58
- 1 database server (db.t3.small): $26
- Total compute: $84/month

Storage:
- 30M URLs/month × 500 bytes = 15 GB
- Database storage: 15 GB × $0.115 = $1.73/month
- (Retention: 900M URLs × 500 bytes = 450 GB = $52/month, which is about 2.5 years of data at this rate; a full 5 years would be 1.8B URLs = 900 GB ≈ $104/month)

Bandwidth:
- 300M redirects × 1 KB = 300 GB/month
- Cost: 300 GB × $0.09 = $27/month
- With CDN: 300 GB × $0.04 = $12/month (save $15!)

Cache:
- Redis (cache.t3.small): $25/month
- Cache 10% of hot URLs = 3M URLs = 1.5 GB

Other:
- Load balancer: $16/month
- Total: $84 + $52 + $12 + $25 + $16 = $189/month

Current scale: ~$190/month or $2,280/year
At 10x scale (10M URLs/day): ~$1,500/month
At 100x scale (100M URLs/day): ~$12,000/month + sharding costs

Cost per URL: $190 / 30M = $0.0000063 (very cheap!)
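
The cost breakdown above can be reproduced with a small model, which helps when the interviewer changes an assumption. All prices and sizes are the illustrative figures from this example, not current cloud pricing, and `monthly_cost` with its parameters is a hypothetical helper:

```python
def monthly_cost(web_servers=2, web_server_usd=29.0, db_usd=26.0,
                 stored_gb=450.0, storage_usd_per_gb=0.115,
                 redirects_per_month=300_000_000, kb_per_redirect=1.0,
                 bandwidth_usd_per_gb=0.04, cache_usd=25.0, lb_usd=16.0) -> dict:
    """Monthly infrastructure cost in USD, broken down by category."""
    compute = web_servers * web_server_usd + db_usd
    storage = stored_gb * storage_usd_per_gb
    bandwidth_gb = redirects_per_month * kb_per_redirect / 1e6  # KB -> GB
    bandwidth = bandwidth_gb * bandwidth_usd_per_gb             # CDN rate by default
    total = compute + storage + bandwidth + cache_usd + lb_usd
    return {"compute": compute, "storage": storage, "bandwidth": bandwidth,
            "cache": cache_usd, "load_balancer": lb_usd, "total": total}


costs = monthly_cost()
cost_per_url = costs["total"] / 30_000_000  # 30M URLs shortened per month
```

The defaults give a total of about $189/month. Passing `bandwidth_usd_per_gb=0.09` shows the cost without the CDN.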

Cost optimizations to mention:

  • “We could reduce compute costs by 40% with Reserved Instances”
  • “Using CDN saves $15/month in bandwidth (pays for itself)”
  • “At 100x scale, we’d archive old URL records to S3 and use lifecycle policies to move them to Glacier (roughly 5x cheaper storage)”

Why this impresses interviewers:

  • Shows business awareness (not just technical)
  • Demonstrates you’ve built real systems (know actual costs)
  • Helps justify design decisions (“CDN adds cost but reduces server load”)
  • Shows ability to optimize (“Reserved Instances save 40%”)

See estimation-cheatsheet > 💰 Cost Estimation for detailed pricing reference

Red flags:
❌ “I think we’re done”
❌ Not addressing obvious bottlenecks
❌ Ignoring failure scenarios

Good signs:
✅ Proactive identification of issues
✅ Realistic about limitations
✅ Specific monitoring strategy
✅ Cost-aware

Communication Best Practices

Do’s ✅

  1. Think out loud

    • Share your reasoning process
    • “I’m thinking about caching here because…”
  2. Ask questions

    • Clarify ambiguities
    • Engage with interviewer
    • “What’s more important, consistency or availability?”
  3. Be open to feedback

    • Interviewer hints are valuable
    • Adapt your design based on input
  4. Manage time

    • Don’t spend 30 minutes on one component
    • Keep moving forward
  5. Draw clearly

    • Label everything
    • Use consistent notation
    • Make it easy to follow
  6. State assumptions

    • “I’m assuming X because Y”
    • Make implicit thoughts explicit

Don’ts ❌

  1. Don’t stay silent

    • Interviewer can’t help if they don’t know your thinking
  2. Don’t go straight to code

    • This is an architecture interview, not a coding interview
  3. Don’t over-engineer

    • Start simple, add complexity as needed
  4. Don’t argue with interviewer

    • They might be testing your collaboration
  5. Don’t say “I don’t know” and stop

    • Say “I don’t know, but here’s my reasoning…”
  6. Don’t fixate on one solution

    • Be flexible, consider alternatives

Common Pitfalls

  1. No requirements gathering

    • Jumping straight to design
    • Fix: Always start with Step 1
  2. Too detailed too early

    • Getting lost in implementation details
    • Fix: High-level first, then drill down
  3. No trade-off discussion

    • Presenting only one solution
    • Fix: Always discuss alternatives and trade-offs
  4. Ignoring scale

    • Designing for 100 users when asked for 100M
    • Fix: Do back-of-envelope calculations
  5. Silent designing

    • Drawing without explanation
    • Fix: Talk while you draw
  6. No failure handling

    • Assuming everything works perfectly
    • Fix: Discuss failure scenarios

Framework Summary (Quick Reference)

Step 1: Understand & Scope (3-10 min)
├─ Ask clarifying questions
├─ Functional requirements
├─ Non-functional requirements
└─ Write down assumptions

Step 2: High-Level Design (10-15 min)
├─ Draw architecture diagram
├─ Back-of-envelope estimation
├─ API design
├─ Data model
└─ Get buy-in

Step 3: Deep Dive (10-25 min)
├─ Ask what to focus on
├─ Drill into 2-3 components
├─ Discuss trade-offs
└─ Consider alternatives

Step 4: Wrap Up (3-5 min)
├─ Identify bottlenecks
├─ Failure scenarios
├─ Monitoring strategy
├─ Cost estimation (shows business awareness!)
└─ Future improvements

Practice Recommendations

Week 1: Apply framework to simple systems

  • URL shortener
  • Pastebin
  • Rate limiter

Week 2: Medium complexity

  • Twitter
  • Instagram feed
  • Chat system

Week 3: Complex systems

  • YouTube
  • Uber
  • Netflix

Daily practice:

  1. Pick a system
  2. Set timer (45 min)
  3. Go through all 4 steps
  4. Review: Did I follow the framework?

With peers:

  • Mock interview each other
  • Give feedback on process, not just solution
  • Practice explaining out loud

Key Takeaways

  1. Process matters more than solution: There’s no one “correct” answer. Interviewers assess your approach.

  2. Start simple, iterate: Don’t try to design the perfect system immediately.

  3. Communicate constantly: This is a conversation, not a lecture.

  4. Time management: Allocate time across all 4 steps.

  5. Trade-offs are key: Every design decision has pros and cons.

  6. Adapt to feedback: Be flexible based on interviewer’s direction.

  7. Think production-ready: Consider failures, monitoring, scale.


This is THE most important chapter! Master this framework before moving to specific system designs. Every other chapter applies these 4 steps.

Practice applying the framework daily until it becomes second nature.


Last Updated: 2026-04-08
Status: Essential reading - Review before every interview