Chapter 3: A Framework For System Design Interviews
volume1 framework interview-prep
Status: 🟩 Interview ready (This is THE most important chapter!)
Difficulty: Essential - Must master
Time to complete: 30 min read, practice daily
Overview
This chapter provides the 4-step framework for tackling any system design interview. Master this framework and apply it to every design problem.
Why this matters: Interviewers want to see your problem-solving process, not just the final solution. The framework shows structured thinking.
The 4-Step Framework
Step 1: Understand the Problem and Establish Design Scope (3-10 min)
Goal: Clarify requirements and agree on scope with interviewer.
Why it matters:
- Shows you don’t make assumptions
- Demonstrates communication skills
- Aligns expectations with interviewer
What to do:
1. Ask clarifying questions
- Don’t jump straight to solutions
- Think out loud
- Engage with interviewer
2. Define functional requirements
- What features are needed?
- What is the core functionality?
- What can be omitted for MVP?
3. Define non-functional requirements
- Scale (users, data volume)
- Performance (latency, throughput)
- Availability (uptime SLA)
- Consistency (strong vs eventual)
4. Write down assumptions
- “Let’s assume 100M DAU”
- “I’ll design for 99.9% availability”
- Get interviewer buy-in
Example Questions (URL Shortener):
- “What’s the traffic volume? Millions of URLs per day?”
- “How long should shortened URLs last? Forever or expire?”
- “Can users customize short URLs?”
- “Should we track analytics (click counts)?”
- “What’s the availability requirement?”
Red flags:
❌ Jumping straight to design
❌ Staying silent
❌ Assuming too much without asking
Good signs:
✅ Asking thoughtful questions
✅ Writing down requirements
✅ Getting interviewer agreement
Step 2: Propose High-Level Design and Get Buy-In (10-15 min)
Goal: Draw initial architecture and validate approach with interviewer.
Why it matters:
- Shows you can think at the right abstraction level
- Allows course correction early
- Builds rapport with interviewer
What to do:
1. Start simple, then iterate
- Don’t start with complex architecture
- Begin with basic components
- Add complexity based on requirements
2. Draw diagrams
- Use boxes and arrows
- Label components clearly
- Show data flow
3. Do back-of-envelope calculations
- Estimate QPS, storage, bandwidth
- Shows quantitative thinking
- Validates design decisions
4. Design APIs
- 2-3 core endpoints
- REST or RPC style
- Request/response formats
5. Define data model
- Key entities
- Relationships
- Storage choice (SQL vs NoSQL)
6. Get agreement before proceeding
- “Does this high-level design make sense?”
- “Should I dive deeper into any component?”
Example (URL Shortener):
Architecture:
[Client] → [Load Balancer] → [Web Servers] → [Cache] → [Database]
                                   ↓
                       [URL Generation Service]
API:
POST /api/v1/shorten
Request: { "longUrl": "https://example.com/very/long/url" }
Response: { "shortUrl": "https://short.url/abc123" }
GET /api/v1/{shortCode}
Response: 302 redirect to longUrl (via the Location header)
Data Model:
URL (table)
- id (PK)
- shortCode (indexed)
- longUrl
- createdAt
- expiresAt
- userId (FK, optional)
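To make the API and data model concrete, here is a minimal in-memory Python sketch, not a web-framework implementation. `UrlShortenerApi` is a hypothetical name, request and response bodies are plain dicts, a dict stands in for the indexed `shortCode` column, and the placeholder code format `c<id>` is an assumption; a real service would use one of the encoding approaches discussed in Step 3.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UrlRecord:
    # Mirrors the URL table: id, shortCode, longUrl, createdAt, expiresAt, userId
    id: int
    short_code: str
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None
    user_id: Optional[int] = None


class UrlShortenerApi:
    """Hypothetical in-memory handlers for the two endpoints."""

    def __init__(self, base_url: str = "https://short.url/") -> None:
        self.base_url = base_url
        self.by_code: dict[str, UrlRecord] = {}  # stands in for the shortCode index
        self.next_id = 1

    def shorten(self, body: dict) -> dict:
        """POST /api/v1/shorten: {"longUrl": ...} -> {"shortUrl": ...}"""
        record = UrlRecord(
            id=self.next_id,
            short_code=f"c{self.next_id}",  # placeholder code format (assumption)
            long_url=body["longUrl"],
            created_at=datetime.now(timezone.utc),
        )
        self.next_id += 1
        self.by_code[record.short_code] = record
        return {"shortUrl": self.base_url + record.short_code}

    def redirect(self, short_code: str) -> tuple[int, dict]:
        """GET endpoint: 302 with a Location header, or 404 if unknown or expired."""
        record = self.by_code.get(short_code)
        now = datetime.now(timezone.utc)
        if record is None or (record.expires_at is not None and record.expires_at <= now):
            return 404, {}
        return 302, {"Location": record.long_url}


api = UrlShortenerApi()
resp = api.shorten({"longUrl": "https://example.com/very/long/url"})
print(resp["shortUrl"])
```

Keeping expiry in the lookup path means an expired code behaves exactly like an unknown one from the client's point of view.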
Estimation:
- Write: 100M URLs/day ÷ 86,400 s/day ≈ 1,200 QPS
- Read: 10:1 read:write ratio ≈ 12,000 QPS
- Storage: 100M × 500 bytes = 50 GB/day
- 5 years: 50 GB × 365 × 5 ≈ 91 TB
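The same arithmetic can be scripted so the rounding is visible. This Python sketch uses the inputs above (100M writes/day, 10:1 read:write, 500 bytes per URL, 5 years) and decimal units (1 GB = 10^9 bytes); the function name `estimate` is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(writes_per_day: int, read_write_ratio: int, bytes_per_url: int, years: int) -> dict:
    """Back-of-envelope QPS and storage figures (decimal units: 1 GB = 10**9 bytes)."""
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_per_day_gb = writes_per_day * bytes_per_url / 1e9
    total_storage_tb = storage_per_day_gb * 365 * years / 1e3
    return {
        "write_qps": round(write_qps),
        "read_qps": round(read_qps),
        "storage_per_day_gb": storage_per_day_gb,
        "total_storage_tb": total_storage_tb,
    }


print(estimate(100_000_000, 10, 500, 5))
```

It yields about 1,157 write QPS and 11,574 read QPS, which the bullets round to ~1,200 and ~12,000, plus 50 GB/day and 91.25 TB over five years.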
Red flags:
❌ Too much detail too early
❌ No diagram
❌ Skipping calculations
❌ Not checking in with interviewer
Good signs:
✅ Clear diagram
✅ Quantitative reasoning
✅ Iterative approach
✅ Seeking feedback
Step 3: Design Deep Dive (10-25 min)
Goal: Drill into specific components based on interviewer’s interest.
Why it matters:
- Demonstrates technical depth
- Shows understanding of trade-offs
- Reveals experience with real systems
What to do:
1. Ask what to focus on
- “Which component should we discuss in detail?”
- Interviewer will guide the direction
2. Common deep dive topics:
- Scalability: How to handle 10x, 100x growth?
- Performance: How to optimize latency?
- Reliability: How to handle failures?
- Consistency: Strong vs eventual?
- Security: Authentication, authorization, rate limiting
- Data storage: SQL vs NoSQL, sharding, replication
3. Discuss trade-offs
- Every decision has pros and cons
- Explain why you chose one approach
- Mention alternatives and when they’d be better
4. Use specific technologies
- Redis for caching
- Kafka for messaging
- Cassandra for wide-column store
- But explain WHY, not just name-dropping
Example Topics (URL Shortener):
Topic 1: URL Shortening Algorithm
Approach 1: Hash function (MD5, SHA-256)
Pros:
- Fast computation
- Distributed (no coordination)
Cons:
- Hash collision possible
- Output is long (MD5 alone is 32 hex chars), so codes must be truncated
- Need collision resolution
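A minimal sketch of Approach 1 in Python, assuming SHA-256 (MD5 would work the same way), truncation of the digest to a 7-character Base62 code, and a simple collision-resolution scheme that appends a counter suffix and re-hashes. The suffix scheme and function names are hypothetical. The dict `store` stands in for the URL table; in a real database the uniqueness check would be a unique index on `shortCode`, since a read-then-write check can race.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE_LEN = 7


def hash_code(text: str) -> str:
    """Truncate a SHA-256 digest to a fixed-width 7-character Base62 code."""
    n = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big") % (62 ** CODE_LEN)
    chars = []
    for _ in range(CODE_LEN):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def shorten_with_hash(long_url: str, store: dict[str, str]) -> str:
    """Store long_url under a hashed code, re-hashing with a suffix on collision."""
    attempt = 0
    candidate = long_url
    while True:
        code = hash_code(candidate)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:
            return code  # same URL shortened before: reuse its code
        attempt += 1
        candidate = f"{long_url}#{attempt}"  # hypothetical suffix scheme
```

Each collision costs an extra lookup, which is the "need collision resolution" cost listed above.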
Approach 2: Base62 encoding with counter
Pros:
- Short codes (7 chars cover 62^7 ≈ 3.5 trillion IDs)
- No collisions
- Predictable length
Cons:
- Need distributed ID generation
- Possible guessing of sequential URLs
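Base62 encoding is ordinary positional notation with 62 digits. A minimal Python sketch, assuming the alphabet order 0-9, a-z, A-Z (any fixed order works); the in-process `itertools.count` counter is only a stand-in for a distributed ID generator, and `next_short_code` is an illustrative name.

```python
import itertools

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID to its Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


_ids = itertools.count(1)  # single-process stand-in for a distributed ID generator


def next_short_code() -> str:
    return encode_base62(next(_ids))
```

For scale: `encode_base62(62**7 - 1)` is `"ZZZZZZZ"`, so 7 characters cover about 3.5 trillion IDs, while the largest signed 64-bit ID (`2**63 - 1`) needs 11 characters.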
Approach 3: Pre-generate codes
Pros:
- Very fast (just read from pool)
- Short codes
Cons:
- Requires separate service
- Memory overhead
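A minimal Python sketch of Approach 3, assuming codes are random 7-character Base62 strings minted in batches. `KeyPool` is a hypothetical name; it uses `random` for brevity, whereas `secrets.choice` would be preferable if codes must be hard to guess. In production the pool would live in a separate key-generation service backed by a database, not inside one process.

```python
import random
import string
from collections import deque
from typing import Optional

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters


class KeyPool:
    """Hypothetical pre-generated code pool: codes are minted in batches and handed out once."""

    def __init__(self, code_len: int = 7, batch_size: int = 1000, seed: Optional[int] = None) -> None:
        self.code_len = code_len
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.minted: set[str] = set()         # every code ever generated (the memory overhead)
        self.available: deque[str] = deque()  # unused codes ready to hand out

    def _refill(self) -> None:
        while len(self.available) < self.batch_size:
            code = "".join(self.rng.choice(ALPHABET) for _ in range(self.code_len))
            if code not in self.minted:  # skip duplicates so every code is unique
                self.minted.add(code)
                self.available.append(code)

    def take(self) -> str:
        """Return an unused code; the fast path is a single pop."""
        if not self.available:
            self._refill()
        return self.available.popleft()
```

The `minted` set grows with every code ever issued, which is the memory overhead noted in the cons.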
Choice: Approach 2 with distributed ID generator (Snowflake-style)
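Reason: Balances speed, freedom from collisions, and short codes. Caveat: a full 64-bit Snowflake ID encodes to up to 11 Base62 characters, so 7-character codes require a smaller ID space (62^7 ≈ 3.5 trillion IDs).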
Topic 2: Handling High Read Traffic
Problem: 12K read QPS can overwhelm database
Solution 1: Cache
- Cache hot URLs in Redis/Memcached
- An 80-90% hit rate cuts DB load to 1.2K-2.4K QPS
- TTL for expired URLs
- Cache-aside pattern
Solution 2: CDN
- Cache at edge locations
- Lowest latency for users
- Redirect 301 for permanent URLs (browsers cache 301s, so repeat clicks bypass our servers and click analytics)
Solution 3: Read replicas
- Separate read and write traffic
- 3 replicas handle 4K QPS each
Combined approach: Cache + CDN + Replicas
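The cache-aside read path fits in a few lines of Python. This is a minimal sketch, assuming a plain dict in place of Redis/Memcached and another dict in place of the database; `CacheAsideLookup` and `db_qps` are illustrative names, and the injectable `clock` exists only to make TTL expiry testable.

```python
import time
from typing import Callable, Optional


class CacheAsideLookup:
    """Read path: try the cache, fall back to the database on a miss, then populate the cache."""

    def __init__(self, db: dict[str, str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache: dict[str, tuple[str, float]] = {}  # code -> (long_url, expiry time)
        self.db_reads = 0

    def get(self, code: str) -> Optional[str]:
        entry = self.cache.get(code)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit
        self.db_reads += 1  # miss or expired entry: go to the database
        long_url = self.db.get(code)
        if long_url is not None:
            self.cache[code] = (long_url, self.clock() + self.ttl)
        return long_url


def db_qps(read_qps: float, hit_rate: float) -> int:
    """Only cache misses reach the database."""
    return round(read_qps * (1 - hit_rate))
```

`db_qps(12_000, 0.9)` gives 1,200 and `db_qps(12_000, 0.8)` gives 2,400, the database load left after the cache absorbs hits.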
Topic 3: Database Choice
SQL (PostgreSQL):
Pros: ACID, structured data, easy queries
Cons: Harder to scale horizontally
NoSQL (Cassandra):
Pros: High write throughput, easy sharding
Cons: Eventual consistency, limited queries
Choice: SQL for MVP, NoSQL for massive scale
Reason: SQL simpler initially, NoSQL when sharding needed
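When sharding becomes necessary, each row needs a deterministic home. A minimal Python sketch of hash-based routing, assuming the short code is the shard key (it is what every redirect looks up) and using MD5 only as a stable, well-mixed hash; Python's built-in `hash()` on strings is randomized per process, so it is unsuitable. `shard_for` is an illustrative name.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index in [0, num_shards)."""
    digest = hashlib.md5(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo placement moves most keys when `num_shards` changes, which is why consistent hashing is a common alternative once the shard count needs to grow.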
Red flags:
❌ Not discussing trade-offs
❌ Only one solution, no alternatives
❌ Name-dropping without explanation
❌ Going too broad instead of deep
Good signs:
✅ Multiple solutions considered
✅ Clear reasoning for choices
✅ Acknowledges limitations
✅ Uses concrete numbers
Step 4: Wrap Up (3-5 min)
Goal: Show you think beyond the immediate design.
Why it matters:
- Demonstrates you understand production systems
- Shows forward thinking
- Leaves strong final impression
What to discuss:
1. Identify bottlenecks
- “At 100K QPS, the database becomes a bottleneck”
- “We’d need to shard by shortCode, the key every redirect looks up”
2. System failures
- “If the cache goes down, put a circuit breaker in front of the database so it fails fast instead of collapsing under the extra load”
- “Multi-region setup for disaster recovery”
3. Monitoring and metrics
- Key metrics: QPS, latency (p50, p99), error rate, cache hit rate
- Alerting: When QPS > 80% capacity, alert on-call
- Dashboards: Real-time metrics for operations
4. Next steps and enhancements
- “Add analytics for click tracking”
- “Implement custom URLs feature”
- “A/B testing framework”
- “Machine learning for fraud detection”
5. Scale refinement and cost estimation
- “The current design supports ~50M URLs/day”
- “To reach 1B URLs/day, we’d need database sharding”
- “Let me estimate monthly costs…” (see example below)
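The circuit-breaker idea in item 2 can be sketched as follows. This is a minimal, single-threaded Python sketch with assumed parameters (`failure_threshold`, `reset_timeout`) and hypothetical class names: after enough consecutive database failures the breaker opens and rejects calls immediately; once the timeout passes it lets calls through again (half-open) and closes on the first success. Unlike production libraries, it does not limit half-open traffic to a single trial request.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open: the call is rejected without touching the database."""


class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker around database calls."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0  # consecutive failures
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: let calls probe the database again
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("database circuit is open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed probe
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result
```

Failing fast while open gives a struggling database time to recover instead of queueing more requests against it.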
Cost Estimation Example (URL Shortener):
Assumed launch-scale traffic (far below the 100M URLs/day used in Step 2):
- 1M URLs shortened/day = 30M URLs/month
- 10M redirects/day = 300M redirects/month
- Read:Write ratio = 10:1
Monthly Infrastructure Costs:
Compute:
- 2 web servers (t3.medium): 2 × $29 = $58
- 1 database server (db.t3.small): $26
- Total compute: $84/month
Storage:
- 30M URLs/month × 500 bytes = 15 GB
- Database storage: 15 GB × $0.115 = $1.73/month
- (Kept for 5 years: 30M URLs/month × 60 months = 1.8B URLs × 500 bytes = 900 GB ≈ $104/month)
Bandwidth:
- 300M redirects × 1 KB = 300 GB/month
- Cost: 300 GB × $0.09 = $27/month
- With CDN: 300 GB × $0.04 = $12/month (save $15!)
Cache:
- Redis (cache.t3.small): $25/month
- Cache 10% of hot URLs = 3M URLs = 1.5 GB
Other:
- Load balancer: $16/month
- Total (storage sized for 5-year retention, CDN bandwidth): $84 + $104 + $12 + $25 + $16 = $241/month
Current scale: ~$240/month or ~$2,900/year
At 10x scale (10M URLs/day): ~$1,500/month
At 100x scale (100M URLs/day): ~$12,000/month + sharding costs
Cost per URL: $241 / 30M ≈ $0.000008 (very cheap!)
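The cost arithmetic can live in a small script so it is easy to re-run at 10x or 100x. A minimal Python sketch, reusing the unit prices quoted in the example (treat them as illustrative, not current list prices), 30-day months, CDN bandwidth, and storage sized for everything kept over the retention period; `monthly_cost` and `MonthlyCost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    compute: float
    storage: float
    bandwidth: float
    cache: float
    load_balancer: float

    @property
    def total(self) -> float:
        return self.compute + self.storage + self.bandwidth + self.cache + self.load_balancer


def monthly_cost(urls_per_day: float, redirects_per_day: float,
                 bytes_per_url: int = 500, retention_years: int = 5,
                 response_kb: float = 1.0) -> MonthlyCost:
    # Unit prices taken from the example above (illustrative only).
    compute = 2 * 29 + 26  # 2 x t3.medium web servers + 1 x db.t3.small
    stored_urls = urls_per_day * 30 * 12 * retention_years  # 30-day months
    storage_gb = stored_urls * bytes_per_url / 1e9
    egress_gb = redirects_per_day * 30 * response_kb / 1e6  # 1 GB = 10**6 KB
    return MonthlyCost(
        compute=compute,
        storage=storage_gb * 0.115,  # database storage, $/GB-month
        bandwidth=egress_gb * 0.04,  # CDN egress, $/GB
        cache=25,                    # Redis cache.t3.small
        load_balancer=16,
    )


cost = monthly_cost(urls_per_day=1_000_000, redirects_per_day=10_000_000)
print(f"${cost.total:,.2f}/month")
```

With the example's inputs this comes to about $240.50/month, with storage covering roughly 1.8B URLs (900 GB) retained for five years. Compute, cache, and load balancer are held fixed here, so scaling the inputs only scales storage and bandwidth; real 10x or 100x estimates also need more servers.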
Cost optimizations to mention:
- “We could reduce compute costs by 40% with Reserved Instances”
- “Using CDN saves $15/month in bandwidth (pays for itself)”
- “At 100x scale, we’d archive expired URLs to S3 and use lifecycle policies to move them to Glacier (~5x cheaper than S3 Standard)”
Why this impresses interviewers:
- Shows business awareness (not just technical)
- Demonstrates you’ve built real systems (know actual costs)
- Helps justify design decisions (“CDN adds cost but reduces server load”)
- Shows ability to optimize (“Reserved Instances save 40%”)
See estimation-cheatsheet > 💰 Cost Estimation for detailed pricing reference
Red flags:
❌ “I think we’re done”
❌ Not addressing obvious bottlenecks
❌ Ignoring failure scenarios
Good signs:
✅ Proactive identification of issues
✅ Realistic about limitations
✅ Specific monitoring strategy
✅ Cost-aware
Communication Best Practices
Do’s ✅
1. Think out loud
- Share your reasoning process
- “I’m thinking about caching here because…”
2. Ask questions
- Clarify ambiguities
- Engage with interviewer
- “What’s more important, consistency or availability?”
3. Be open to feedback
- Interviewer hints are valuable
- Adapt your design based on input
4. Manage time
- Don’t spend 30 minutes on one component
- Keep moving forward
5. Draw clearly
- Label everything
- Use consistent notation
- Make it easy to follow
6. State assumptions
- “I’m assuming X because Y”
- Make implicit thoughts explicit
Don’ts ❌
1. Don’t stay silent
- Interviewer can’t help if they don’t know your thinking
2. Don’t go straight to code
- This is an architecture interview, not a coding interview
3. Don’t over-engineer
- Start simple, add complexity as needed
4. Don’t argue with interviewer
- They might be testing your collaboration
5. Don’t say “I don’t know” and stop
- Say “I don’t know, but here’s my reasoning…”
6. Don’t fixate on one solution
- Be flexible, consider alternatives
Common Pitfalls
1. No requirements gathering
- Jumping straight to design
- Fix: Always start with Step 1
2. Too detailed too early
- Getting lost in implementation details
- Fix: High-level first, then drill down
3. No trade-off discussion
- Presenting only one solution
- Fix: Always discuss alternatives and trade-offs
4. Ignoring scale
- Designing for 100 users when asked for 100M
- Fix: Do back-of-envelope calculations
5. Silent designing
- Drawing without explanation
- Fix: Talk while you draw
6. No failure handling
- Assuming everything works perfectly
- Fix: Discuss failure scenarios
Framework Summary (Quick Reference)
Step 1: Understand & Scope (3-10 min)
├─ Ask clarifying questions
├─ Functional requirements
├─ Non-functional requirements
└─ Write down assumptions
Step 2: High-Level Design (10-15 min)
├─ Draw architecture diagram
├─ Back-of-envelope estimation
├─ API design
├─ Data model
└─ Get buy-in
Step 3: Deep Dive (10-25 min)
├─ Ask what to focus on
├─ Drill into 2-3 components
├─ Discuss trade-offs
└─ Consider alternatives
Step 4: Wrap Up (3-5 min)
├─ Identify bottlenecks
├─ Failure scenarios
├─ Monitoring strategy
├─ Cost estimation (shows business awareness!)
└─ Future improvements
Practice Recommendations
Week 1: Apply framework to simple systems
- URL shortener
- Pastebin
- Rate limiter
Week 2: Medium complexity
- Instagram feed
- Chat system
Week 3: Complex systems
- YouTube
- Uber
- Netflix
Daily practice:
- Pick a system
- Set timer (45 min)
- Go through all 4 steps
- Review: Did I follow the framework?
With peers:
- Mock interview each other
- Give feedback on process, not just solution
- Practice explaining out loud
Key Takeaways
1. Process matters more than solution: There’s no one “correct” answer. Interviewers assess your approach.
2. Start simple, iterate: Don’t try to design the perfect system immediately.
3. Communicate constantly: This is a conversation, not a lecture.
4. Time management: Allocate time across all 4 steps.
5. Trade-offs are key: Every design decision has pros and cons.
6. Adapt to feedback: Be flexible based on interviewer’s direction.
7. Think production-ready: Consider failures, monitoring, scale.
Related Chapters
- Chapter 1 ch01-scale-zero-to-millions: Foundational scaling concepts
- Chapter 2 ch02-back-of-envelope-estimation: Estimation techniques
- All other chapters: Apply this framework to every design
External Resources
- Interview tips: interview-framework (detailed guide)
- Estimation help: estimation-cheatsheet (quick reference)
- Common patterns: key-patterns (reusable components)
This is THE most important chapter! Master this framework before moving to specific system designs. Every other chapter applies these 4 steps.
Practice applying the framework daily until it becomes second nature.
Last Updated: 2026-04-08
Status: Essential reading - Review before every interview