Chapter 3: A Framework For System Design Interviews

volume1 framework interview-prep

Status: 🟩 Interview ready (This is THE most important chapter!)
Difficulty: Essential - Must master
Time to complete: 30 min read, practice daily

Overview

This chapter provides the 4-step framework for tackling any system design interview. Master this framework and apply it to every design problem.

Why this matters: Interviewers want to see your problem-solving process, not just the final solution. The framework shows structured thinking.

The 4-Step Framework

Step 1: Understand the Problem and Establish Design Scope (3-10 min)

Goal: Clarify requirements and agree on scope with interviewer.

Why it matters:

Shows you don’t make unverified assumptions
Demonstrates communication skills
Aligns expectations with interviewer

What to do:

Ask clarifying questions
- Don’t jump straight to solutions
- Think out loud
- Engage with interviewer
Define functional requirements
- What features are needed?
- What is the core functionality?
- What can be omitted for MVP?
Define non-functional requirements
- Scale (users, data volume)
- Performance (latency, throughput)
- Availability (uptime SLA)
- Consistency (strong vs eventual)
Write down assumptions
- “Let’s assume 100M DAU”
- “I’ll design for 99.9% availability”
- Get interviewer buy-in
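
An availability target is easier to discuss once it is converted into allowed downtime. The sketch below is illustrative only: the function name and the 365-day year are assumptions made for this example, not part of the framework.

```python
# Convert an availability target (e.g. "three nines") into allowed downtime.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year


def allowed_downtime_seconds(availability_pct: float,
                             period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Seconds of downtime permitted in the period at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_seconds * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_seconds(target) / 3600
    print(f"{target}% availability -> {hours:.2f} hours of downtime per year")
```

At 99.9% this works out to a little under 9 hours per year, which is a useful number to state when proposing the assumption to the interviewer.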

Example Questions (URL Shortener):

“What’s the traffic volume? Millions of URLs per day?”
“How long should shortened URLs last? Forever or expire?”
“Can users customize short URLs?”
“Should we track analytics (click counts)?”
“What’s the availability requirement?”

Red flags:
❌ Jumping straight to design
❌ Staying silent
❌ Assuming too much without asking

Good signs:
✅ Asking thoughtful questions
✅ Writing down requirements
✅ Getting interviewer agreement

Step 2: Propose High-Level Design and Get Buy-In (10-15 min)

Goal: Draw initial architecture and validate approach with interviewer.

Why it matters:

Shows you can think at the right abstraction level
Allows course correction early
Builds rapport with interviewer

What to do:

Start simple, then iterate
- Don’t start with complex architecture
- Begin with basic components
- Add complexity based on requirements
Draw diagrams
- Use boxes and arrows
- Label components clearly
- Show data flow
Do back-of-envelope calculations
- Estimate QPS, storage, bandwidth
- Shows quantitative thinking
- Validates design decisions
Design APIs
- 2-3 core endpoints
- REST or RPC style
- Request/response formats
Define data model
- Key entities
- Relationships
- Storage choice (SQL vs NoSQL)
Get agreement before proceeding
- “Does this high-level design make sense?”
- “Should I dive deeper into any component?”

Example (URL Shortener):

Architecture:
[Client] → [Load Balancer] → [Web Servers] → [Cache] → [Database]
                                                   ↓
                                            [URL Generation Service]

API:
POST /api/v1/shorten
  Request: { "longUrl": "https://example.com/very/long/url" }
  Response: { "shortUrl": "https://short.url/abc123" }

GET /api/v1/{shortCode}
  Response: 302 redirect to longUrl

Data Model:
URL (table)
- id (PK)
- shortCode (indexed)
- longUrl
- createdAt
- expiresAt
- userId (FK, optional)
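
The API and data model above can be sketched together in a few lines. This is a minimal in-memory illustration, not a real implementation: `UrlRecord`, `InMemoryUrlStore`, and the placeholder `c<id>` short codes are hypothetical names invented for this example, and a dict stands in for the database index on shortCode.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class UrlRecord:
    """One row of the URL table; fields mirror the data model above."""
    id: int
    short_code: str
    long_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None
    user_id: Optional[int] = None


class InMemoryUrlStore:
    """Hypothetical stand-in for the database; the dict acts as the shortCode index."""

    def __init__(self) -> None:
        self._by_code: Dict[str, UrlRecord] = {}
        self._next_id = 1

    def shorten(self, long_url: str, ttl: Optional[timedelta] = None,
                user_id: Optional[int] = None) -> UrlRecord:
        """Roughly what the POST shorten endpoint would do."""
        now = datetime.now(timezone.utc)
        record = UrlRecord(
            id=self._next_id,
            short_code=f"c{self._next_id}",  # placeholder code, not a real scheme
            long_url=long_url,
            created_at=now,
            expires_at=now + ttl if ttl is not None else None,
            user_id=user_id,
        )
        self._next_id += 1
        self._by_code[record.short_code] = record
        return record

    def resolve(self, short_code: str) -> Optional[str]:
        """Lookup behind the GET redirect; None would map to a 404."""
        record = self._by_code.get(short_code)
        if record is None:
            return None
        if record.expires_at is not None and record.expires_at <= datetime.now(timezone.utc):
            return None  # expired links are treated as missing
        return record.long_url


store = InMemoryUrlStore()
record = store.shorten("https://example.com/very/long/url")
print(record.short_code, "->", store.resolve(record.short_code))
```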

Estimation:
- Write: 100M URLs/day ÷ 86,400 s ≈ 1,160 QPS (round to 1,200)
- Read: 10:1 read:write ratio ≈ 12,000 QPS
- Storage: 100M × 500 bytes = 50 GB/day
- 5 years: 50 GB × 365 × 5 ≈ 91 TB
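
The arithmetic behind these estimates can be checked with a short script. This is a sketch under the same assumptions as the example (100M writes/day, 10:1 read:write ratio, 500 bytes per record, decimal GB/TB); the function and key names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate(writes_per_day: int, read_write_ratio: float,
             bytes_per_record: int, years: int) -> dict:
    """Back-of-envelope QPS and storage figures (decimal units: 1 GB = 1e9 bytes)."""
    write_qps = writes_per_day / SECONDS_PER_DAY
    storage_gb_per_day = writes_per_day * bytes_per_record / 1e9
    return {
        "write_qps": write_qps,
        "read_qps": write_qps * read_write_ratio,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_total": storage_gb_per_day * 365 * years / 1000,
    }


result = estimate(writes_per_day=100_000_000, read_write_ratio=10,
                  bytes_per_record=500, years=5)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

The exact write rate is closer to 1,160 QPS; rounding to 1,200 is fine in an interview as long as the rounding is stated.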

Red flags:
❌ Too much detail too early
❌ No diagram
❌ Skipping calculations
❌ Not checking in with interviewer

Good signs:
✅ Clear diagram
✅ Quantitative reasoning
✅ Iterative approach
✅ Seeking feedback

Step 3: Design Deep Dive (10-25 min)

Goal: Drill into specific components based on interviewer’s interest.

Why it matters:

Demonstrates technical depth
Shows understanding of trade-offs
Reveals experience with real systems

What to do:

Ask what to focus on
- “Which component should we discuss in detail?”
- Interviewer will guide the direction
Common deep dive topics:
- Scalability: How to handle 10x, 100x growth?
- Performance: How to optimize latency?
- Reliability: How to handle failures?
- Consistency: Strong vs eventual?
- Security: Authentication, authorization, rate limiting
- Data storage: SQL vs NoSQL, sharding, replication
Discuss trade-offs
- Every decision has pros and cons
- Explain why you chose one approach
- Mention alternatives and when they’d be better
Use specific technologies
- Redis for caching
- Kafka for messaging
- Cassandra for wide-column store
- But explain WHY each one fits; don’t just name-drop

Example Topics (URL Shortener):

Topic 1: URL Shortening Algorithm

Approach 1: Hash function (MD5, SHA-256)
Pros:
- Fast computation
- Distributed (no coordination)
Cons:
- Hash collision possible
- Fixed length (longer than needed)
- Need collision resolution

Approach 2: Base62 encoding with counter
Pros:
- Short codes (7 chars cover 62^7 ≈ 3.5 trillion URLs)
- No collisions
- Predictable length
Cons:
- Need distributed ID generation
- Possible guessing of sequential URLs

Approach 3: Pre-generate codes
Pros:
- Very fast (just read from pool)
- Short codes
Cons:
- Requires separate service
- Memory overhead

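Choice: Approach 2 with distributed ID generator (Snowflake-style)
Reason: Collision-free, fast, and predictable length
Caveat: A full 64-bit Snowflake ID encodes to up to 11 Base62 chars, so 7-char codes need a smaller ID space (e.g. range-allocated counters)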
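
Approach 2 can be sketched as follows. This is a minimal illustration that assumes the numeric ID comes from some external generator; the alphabet ordering (digits, lowercase, uppercase) is an arbitrary choice for this example, and any fixed ordering of 62 characters would work.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an assumption of this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(number: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def base62_decode(code: str) -> int:
    """Inverse of base62_encode; raises ValueError on characters outside the alphabet."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


short_code = base62_encode(123_456_789)
print(short_code, base62_decode(short_code))
```

Because each ID maps to exactly one code, uniqueness of the IDs is what guarantees collision-free short codes; the encoding itself adds no coordination cost.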

Topic 2: Handling High Read Traffic

Problem: 12K read QPS can overwhelm database

Solution 1: Cache
- Cache hot URLs in Redis/Memcached
- 80-90% hit rate reduces DB load from 12K to roughly 1.2K-2.4K QPS
- TTL for expired URLs
- Cache-aside pattern

Solution 2: CDN
- Cache at edge locations
- Lowest latency for users
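- Redirect 301 for permanent URLs (browsers cache it, so repeat clicks skip our servers and our analytics; use 302 when click tracking matters)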

Solution 3: Read replicas
- Separate read and write traffic
- 3 replicas handle 4K QPS each

Combined approach: Cache + CDN + Replicas
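
The cache-aside pattern from Solution 1 can be sketched like this. It is an in-process illustration only: a dict stands in for Redis/Memcached, and `CacheAsideResolver`, `load_from_db`, and the injectable `clock` are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAsideResolver:
    """Read path: check the cache first, fall back to the DB, then populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # code -> (long_url, expiry time)
        self.hits = 0
        self.misses = 0

    def resolve(self, short_code: str) -> Optional[str]:
        now = self._clock()
        entry = self._cache.get(short_code)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the database and cache the result.
        self.misses += 1
        long_url = self._load_from_db(short_code)
        if long_url is not None:
            self._cache[short_code] = (long_url, now + self._ttl)
        return long_url


database = {"abc123": "https://example.com/very/long/url"}
resolver = CacheAsideResolver(database.get)
resolver.resolve("abc123")  # miss: goes to the database
resolver.resolve("abc123")  # hit: served from the cache
print(f"hits={resolver.hits} misses={resolver.misses}")
```

Only the misses reach the database, so the database load is approximately read QPS × (1 − hit rate).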

Topic 3: Database Choice

SQL (PostgreSQL):
Pros: ACID, structured data, easy queries
Cons: Harder to scale horizontally

NoSQL (Cassandra):
Pros: High write throughput, easy sharding
Cons: Eventual consistency, limited queries

Choice: SQL for MVP, NoSQL for massive scale
Reason: SQL simpler initially, NoSQL when sharding needed

Red flags:
❌ Not discussing trade-offs
❌ Only one solution, no alternatives
❌ Name-dropping without explanation
❌ Going too broad instead of deep

Good signs:
✅ Multiple solutions considered
✅ Clear reasoning for choices
✅ Acknowledges limitations
✅ Uses concrete numbers

Step 4: Wrap Up (3-5 min)

Goal: Show you think beyond the immediate design.

Why it matters:

Demonstrates you understand production systems
Shows forward thinking
Leaves strong final impression

What to discuss:

1. Identify bottlenecks

“At 100K QPS, the database becomes a bottleneck”
“We’d need to shard, for example by shortCode, since redirects look up by code”

2. System failures

“If the cache goes down, fall back to the database behind a circuit breaker so the extra load doesn’t overwhelm it”
“Multi-region setup for disaster recovery”
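
A circuit breaker is simple enough to sketch if the interviewer asks how it works. The version below is a simplified illustration, assuming a consecutive-failure threshold and a fixed cooldown; `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and production libraries typically add more states and metrics.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)
print(breaker.call(lambda: "https://example.com/very/long/url"))
```

Wrapping the database lookup in `breaker.call(...)` means that once the database starts failing, callers get a fast error instead of piling more requests onto it.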

3. Monitoring and metrics

Key metrics: QPS, latency (p50, p99), error rate, cache hit rate
Alerting: When QPS > 80% capacity, alert on-call
Dashboards: Real-time metrics for operations
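
The latency percentiles and the capacity alert above can be made concrete with a few lines. This is a sketch assuming nearest-rank percentiles and made-up sample values; real monitoring systems usually compute percentiles from histograms rather than raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(current_qps: float, capacity_qps: float, threshold: float = 0.8) -> bool:
    """True when traffic exceeds the given fraction of capacity (80% by default)."""
    return current_qps > threshold * capacity_qps


latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]  # illustrative samples
print("p50:", percentile(latencies_ms, 50), "ms")
print("p99:", percentile(latencies_ms, 99), "ms")
print("alert:", should_alert(current_qps=9_000, capacity_qps=10_000))
```

The gap between p50 and p99 in the sample data shows why tail latency is tracked separately: the median looks healthy while a few requests are very slow.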

4. Next steps and enhancements

“Add analytics for click tracking”
“Implement custom URLs feature”
“A/B testing framework”
“Machine learning for fraud detection”

5. Scale refinement and cost estimation

“The current design supports about 50M URLs/day”
“To reach 1B URLs/day, we’d need database sharding”
“Let me estimate monthly costs…” (see example below)

Cost Estimation Example (URL Shortener):

Assuming a smaller launch scale than the 100M URLs/day used in the Step 2 estimate:

1M URLs shortened/day = 30M URLs/month
10M redirects/day = 300M redirects/month
Read:Write ratio = 10:1

Monthly Infrastructure Costs:

Compute:
- 2 web servers (t3.medium): 2 × $29 = $58
- 1 database server (db.t3.small): $26
- Total compute: $84/month

Storage:
- 30M URLs/month × 500 bytes = 15 GB
- Database storage: 15 GB × $0.115 = $1.73/month
- (Keep for 5 years: grows to 1.8B URLs = 900 GB; averaged over the period, ~900M URLs × 500 bytes = 450 GB ≈ $52/month)

Bandwidth:
- 300M redirects × 1 KB = 300 GB/month
- Cost: 300 GB × $0.09 = $27/month
- With CDN: 300 GB × $0.04 = $12/month (save $15!)

Cache:
- Redis (cache.t3.small): $25/month
- Cache 10% of hot URLs = 3M URLs = 1.5 GB

Other:
- Load balancer: $16/month
- Total: $84 + $52 + $12 + $25 + $16 = $189/month

Current scale: ~$190/month or $2,280/year
At 10x scale (10M URLs/day): ~$1,500/month
At 100x scale (100M URLs/day): ~$12,000/month + sharding costs

Cost per URL: $190 / 30M = $0.0000063 (very cheap!)
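
The totals above can be reproduced with a small calculator, which also makes it easy to try other scales. The unit prices are copied from this example and should be treated as illustrative, not as current price quotes; the function and parameter names are assumptions of this sketch.

```python
def monthly_cost(web_servers: int = 2, web_server_price: float = 29.0,
                 db_price: float = 26.0,
                 stored_gb: float = 450.0, storage_price_per_gb: float = 0.115,
                 transfer_gb: float = 300.0, transfer_price_per_gb: float = 0.04,
                 cache_price: float = 25.0,
                 load_balancer_price: float = 16.0) -> dict:
    """Monthly cost breakdown in USD, using the example's figures as defaults."""
    breakdown = {
        "compute": web_servers * web_server_price + db_price,
        "storage": stored_gb * storage_price_per_gb,
        "bandwidth": transfer_gb * transfer_price_per_gb,  # CDN rate from the example
        "cache": cache_price,
        "load_balancer": load_balancer_price,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


costs = monthly_cost()
urls_per_month = 30_000_000
for item, dollars in costs.items():
    print(f"{item}: ${dollars:,.2f}")
print(f"cost per URL: ${costs['total'] / urls_per_month:.7f}")
```

The unrounded total is $188.75, which matches the rounded $189 and ~$190 figures in the example.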

Cost optimizations to mention:

“We could reduce compute costs by 40% with Reserved Instances”
“Using CDN saves $15/month in bandwidth (pays for itself)”
“At 100x scale, we’d archive old URL records to S3 and use lifecycle policies to move them to Glacier (about 5x cheaper storage)”

Why this impresses interviewers:

Shows business awareness (not just technical)
Demonstrates you’ve built real systems (know actual costs)
Helps justify design decisions (“CDN adds cost but reduces server load”)
Shows ability to optimize (“Reserved Instances save 40%”)

See estimation-cheatsheet > 💰 Cost Estimation for detailed pricing reference

Red flags:
❌ “I think we’re done”
❌ Not addressing obvious bottlenecks
❌ Ignoring failure scenarios

Good signs:
✅ Proactive identification of issues
✅ Realistic about limitations
✅ Specific monitoring strategy
✅ Cost-aware

Communication Best Practices

Do’s ✅

Think out loud
- Share your reasoning process
- “I’m thinking about caching here because…”
Ask questions
- Clarify ambiguities
- Engage with interviewer
- “What’s more important, consistency or availability?”
Be open to feedback
- Interviewer hints are valuable
- Adapt your design based on input
Manage time
- Don’t spend 30 minutes on one component
- Keep moving forward
Draw clearly
- Label everything
- Use consistent notation
- Make it easy to follow
State assumptions
- “I’m assuming X because Y”
- Make implicit thoughts explicit

Don’ts ❌

Don’t stay silent
- Interviewer can’t help if they don’t know your thinking
Don’t go straight to code
- This is an architecture interview, not a coding interview
Don’t over-engineer
- Start simple, add complexity as needed
Don’t argue with interviewer
- They might be testing your collaboration
Don’t say “I don’t know” and stop
- Say “I don’t know, but here’s my reasoning…”
Don’t fixate on one solution
- Be flexible, consider alternatives

Common Pitfalls

No requirements gathering
- Jumping straight to design
- Fix: Always start with Step 1
Too detailed too early
- Getting lost in implementation details
- Fix: High-level first, then drill down
No trade-off discussion
- Presenting only one solution
- Fix: Always discuss alternatives and trade-offs
Ignoring scale
- Designing for 100 users when asked for 100M
- Fix: Do back-of-envelope calculations
Silent designing
- Drawing without explanation
- Fix: Talk while you draw
No failure handling
- Assuming everything works perfectly
- Fix: Discuss failure scenarios

Framework Summary (Quick Reference)

Step 1: Understand & Scope (3-10 min)
├─ Ask clarifying questions
├─ Functional requirements
├─ Non-functional requirements
└─ Write down assumptions

Step 2: High-Level Design (10-15 min)
├─ Draw architecture diagram
├─ Back-of-envelope estimation
├─ API design
├─ Data model
└─ Get buy-in

Step 3: Deep Dive (10-25 min)
├─ Ask what to focus on
├─ Drill into 2-3 components
├─ Discuss trade-offs
└─ Consider alternatives

Step 4: Wrap Up (3-5 min)
├─ Identify bottlenecks
├─ Failure scenarios
├─ Monitoring strategy
├─ Cost estimation (shows business awareness!)
└─ Future improvements

Practice Recommendations

Week 1: Apply framework to simple systems

URL shortener
Pastebin
Rate limiter

Week 2: Medium complexity

Twitter
Instagram feed
Chat system

Week 3: Complex systems

YouTube
Uber
Netflix

Daily practice:

Pick a system
Set timer (45 min)
Go through all 4 steps
Review: Did I follow the framework?

With peers:

Mock interview each other
Give feedback on process, not just solution
Practice explaining out loud

Key Takeaways

Process matters more than solution: There’s no one “correct” answer. Interviewers assess your approach.
Start simple, iterate: Don’t try to design the perfect system immediately.
Communicate constantly: This is a conversation, not a lecture.
Time management: Allocate time across all 4 steps.
Trade-offs are key: Every design decision has pros and cons.
Adapt to feedback: Be flexible based on interviewer’s direction.
Think production-ready: Consider failures, monitoring, scale.

Related Chapters

Chapter 1 ch01-scale-zero-to-millions: Foundational scaling concepts
Chapter 2 ch02-back-of-envelope-estimation: Estimation techniques
All other chapters: Apply this framework to every design

External Resources

Interview tips: interview-framework (detailed guide)
Estimation help: estimation-cheatsheet (quick reference)
Common patterns: key-patterns (reusable components)

This is THE most important chapter! Master this framework before moving to specific system designs. Every other chapter applies these 4 steps.

Practice applying the framework daily until it becomes second nature.

Last Updated: 2026-04-08
Status: Essential reading - Review before every interview

Study Notes by Niladri & AI
