Chapter 2: Back-of-Envelope Estimation
volume1 estimation fundamentals interview-critical
Status: 🟩 Essential - Must master
Difficulty: Math required (but simple!)
Time to complete: 30 min read + practice
Overview
Back-of-envelope estimation is a critical interview skill. It shows you can think quantitatively about systems and make informed design decisions based on scale.
Why this matters:
- Validates design decisions (“Do we need caching? Let’s calculate…”)
- Shows you understand scale
- Helps identify bottlenecks before building
- Impresses interviewers (many candidates skip this!)
Always do calculations in interviews! Even rough estimates are better than none.
Power of Two
Memorize this table:
| Power | Exact Value | Approximate | Bytes | Name | Example |
|---|---|---|---|---|---|
| 10 | 1,024 | 1 thousand | 1 KB | Kilobyte | Small text file |
| 20 | 1,048,576 | 1 million | 1 MB | Megabyte | Small photo |
| 30 | 1,073,741,824 | 1 billion | 1 GB | Gigabyte | HD movie |
| 40 | 1,099,511,627,776 | 1 trillion | 1 TB | Terabyte | Company database |
| 50 | 1,125,899,906,842,624 | 1 quadrillion | 1 PB | Petabyte | Google/Facebook scale |
Quick tip: For interviews, use 1000 instead of 1024. Close enough and easier math!
Key conversions:
- 1 byte = 8 bits
- 1 KB = 1,000 bytes ≈ 10^3 bytes (thousand)
- 1 MB = 1,000 KB ≈ 10^6 bytes (million)
- 1 GB = 1,000 MB ≈ 10^9 bytes (billion)
- 1 TB = 1,000 GB ≈ 10^12 bytes (trillion)
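The conversions above can be captured in a few lines of Python. This is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes) as the quick tip suggests; the constant names and the `human_bytes` helper are illustrative and not part of any library.

```python
# Decimal (interview-friendly) units: each step is a factor of 1,000.
KB, MB, GB, TB, PB, EB = (10 ** (3 * i) for i in range(1, 7))


def human_bytes(n: float) -> str:
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1,000,
        # or at the largest unit we know about.
        if value < 1000 or unit == units[-1]:
            return f"{round(value, 1):g} {unit}"
        value /= 1000


print(human_bytes(300 * GB))  # 300 GB
print(human_bytes(2 * PB))    # 2 PB

# How far off is the "use 1000 instead of 1024" shortcut at each level?
for power, name in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB"), (50, "PB")]:
    exact = 2 ** power
    approx = 10 ** (3 * power // 10)
    print(f"2^{power} = {exact:,} (~1 {name}), shortcut error {exact / approx - 1:.1%}")
```

The shortcut error compounds with each step, from about 2.4% at the KB level to about 12.6% at the PB level, which is still well within back-of-envelope tolerance.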
Latency Numbers Every Programmer Should Know
Memorize these (updated for 2024):
| Operation | Latency | Comparison | Notes |
|---|---|---|---|
| L1 cache reference | 0.5 ns | - | On-CPU cache |
| L2 cache reference | 7 ns | 14x L1 | |
| Main memory reference | 100 ns | ~200x L1 | RAM access |
| Compress 1KB with Snappy | 10 μs | 10,000 ns | |
| Send 1KB over 1 Gbps network | 10 μs | 10,000 ns | |
| Read 1 MB sequentially from memory | 250 μs | 250,000 ns | |
| Round trip within datacenter | 0.5 ms | 500,000 ns | |
| Read 1 MB sequentially from SSD | 1 ms | 4x memory | |
| Disk seek | 10 ms | 20x datacenter round trip | Mechanical disk |
| Read 1 MB sequentially from disk | 30 ms | 30x SSD | |
| Send packet CA → Netherlands → CA | 150 ms | 300x datacenter RTT | Intercontinental |
Key takeaways:
- Memory is fast: 100 ns
- SSD is okay: 1 ms for 1 MB
- Disk is slow: 10-30 ms
- Network within datacenter: ~0.5 ms
- Cross-continent: ~150 ms
Mnemonic: “Memory, SSD, Disk - Slow, Slower, Slowest”
- Memory: 100 ns (single reference)
- SSD: 1 ms (1 MB read) = 10,000x slower
- Disk: 10 ms (seek) = 100,000x slower
Why this matters:
- Tells you cache > database > disk
- Network calls are expensive (batch them)
- Reading 1 MB from memory is 4x faster than reading it from SSD
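To make these ratios concrete, here is a small sketch that stores a few rows of the latency table and compares them. The dictionary keys and helper names are illustrative. The batching comparison assumes, purely for illustration, that a batched request costs a single round trip.

```python
# A few latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "main_memory_reference": 100,
    "datacenter_round_trip": 500_000,
    "ssd_read_1mb": 1_000_000,
    "disk_seek": 10_000_000,
    "ca_netherlands_round_trip": 150_000_000,
}


def times_slower(op: str, baseline: str) -> float:
    """How many times slower `op` is than `baseline`."""
    return LATENCY_NS[op] / LATENCY_NS[baseline]


def sequential_calls_ms(calls: int) -> float:
    """Total latency of `calls` back-to-back datacenter round trips, in ms."""
    return calls * LATENCY_NS["datacenter_round_trip"] / 1_000_000


print(times_slower("ssd_read_1mb", "main_memory_reference"))  # 10000.0
print(times_slower("disk_seek", "main_memory_reference"))     # 100000.0

# 100 sequential calls vs. one batched call (assumed to cost one round trip).
print(sequential_calls_ms(100))  # 50.0
print(sequential_calls_ms(1))    # 0.5
```

Under that assumption, batching 100 calls into one saves roughly 49.5 ms per request, which is why chatty service-to-service calls show up quickly in latency budgets.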
Availability Numbers
The “Nines” of Availability:
| Availability % | Downtime/Day | Downtime/Year | Name |
|---|---|---|---|
| 90% (one nine) | 2.4 hours | 36.5 days | Not acceptable |
| 99% (two nines) | 14.4 minutes | 3.65 days | Basic |
| 99.9% (three nines) | 1.4 minutes | 8.76 hours | Standard |
| 99.99% (four nines) | 8.6 seconds | 52.6 minutes | High availability |
| 99.999% (five nines) | 864 ms | 5.26 minutes | Mission critical |
Quick formula:
Downtime/year = (1 - availability) × 365 days × 24 hours
Example:
99.9% availability:
= (1 - 0.999) × 365 × 24
= 0.001 × 8,760 hours
= 8.76 hours/year
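The same formula as a short Python sketch, looping over the five availability levels from the table. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in hours, for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (90, 99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each extra nine divides the result by ten, so the output steps down from 876 hours to about 5.3 minutes.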
In interviews:
- Most web services: 99.9% - 99.99%
- Financial systems: 99.99% - 99.999%
- Internal tools: 99% is often OK
Cost vs Availability: Each additional “nine” cuts the allowed downtime by 10x, and the cost of achieving it typically rises steeply!
Estimation Process (Step-by-Step)
Step 1: Understand the Problem
Ask clarifying questions:
- How many users?
- What’s the usage pattern? (requests per user per day)
- Read-heavy or write-heavy?
- Any specific requirements?
Don’t assume! Get these numbers from the interviewer.
Step 2: Write Down Assumptions
Always write down assumptions clearly:
- “Assume 100 million DAU”
- “Assume each user posts 2 tweets per day”
- “Assume 10:1 read/write ratio”
- “Assume each tweet is 280 characters = ~300 bytes”
Tip: Round numbers for easier math!
Step 3: Do the Math
Show your work! Interviewers want to see your thought process.
Common calculations:
- Traffic estimation (QPS)
- Storage estimation
- Bandwidth estimation
- Memory/cache estimation
Let’s walk through each…
Traffic Estimation (QPS)
Formula:
QPS = (Total operations per day) / (Seconds per day)
Seconds per day:
- Exact: 86,400 seconds
- For interviews: Use 100,000 (easier math, close enough!)
Example 1: Twitter Write Traffic
Given:
- 300 million DAU (Daily Active Users)
- Average user posts 2 tweets per day
Step 1: Total tweets per day
= 300M users × 2 tweets
= 600M tweets/day
Step 2: Write QPS (average)
= 600M / 100K seconds
= 6,000 tweets/second
Step 3: Peak QPS (2x average)
= 6,000 × 2
= 12,000 tweets/second at peak
Example 2: Twitter Read Traffic
Given:
- 300M DAU
- Each user views timeline 10 times per day
- Each timeline view loads 20 tweets
Step 1: Timeline views per day
= 300M users × 10 views
= 3 billion views/day
Step 2: Read QPS (average)
= 3B / 100K
= 30,000 QPS
Step 3: Peak QPS
= 30,000 × 2
= 60,000 QPS at peak
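Key insight: Twitter is read-heavy (60K timeline reads vs 12K writes per second at peak = 5:1 ratio). Each timeline view loads 20 tweets, so the ratio in tweets fetched is even higher.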
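The two QPS examples follow the same recipe, sketched below in Python. The function name and the `peak_factor` parameter are illustrative. The 2x peak multiplier and the 100K-second day are the chapter's assumptions, not measured values.

```python
SECONDS_PER_DAY_EXACT = 86_400
SECONDS_PER_DAY_ROUNDED = 100_000  # interview shortcut


def estimate_qps(dau, actions_per_user, peak_factor=2,
                 seconds_per_day=SECONDS_PER_DAY_ROUNDED):
    """Return (average QPS, peak QPS) for a daily-active-user workload."""
    total_per_day = dau * actions_per_user
    average = total_per_day / seconds_per_day
    return average, average * peak_factor


writes = estimate_qps(300_000_000, 2)   # tweets posted
reads = estimate_qps(300_000_000, 10)   # timeline views
print(writes)  # (6000.0, 12000.0)
print(reads)   # (30000.0, 60000.0)

# Same write workload with the exact number of seconds per day.
exact_avg, _ = estimate_qps(300_000_000, 2, seconds_per_day=SECONDS_PER_DAY_EXACT)
print(round(exact_avg))  # 6944
```

With the exact 86,400 seconds the average write rate is about 6,944/s, so the rounded shortcut underestimates by roughly 14%. That is fine for sizing, but worth mentioning if an interviewer asks how rough the number is.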
Storage Estimation
Formula:
Storage = (Number of items) × (Size per item) × (Time period)
Example 1: Twitter Storage
Given:
- 600M tweets/day (from above)
- Each tweet = 280 characters = ~300 bytes
- Also need metadata (userId, timestamp, etc.) = ~200 bytes
- Total per tweet = 500 bytes
Step 1: Daily storage (text + metadata)
= 600M × 500 bytes
= 300 GB/day
Step 2: With media (assume 10% have images, 5 MB avg)
= 600M × 0.1 × 5 MB
= 300M MB
= 300 TB/day
Step 3: Yearly storage
= 300 TB × 365
= 109,500 TB
≈ 110 PB/year
Step 4: With replication (3x for redundancy)
= 110 PB × 3
= 330 PB/year
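Here is a sketch of the Twitter storage arithmetic in Python, using the same assumptions as the example (500 bytes per tweet, 10% of tweets carrying a 5 MB image, 3x replication). The helper name is illustrative. Unlike the hand calculation, it adds the text volume to the media volume before annualizing.

```python
GB, TB, PB = 10**9, 10**12, 10**15


def daily_bytes(items_per_day, bytes_per_item, fraction=1.0):
    """Bytes written per day for the given fraction of items."""
    return items_per_day * fraction * bytes_per_item


tweets_per_day = 600_000_000
text_per_day = daily_bytes(tweets_per_day, 500)                        # text + metadata
media_per_day = daily_bytes(tweets_per_day, 5 * 10**6, fraction=0.1)   # 10% have images

per_day = text_per_day + media_per_day
per_year = per_day * 365
replicated = per_year * 3  # 3x replication

print(f"text:  {text_per_day / GB:.0f} GB/day")
print(f"media: {media_per_day / TB:.0f} TB/day")
print(f"year:  {per_year / PB:.1f} PB")
print(f"3x:    {replicated / PB:.1f} PB")
```

Text contributes about 0.1% of the total, so the script lands on roughly 109.6 PB/year and 329 PB replicated, matching the rounded 110 PB and 330 PB above.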
Example 2: YouTube Storage
Given:
- 500M video uploads/day
- Average video size = 100 MB
Step 1: Daily storage
= 500M × 100 MB
= 50,000,000,000 MB
= 50,000 TB
= 50 PB/day (!!)
Step 2: Yearly storage
= 50 PB × 365
= 18,250 PB/year
≈ 18 EB (exabytes) per year
Step 3: With different resolutions (360p, 720p, 1080p, 4K)
= 50 PB × 4 formats
= 200 PB/day
= 73,000 PB/year
≈ 73 EB/year
Key insight: Video is HUGE! This is why CDN and compression are critical.
Bandwidth Estimation
Formula:
Bandwidth = Data Size / Time
Convert to MB/s or GB/s for easier understanding
Example: Twitter Bandwidth
Upload (Write):
- 300 TB/day (text + media from storage calc)
- Bandwidth = 300 TB / 86,400 seconds
= 300 × 10^12 bytes / 86,400
≈ 3.5 GB/second upload
Download (Read, 5:1 ratio):
- Bandwidth = 3.5 GB/s × 5
= 17.5 GB/second download
Network capacity check:
- 1 Gbps = 125 MB/s
- 10 Gbps = 1.25 GB/s
- 100 Gbps = 12.5 GB/s
For 17.5 GB/s, need multiple 100 Gbps connections or CDN!
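The bandwidth check can be sketched the same way. The function names are illustrative, and `links_needed` is a deliberately naive count that ignores protocol overhead, headroom, and redundancy.

```python
import math

TB = 10**12


def bandwidth_bytes_per_sec(bytes_per_day, seconds_per_day=86_400):
    """Average transfer rate needed to move `bytes_per_day` in one day."""
    return bytes_per_day / seconds_per_day


def links_needed(bytes_per_sec, link_gbps):
    """Minimum number of links of the given speed (no headroom assumed)."""
    link_bytes_per_sec = link_gbps * 1e9 / 8  # bits -> bytes
    return math.ceil(bytes_per_sec / link_bytes_per_sec)


upload = bandwidth_bytes_per_sec(300 * TB)
download = upload * 5  # 5:1 read:write ratio from the QPS estimate

print(f"upload:   {upload / 1e9:.1f} GB/s")
print(f"download: {download / 1e9:.1f} GB/s")
print(links_needed(download, 100))  # 100 Gbps links for the download path
```

The unrounded download figure is about 17.4 GB/s, which needs at least two 100 Gbps links before any safety margin is added.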
Memory/Cache Estimation
Follow the 80/20 rule: cache the 20% most-accessed data, which typically serves roughly 80% of requests.
Example: Twitter Cache
Given:
- 60K read QPS (the peak figure from earlier; applying it to the whole day gives a conservative upper bound)
- Want to cache hot tweets
- Assume 20% of requests are for the same tweets
Step 1: Requests to cache per day
= 60K QPS × 86,400 seconds
= 5.2 billion requests/day
Step 2: Unique tweets (20% are duplicates, so 80% unique)
= 5.2B × 0.8
= 4.2 billion unique tweets/day
Step 3: Cache size (keep recent tweets, 1 day)
= 4.2B × 500 bytes (per tweet)
= 2.1 TB
Step 4: Practical cache (keep hottest 10%)
= 2.1 TB × 0.1
= 210 GB
Conclusion: Need ~200-300 GB Redis cache
Cache size rule of thumb: 10-20% of daily active data
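The four cache-sizing steps can be written as one function. This is a sketch under the example's own assumptions (80% of requests hit unique tweets, 500 bytes per tweet, keep the hottest 10%). The parameter names are illustrative.

```python
def cache_size_bytes(read_qps, bytes_per_item,
                     unique_fraction=0.8, hot_fraction=0.1,
                     seconds_per_day=86_400):
    """Estimate cache size for one day of traffic, keeping only the hottest items."""
    requests_per_day = read_qps * seconds_per_day        # Step 1
    unique_items = requests_per_day * unique_fraction    # Step 2
    one_day_bytes = unique_items * bytes_per_item        # Step 3
    return one_day_bytes * hot_fraction                  # Step 4


size = cache_size_bytes(read_qps=60_000, bytes_per_item=500)
print(f"{size / 1e9:.0f} GB")  # 207 GB
```

Without intermediate rounding the result is about 207 GB, consistent with the ~210 GB figure above. Changing `hot_fraction` to 0.2 doubles it, which shows how sensitive the estimate is to that one assumption.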
Putting It All Together: Complete Example
Problem: Estimate requirements for Instagram
Given:
- 500 million DAU
- Each user uploads 2 photos/day
- Each photo = 2 MB
- Each user views 50 photos/day
- Need to store for 5 years
Step 1: Traffic
Uploads (Write):
= 500M users × 2 photos
= 1 billion photos/day
= 1B / 100K seconds
= 10K writes/second
Peak write QPS = 10K × 2 = 20K QPS
Views (Read):
= 500M users × 50 views
= 25 billion views/day
= 25B / 100K
= 250K reads/second
Peak read QPS = 250K × 2 = 500K QPS
Read:Write ratio = 250K / 10K = 25:1 (very read-heavy!)
Step 2: Storage
Daily storage:
= 1B photos × 2 MB
= 2,000,000,000 MB
= 2 million GB
= 2,000 TB
= 2 PB/day
Yearly storage:
= 2 PB × 365
= 730 PB/year
5-year storage:
= 730 PB × 5
= 3,650 PB
≈ 3.7 EB (exabytes)
With replication (3x):
= 3.7 EB × 3
= 11 EB total
Step 3: Bandwidth
Upload:
= 2 PB/day
= 2 × 10^15 bytes / 86,400 sec
≈ 23 GB/second upload
Download (25:1 ratio):
= 23 GB/s × 25
= 575 GB/second download
Conclusion: Need CDN! Can't serve 575 GB/s from origin servers.
Step 4: Cache
Cache hot photos (recent + popular):
- Read load: 250K requests/sec
- Assume 20% of photos get 80% of traffic
- Cache the top 5% of daily photos (aggressive caching)
Photos to cache:
= 1B photos/day × 0.05
= 50M photos
Cache size:
= 50M × 2 MB
= 100,000,000 MB
= 100 TB cache
Practical: Distribute across 100 servers with 1 TB RAM each
Summary
| Metric | Value |
|---|---|
| Write QPS | 10K (peak 20K) |
| Read QPS | 250K (peak 500K) |
| Storage/day | 2 PB |
| Storage/5-year | 11 EB (with replication) |
| Upload bandwidth | 23 GB/s |
| Download bandwidth | 575 GB/s |
| Cache size | 100 TB (100 servers) |
Design implications:
- Need CDN (575 GB/s download!)
- Cache heavily (25:1 read:write ratio)
- Shard database (10K write QPS on one DB is tough)
- Object storage (S3) for photos, not database
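The whole Instagram walkthrough fits in one small script. This is a sketch, not a sizing tool: the `Workload` dataclass and `estimate` function are hypothetical names, and every default (2x peak, 3x replication, 5% cache fraction) is simply the assumption used in the worked example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY_ROUNDED = 100_000  # shortcut used for QPS
SECONDS_PER_DAY_EXACT = 86_400     # used for bandwidth, as in the example
MB = 10**6


@dataclass
class Workload:
    dau: int
    uploads_per_user: int
    views_per_user: int
    photo_bytes: int
    retention_years: int
    replication: int = 3
    peak_factor: int = 2
    cache_fraction: float = 0.05


def estimate(w: Workload) -> dict:
    photos_per_day = w.dau * w.uploads_per_user
    views_per_day = w.dau * w.views_per_user
    write_qps = photos_per_day / SECONDS_PER_DAY_ROUNDED
    read_qps = views_per_day / SECONDS_PER_DAY_ROUNDED
    ratio = read_qps / write_qps
    storage_per_day = photos_per_day * w.photo_bytes
    upload_bw = storage_per_day / SECONDS_PER_DAY_EXACT
    return {
        "write_qps": write_qps,
        "peak_write_qps": write_qps * w.peak_factor,
        "read_qps": read_qps,
        "peak_read_qps": read_qps * w.peak_factor,
        "read_write_ratio": ratio,
        "storage_per_day_bytes": storage_per_day,
        "storage_total_bytes": storage_per_day * 365 * w.retention_years * w.replication,
        "upload_bytes_per_sec": upload_bw,
        "download_bytes_per_sec": upload_bw * ratio,
        "cache_bytes": photos_per_day * w.cache_fraction * w.photo_bytes,
    }


instagram = Workload(dau=500_000_000, uploads_per_user=2, views_per_user=50,
                     photo_bytes=2 * MB, retention_years=5)
result = estimate(instagram)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

The script reproduces the summary table, with one small difference: it multiplies the unrounded upload rate by 25, giving roughly 579 GB/s of download bandwidth instead of 575 GB/s. Both round to the same design conclusion.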
Common Estimation Patterns
Pattern 1: DAU → QPS
Template:
1. DAU × actions per user = total actions/day
2. Total actions / 100K seconds = QPS
3. QPS × 2 or 3 = peak QPS
Pattern 2: Social Media
Users post content → Calculate posts/day → Storage
Users view content → Calculate views/day → QPS & Bandwidth
Read:Write ratio typically 10:1 to 100:1
Pattern 3: Video/Streaming
Uploads are large (GB) → Massive storage
Views even larger → CDN mandatory
Different resolutions → Multiply storage by 3-5x
Pattern 4: Messaging
Each message is small (a few KB) → Storage manageable
But there are LOTS of messages → High QPS
Need message queue for reliability
Tips & Tricks
1. Use round numbers:
- 100M users (not 97.3M)
- 100K seconds/day (not 86,400)
- 1 year = 365 days (ignore leap year)
2. Use scientific notation:
- 1,000,000,000 → 10^9 or “1 billion”
- Makes math easier
3. Show your work:
Bad: "We need 500 TB"
Good: "100M users × 2 photos/day × 2 MB × 365 days = 146 PB/year"
4. State assumptions clearly:
- “Assuming 10:1 read/write ratio”
- “Assuming 80% cache hit rate”
- “Assuming 3x replication”
5. Sanity check your answer:
- Does 10 PB/day make sense?
- Is 1M QPS realistic?
- Compare to known systems
6. It’s OK to be approximate:
- “Between 100K and 200K QPS” is fine
- Ballpark is enough for interviews
Red Flags to Avoid
❌ Skipping estimation - “Let’s just design it”
❌ Being too precise - 86,431.2 seconds
❌ No assumptions - How can you calculate without input?
❌ Forgetting peak traffic - Always mention 2-3x average
❌ Not showing work - “Storage is 1 PB” (how did you get that?)
Green Flags Interviewers Love
✅ Doing calculation unprompted - Shows initiative
✅ Writing down assumptions - Shows clarity of thought
✅ Showing work step-by-step - Easy to follow
✅ Discussing implications - “This means we need CDN”
✅ Sanity checking - “Does this number make sense?”
Practice Problems
Problem 1: Design URL Shortener
- 100M URLs shortened/day
- Each URL: 500 bytes
- Read:Write = 100:1
- Keep URLs for 5 years
Calculate: Write QPS, Read QPS, Storage, Bandwidth
Problem 2: Design WhatsApp
- 2B users
- Each user sends 50 messages/day
- Each message: 100 bytes
- Read = 2× Write (send + receive)
Calculate: Write QPS, Read QPS, Daily storage
Problem 3: Design Dropbox
- 500M users
- Each user uploads 100 MB/day
- Each user downloads 500 MB/day (sharing)
Calculate: Upload bandwidth, Download bandwidth, Storage/year
Key Takeaways
- Always do back-of-envelope - Even rough estimates > no estimates
- Memorize key numbers - Power of 2, latencies, time conversions
- Use 100K seconds/day - Makes math easy in interviews
- Show your work - Process matters more than exact answer
- State assumptions - Get interviewer buy-in
- Discuss implications - “This means we need caching”
- Round numbers - 100M is better than 97.3M
- Peak traffic is 2-3x - Don’t forget to mention it!
Related Resources
- estimation-cheatsheet: Quick reference for numbers
- ch01-scale-from-zero-to-millions: When to add components based on scale
- ch03-framework-for-system-design: Step 2 includes estimation
This is a critical skill! Practice 2-3 calculations daily for different systems until it becomes automatic.
Before every interview: Review latency numbers and time conversions.
Last Updated: 2026-04-08
Status: Essential - Practice daily