Chapter 2: Back-of-Envelope Estimation

volume1 estimation fundamentals interview-critical

Status: 🟩 Essential - Must master
Difficulty: Math required (but simple!)
Time to complete: 30 min read + practice

Overview

Back-of-envelope estimation is a critical interview skill. It shows you can think quantitatively about systems and make informed design decisions based on scale.

Why this matters:

Validates design decisions (“Do we need caching? Let’s calculate…”)
Shows you understand scale
Helps identify bottlenecks before building
Impresses interviewers (many candidates skip this!)

Always do calculations in interviews! Even rough estimates are better than none.

Power of Two

Memorize this table:

Power (2^n)	Exact Value	Approximate	Bytes	Name	Example
10	1,024	1 thousand	1 KB	Kilobyte	Small text file
20	1,048,576	1 million	1 MB	Megabyte	Small photo
30	1,073,741,824	1 billion	1 GB	Gigabyte	HD movie
40	~1.1 trillion	1 trillion	1 TB	Terabyte	Company database
50	~1.1 quadrillion	1 quadrillion	1 PB	Petabyte	Google/Facebook scale

Quick tip: For interviews, use 1000 instead of 1024. Close enough and easier math!

Key conversions:

1 byte = 8 bits
1 KB = 1,000 bytes ≈ 10^3 bytes (thousand)
1 MB = 1,000 KB ≈ 10^6 bytes (million)
1 GB = 1,000 MB ≈ 10^9 bytes (billion)
1 TB = 1,000 GB ≈ 10^12 bytes (trillion)
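To see what the 1000-for-1024 shortcut costs, this short Python sketch compares each binary unit with its decimal approximation. `UNITS` and `shortcut_error` are illustrative names, not from any library.

```python
# Compare binary units (2^(10n) bytes) with the decimal shortcut (10^(3n) bytes).
UNITS = ["KB", "MB", "GB", "TB", "PB"]

def shortcut_error(n: int) -> float:
    """Percent by which 2^(10n) exceeds 10^(3n)."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100

for n, unit in enumerate(UNITS, start=1):
    print(f"1 {unit}: binary is {shortcut_error(n):.1f}% larger than decimal")
```

The gap grows from about 2.4% at KB to about 12.6% at PB, which is still well inside back-of-envelope tolerance.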

Latency Numbers Every Programmer Should Know

Memorize these (updated for 2024):

Operation	Latency	Comparison	Notes
L1 cache reference	0.5 ns	-	On-CPU cache
L2 cache reference	7 ns	14x L1	-
Main memory reference	100 ns	~200x L1	RAM access
Compress 1 KB with Snappy	10 μs	-	10,000 ns
Send 1 KB over 1 Gbps network	10 μs	-	10,000 ns
Read 1 MB sequentially from memory	250 μs	-	250,000 ns
Round trip within datacenter	0.5 ms	-	500,000 ns
Read 1 MB sequentially from SSD	1 ms	4x memory	1,000,000 ns
Disk seek	10 ms	20x datacenter RTT	Mechanical disk
Read 1 MB sequentially from disk	30 ms	30x SSD	Mechanical disk
Send packet CA → Netherlands → CA	150 ms	300x datacenter RTT	Transcontinental

Key takeaways:

Memory is fast: 100 ns
SSD is okay: 1 ms for 1 MB
Disk is slow: 10-30 ms
Network within datacenter: ~0.5 ms
Cross-continent: ~150 ms

Mnemonic: “Memory, SSD, Disk: Fast, Slower, Slowest”

Memory reference: 100 ns
Read 1 MB from SSD: 1 ms = 10,000x a memory reference
Disk seek: 10 ms = 100,000x a memory reference
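The ratios in the mnemonic can be reproduced from the table's own numbers. This is a minimal sketch; the dictionary keys are labels chosen for illustration.

```python
# Latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "memory_reference": 100,
    "ssd_read_1mb": 1_000_000,     # 1 ms
    "disk_seek": 10_000_000,       # 10 ms
}

def times_slower(op: str, baseline: str = "memory_reference") -> float:
    """How many times longer `op` takes than `baseline`."""
    return LATENCY_NS[op] / LATENCY_NS[baseline]

for op in ("ssd_read_1mb", "disk_seek"):
    print(f"{op}: {times_slower(op):,.0f}x a memory reference")
```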

Why this matters:

Shows why serving from an in-memory cache beats hitting the database or disk
Network calls are expensive (batch them)
Reading 1 MB sequentially from memory is ~4x faster than from SSD
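Why batching matters, as a minimal sketch: assume each network call costs one 0.5 ms datacenter round trip and ignore payload and processing time (a simplifying assumption). The helper names are hypothetical.

```python
import math

DATACENTER_RTT_MS = 0.5  # round trip within a datacenter, from the latency table

def sequential_calls_ms(n_calls: int) -> float:
    """Latency floor when each call waits for the previous round trip to finish."""
    return n_calls * DATACENTER_RTT_MS

def batched_calls_ms(n_calls: int, batch_size: int) -> float:
    """Latency floor when calls are grouped, one round trip per batch."""
    return math.ceil(n_calls / batch_size) * DATACENTER_RTT_MS

print(sequential_calls_ms(100))      # 50.0
print(batched_calls_ms(100, 100))    # 0.5
```

One hundred sequential calls cost 50 ms of pure waiting; one batched call costs a single round trip.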

Availability Numbers

The “Nines” of Availability:

Availability %	Downtime/Day	Downtime/Year	Name
90% (one nine)	2.4 hours	36.5 days	Not acceptable
99% (two nines)	14.4 minutes	3.65 days	Basic
99.9% (three nines)	1.4 minutes	8.76 hours	Standard
99.99% (four nines)	8.6 seconds	52.6 minutes	High availability
99.999% (five nines)	864 ms	5.26 minutes	Mission critical

Quick formula:

Downtime/year = (1 - availability) × 365 days × 24 hours

Example:

99.9% availability:
= (1 - 0.999) × 365 × 24
= 0.001 × 8,760 hours
= 8.76 hours/year
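The downtime formula above as a small function, a sketch that reproduces the table's yearly column (the helper name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_hours_per_year(availability: float) -> float:
    """Allowed downtime: (1 - availability) x 365 days x 24 hours."""
    return (1 - availability) * HOURS_PER_YEAR

for a in (0.99, 0.999, 0.9999, 0.99999):
    hours = downtime_hours_per_year(a)
    print(f"{a * 100:g}%: {hours:.2f} hours/year = {hours * 60:.1f} minutes/year")
```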

In interviews:

Most web services: 99.9% - 99.99%
Financial systems: 99.99% - 99.999%
Internal tools: 99% is often OK

Cost vs Availability: Each additional “nine” is exponentially more expensive!

Estimation Process (Step-by-Step)

Step 1: Understand the Problem

Ask clarifying questions:

How many users?
What’s the usage pattern? (requests per user per day)
Read-heavy or write-heavy?
Any specific requirements?

Don’t assume silently! Get these numbers from the interviewer, or propose values and confirm them.

Step 2: Write Down Assumptions

Always write down assumptions clearly:

“Assume 100 million DAU”
“Assume each user posts 2 tweets per day”
“Assume 10:1 read/write ratio”
“Assume each tweet is 280 characters = ~300 bytes”

Tip: Round numbers for easier math!

Step 3: Do the Math

Show your work! Interviewers want to see your thought process.

Common calculations:

Traffic estimation (QPS)
Storage estimation
Bandwidth estimation
Memory/cache estimation

Let’s walk through each…

Traffic Estimation (QPS)

Formula:

QPS = (Total operations per day) / (Seconds per day)

Seconds per day:

Exact: 86,400 seconds
For interviews: Use 100,000 (easier math, close enough!)
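A sketch of the QPS formula with both divisors, to show what the 100K shortcut trades away: it reads about 14% low compared with the exact 86,400 (the function and constant names are illustrative).

```python
SECONDS_PER_DAY_EXACT = 86_400
SECONDS_PER_DAY_ROUNDED = 100_000  # interview shortcut

def average_qps(ops_per_day: float, seconds_per_day: int = SECONDS_PER_DAY_ROUNDED) -> float:
    """QPS = total operations per day / seconds per day."""
    return ops_per_day / seconds_per_day

ops = 600_000_000  # e.g. 300M DAU x 2 tweets/day
rounded = average_qps(ops)
exact = average_qps(ops, SECONDS_PER_DAY_EXACT)
print(f"rounded: {rounded:,.0f} QPS, exact: {exact:,.0f} QPS ({1 - rounded / exact:.0%} low)")
```

That error is small next to the 2-3x peak multiplier, which is why the shortcut is usually acceptable.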

Example 1: Twitter Write Traffic

Given:
- 300 million DAU (Daily Active Users)
- Average user posts 2 tweets per day

Step 1: Total tweets per day
= 300M users × 2 tweets
= 600M tweets/day

Step 2: Write QPS (average)
= 600M / 100K seconds
= 6,000 tweets/second

Step 3: Peak QPS (2x average)
= 6,000 × 2
= 12,000 tweets/second at peak

Example 2: Twitter Read Traffic

Given:
- 300M DAU
- Each user views timeline 10 times per day
- Each timeline view loads 20 tweets

Step 1: Timeline views per day
= 300M users × 10 views
= 3 billion views/day

Step 2: Read QPS (average)
= 3B / 100K
= 30,000 QPS

Step 3: Peak QPS
= 30,000 × 2
= 60,000 QPS at peak

Key insight: Twitter is read-heavy (60K peak timeline reads vs 12K peak tweet writes = 5:1 by request count)
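The read:write ratio depends on what you count. This sketch recomputes both Twitter examples: by request count it is 5:1, but since each timeline view loads 20 tweets, by tweets fetched it is 100:1. Variable names are illustrative.

```python
DAU = 300_000_000
TWEETS_PER_USER = 2      # writes per user per day
VIEWS_PER_USER = 10      # timeline views per user per day
TWEETS_PER_VIEW = 20     # tweets loaded by each timeline view
SECONDS_PER_DAY = 100_000

write_qps = DAU * TWEETS_PER_USER / SECONDS_PER_DAY
read_qps = DAU * VIEWS_PER_USER / SECONDS_PER_DAY
tweets_read = DAU * VIEWS_PER_USER * TWEETS_PER_VIEW
tweets_written = DAU * TWEETS_PER_USER

print(f"by requests: {read_qps / write_qps:.0f}:1")
print(f"by tweets fetched: {tweets_read / tweets_written:.0f}:1")
```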

Storage Estimation

Formula:

Storage = (Items per day) × (Size per item) × (Days retained)

Example 1: Twitter Storage

Given:
- 600M tweets/day (from above)
- Each tweet = 280 characters = ~300 bytes
- Also need metadata (userId, timestamp, etc.) = ~200 bytes
- Total per tweet = 500 bytes

Step 1: Daily storage (text only)
= 600M × 500 bytes
= 300 GB/day

Step 2: With media (assume 10% of tweets have an image, 5 MB avg; the 0.3 TB of text is negligible)
= 600M × 0.1 × 5 MB
= 300M MB
= 300 TB/day

Step 3: Yearly storage
= 300 TB × 365
= 109,500 TB
≈ 110 PB/year

Step 4: With replication (3x for redundancy)
= 110 PB × 3
= 330 PB/year
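The Twitter storage steps as a runnable sketch, using decimal units (1 MB = 10^6 bytes) and the example's assumptions (500 bytes per tweet, 10% with a 5 MB image, 3x replication). Variable names are illustrative.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15  # decimal units

tweets_per_day = 600_000_000
text_bytes = tweets_per_day * 500               # 500 bytes of text + metadata each
media_bytes = tweets_per_day * 0.10 * 5 * MB    # 10% of tweets carry a 5 MB image
daily_bytes = text_bytes + media_bytes
yearly_replicated = daily_bytes * 365 * 3       # one year, 3x replication

print(f"text:  {text_bytes / GB:.0f} GB/day")
print(f"media: {media_bytes / TB:.0f} TB/day")
print(f"year with 3x replication: {yearly_replicated / PB:.0f} PB")
```

Without the intermediate rounding (109,500 TB → 110 PB), the total comes out near 329 PB, consistent with the ~330 PB above.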

Example 2: YouTube Storage

Given:
- 500M video uploads/day
- Average video size = 100 MB

Step 1: Daily storage
= 500M × 100 MB
= 50,000,000,000 MB
= 50,000 TB
= 50 PB/day (!!)

Step 2: Yearly storage
= 50 PB × 365
= 18,250 PB/year
≈ 18 EB (exabytes) per year

Step 3: With different resolutions (360p, 720p, 1080p, 4K; upper bound, treating each rendition as a full 100 MB copy)
= 50 PB × 4 formats
= 200 PB/day
= 73,000 PB/year
≈ 73 EB/year

Key insight: Video is HUGE! This is why CDN and compression are critical.
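Figures like 50,000,000,000 MB are easy to mis-convert by hand. This sketch redoes the YouTube estimate with a small formatter that picks the largest decimal unit; `human` and `UNITS` are hypothetical helpers, and the 4x multiplier treats every rendition as the full 100 MB (an upper bound).

```python
UNITS = [("EB", 10**18), ("PB", 10**15), ("TB", 10**12), ("GB", 10**9), ("MB", 10**6)]

def human(n_bytes: float) -> str:
    """Format bytes with the largest decimal unit that keeps the value >= 1."""
    for name, size in UNITS:
        if n_bytes >= size:
            return f"{n_bytes / size:,.2f} {name}"
    return f"{n_bytes:,.0f} B"

uploads_per_day = 500_000_000
video_bytes = 100 * 10**6  # 100 MB average upload
renditions = 4             # 360p, 720p, 1080p, 4K; upper bound: each as large as the original

daily = uploads_per_day * video_bytes
print(human(daily))                        # 50.00 PB
print(human(daily * 365))                  # 18.25 EB
print(human(daily * 365 * renditions))     # 73.00 EB
```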

Bandwidth Estimation

Formula:

Bandwidth = Data Size / Time

Convert to MB/s or GB/s for easier understanding

Example: Twitter Bandwidth

Upload (Write):
- 300 TB/day (text + media from storage calc)
- Bandwidth = 300 TB / 86,400 seconds
            = 300 × 10^12 bytes / 86,400
            ≈ 3.5 GB/second upload

Download (Read, assuming the 5:1 read:write ratio also holds for bytes):
- Bandwidth = 3.5 GB/s × 5
            = 17.5 GB/second download

Network capacity check:

1 Gbps = 125 MB/s
10 Gbps = 1.25 GB/s
100 Gbps = 12.5 GB/s

For 17.5 GB/s (= 140 Gbps), need multiple 100 Gbps connections or a CDN!
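A sketch of the bandwidth check: bytes/day to bytes/second, then bytes to bits to count 100 Gbps links. It assumes, as the example does, that download bytes are 5x upload bytes, and it leaves no headroom for peaks; the helper names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400

def bytes_per_second(bytes_per_day: float) -> float:
    return bytes_per_day / SECONDS_PER_DAY

def links_needed(bytes_per_sec: float, link_gbps: float = 100) -> int:
    """Links of `link_gbps` needed to carry the load (8 bits per byte, no headroom)."""
    gbps = bytes_per_sec * 8 / 1e9
    return math.ceil(gbps / link_gbps)

upload = bytes_per_second(300e12)   # 300 TB/day written (text + media)
download = upload * 5               # assumption: reads move 5x the bytes of writes
print(f"upload {upload / 1e9:.1f} GB/s, download {download / 1e9:.1f} GB/s")
print(f"100 Gbps links needed for download: {links_needed(download)}")
```

Without rounding 3.5 GB/s first, download comes out at 17.4 GB/s rather than 17.5 GB/s; either way, two 100 Gbps links are the bare minimum before any peak headroom.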

Memory/Cache Estimation

Follow the 80/20 rule: if ~20% of the data receives ~80% of the requests, caching that 20% yields roughly an 80% hit rate.

Example: Twitter Cache

Given:
- 60K peak read QPS (from earlier), mostly for recent tweets
- 600M new tweets/day at 500 bytes each (from the storage estimate)
- Keep the last 24 hours of tweets in cache

Step 1: One day of tweets
= 600M × 500 bytes
= 300 GB

Step 2: Hot set (80/20 rule: the top 20% of tweets)
= 300 GB × 0.2
= 60 GB

Conclusion: ~60 GB holds the hot set; even caching the full day (~300 GB) takes only a few Redis servers, so budget ~200-300 GB.

Cache size rule of thumb: 10-20% of daily active data
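The 10-20% rule of thumb applied to one day of tweets (600M tweets × 500 bytes, from the storage estimate). This is a sketch: it counts only tweet payloads and ignores the cache's own per-key overhead, and `cache_size_bytes` is a hypothetical helper.

```python
def cache_size_bytes(items_per_day: int, bytes_per_item: int, hot_fraction: float) -> float:
    """Bytes needed to cache the hottest `hot_fraction` of one day's items."""
    return items_per_day * bytes_per_item * hot_fraction

TWEETS_PER_DAY = 600_000_000  # from the storage estimate
TWEET_BYTES = 500             # text + metadata

for fraction in (0.1, 0.2):  # the 10-20% rule of thumb
    size_gb = cache_size_bytes(TWEETS_PER_DAY, TWEET_BYTES, fraction) / 1e9
    print(f"hottest {fraction:.0%}: {size_gb:.0f} GB")
```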

Putting It All Together: Complete Example

Problem: Estimate requirements for Instagram

Given:

500 million DAU
Each user uploads 2 photos/day
Each photo = 2 MB
Each user views 50 photos/day
Need to store for 5 years

Step 1: Traffic

Uploads (Write):
= 500M users × 2 photos
= 1 billion photos/day
= 1B / 100K seconds
= 10K writes/second

Peak write QPS = 10K × 2 = 20K QPS

Views (Read):
= 500M users × 50 views
= 25 billion views/day
= 25B / 100K
= 250K reads/second

Peak read QPS = 250K × 2 = 500K QPS

Read:Write ratio = 250K / 10K = 25:1 (very read-heavy!)

Step 2: Storage

Daily storage:
= 1B photos × 2 MB
= 2,000,000,000 MB
= 2 million GB
= 2,000 TB
= 2 PB/day

Yearly storage:
= 2 PB × 365
= 730 PB/year

5-year storage:
= 730 PB × 5
= 3,650 PB
≈ 3.7 EB (exabytes)

With replication (3x):
= 3.7 EB × 3
= 11 EB total

Step 3: Bandwidth

Upload:
= 2 PB/day
= 2 × 10^15 bytes / 86,400 sec
≈ 23 GB/second upload

Download (25:1 ratio):
= 23 GB/s × 25
= 575 GB/second download

Conclusion: Need CDN! Can't serve 575 GB/s from origin servers.

Step 4: Cache

Cache hot photos (recent + popular):
- 250K read requests/sec to absorb
- Assume 20% of photos get 80% of traffic
- Cache only the top 5% of each day's photos (the hottest slice of that 20%)

Photos to cache:
= 1B photos/day × 0.05
= 50M photos

Cache size:
= 50M × 2 MB
= 100,000,000 MB
= 100 TB cache

Practical: Distribute across 100 servers with 1 TB RAM each

Summary

Metric	Value
Write QPS	10K (peak 20K)
Read QPS	250K (peak 500K)
Storage/day	2 PB
Storage/5-year	11 EB (with replication)
Upload bandwidth	23 GB/s
Download bandwidth	575 GB/s
Cache size	100 TB (100 servers)
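The whole Instagram estimate in one place, as a sketch for checking the summary table. It follows the text's conventions (100K seconds/day for QPS, 86,400 for bandwidth, decimal units, cache = top 5% of a day's photos); all names are illustrative.

```python
DAU = 500_000_000
PHOTOS_PER_USER, VIEWS_PER_USER = 2, 50
PHOTO_BYTES = 2 * 10**6                     # 2 MB
YEARS, REPLICAS, PEAK = 5, 3, 2
QPS_SECONDS, BW_SECONDS = 100_000, 86_400   # the text uses both
CACHED_FRACTION = 0.05                      # top 5% of a day's photos

photos_per_day = DAU * PHOTOS_PER_USER      # 1B
views_per_day = DAU * VIEWS_PER_USER        # 25B
upload_bytes_per_day = photos_per_day * PHOTO_BYTES

summary = {
    "write QPS": photos_per_day / QPS_SECONDS,
    "read QPS": views_per_day / QPS_SECONDS,
    "storage/day (PB)": upload_bytes_per_day / 10**15,
    "5-year storage, replicated (EB)": upload_bytes_per_day * 365 * YEARS * REPLICAS / 10**18,
    "upload (GB/s)": upload_bytes_per_day / BW_SECONDS / 10**9,
    "download (GB/s)": views_per_day * PHOTO_BYTES / BW_SECONDS / 10**9,
    "cache (TB)": photos_per_day * CACHED_FRACTION * PHOTO_BYTES / 10**12,
}
for name, value in summary.items():
    print(f"{name:>32}: {value:,.1f}")
print(f"peak write QPS: {summary['write QPS'] * PEAK:,.0f}, peak read QPS: {summary['read QPS'] * PEAK:,.0f}")
```

Computing download bandwidth directly (25B views × 2 MB / 86,400 s) gives about 579 GB/s; the table's 575 GB/s comes from multiplying the rounded 23 GB/s by 25.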

Design implications:

Need CDN (575 GB/s download!)
Cache heavily (25:1 read ratio)
Shard database (10K write QPS on one DB is tough)
Object storage (S3) for photos, not database

Common Estimation Patterns

Pattern 1: DAU → QPS

Template:
1. DAU × actions per user = total actions/day
2. Total actions / 100K seconds = QPS
3. QPS × 2 or 3 = peak QPS
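The three-step template as a function, a minimal sketch with the peak multiplier as a parameter so you can quote both the 2x and 3x cases (the name `dau_to_qps` is illustrative):

```python
def dau_to_qps(dau: int, actions_per_user: float, peak_factor: float = 2.0,
               seconds_per_day: int = 100_000) -> tuple[float, float]:
    """Return (average QPS, peak QPS) using the DAU -> QPS template."""
    actions_per_day = dau * actions_per_user          # step 1
    average = actions_per_day / seconds_per_day       # step 2
    return average, average * peak_factor             # step 3

# Instagram uploads from the complete example: 500M DAU x 2 photos/day.
for factor in (2, 3):
    avg, peak = dau_to_qps(500_000_000, 2, peak_factor=factor)
    print(f"avg {avg:,.0f} QPS, peak ({factor}x) {peak:,.0f} QPS")
```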

Pattern 2: Social Media

Users post content → Calculate posts/day → Storage
Users view content → Calculate views/day → QPS & Bandwidth
Read:Write ratio typically 10:1 to 100:1

Pattern 3: Video/Streaming

Uploads are large (hundreds of MB to GBs) → Massive storage
View traffic is larger still → CDN mandatory
Different resolutions → Multiply storage by 3-5x

Pattern 4: Messaging

Each message small (few KB) → Storage manageable
But there are LOTS of them → High QPS
Need message queue for reliability

Tips & Tricks

1. Use round numbers:

100M users (not 97.3M)
100K seconds/day (not 86,400)
1 year = 365 days (ignore leap years)

2. Use scientific notation:

1,000,000,000 → 10^9 or “1 billion”
Makes math easier

3. Show your work:

Bad: "We need 500 TB"
Good: "100M users × 2 photos/day × 2 MB × 365 days = 146 PB/year"

4. State assumptions clearly:

“Assuming 10:1 read/write ratio”
“Assuming 80% cache hit rate”
“Assuming 3x replication”

5. Sanity check your answer:

Does 10 PB/day make sense?
Is 1M QPS realistic?
Compare to known systems

6. It’s OK to be approximate:

“Between 100K and 200K QPS” is fine
Ballpark is enough for interviews

Red Flags to Avoid

❌ Skipping estimation - “Let’s just design it”
❌ Being too precise - “6,944.44 QPS” instead of “~7K QPS”
❌ No assumptions - How can you calculate without input?
❌ Forgetting peak traffic - Always mention 2-3x average
❌ Not showing work - “Storage is 1 PB” (how did you get that?)

Green Flags Interviewers Love

✅ Doing calculation unprompted - Shows initiative
✅ Writing down assumptions - Shows clarity of thought
✅ Showing work step-by-step - Easy to follow
✅ Discussing implications - “This means we need CDN”
✅ Sanity checking - “Does this number make sense?”

Practice Problems

Problem 1: Design URL Shortener

100M URLs shortened/day
Each URL: 500 bytes
Read:Write = 100:1
Keep URLs for 5 years

Calculate: Write QPS, Read QPS, Storage, Bandwidth

Problem 2: Design WhatsApp

2B users
Each user sends 50 messages/day
Each message: 100 bytes
Read = 2× Write (send + receive)

Calculate: Write QPS, Read QPS, Daily storage

Problem 3: Design Dropbox

500M users
Each user uploads 100 MB/day
Each user downloads 500 MB/day (sharing)

Calculate: Upload bandwidth, Download bandwidth, Storage/year

Key Takeaways

Always do back-of-envelope - Even rough estimates > no estimates
Memorize key numbers - Power of 2, latencies, time conversions
Use 100K seconds/day - Makes math easy in interviews
Show your work - Process matters more than exact answer
State assumptions - Get interviewer buy-in
Discuss implications - “This means we need caching”
Round numbers - 100M is better than 97.3M
Peak traffic is 2-3x - Don’t forget to mention it!

Related Resources

estimation-cheatsheet: Quick reference for numbers
ch01-scale-from-zero-to-millions: When to add components based on scale
ch03-framework-for-system-design: Step 2 includes estimation

This is a critical skill! Practice 2-3 calculations daily for different systems until it becomes automatic.

Before every interview: Review latency numbers and time conversions.

Last Updated: 2026-04-08
Status: Essential - Practice daily

Study Notes by Niladri & AI
