Chapter 2: Back-of-Envelope Estimation

volume1 estimation fundamentals interview-critical

Status: 🟩 Essential - Must master
Difficulty: Math required (but simple!)
Time to complete: 30 min read + practice


Overview

Back-of-envelope estimation is a critical interview skill. It shows you can think quantitatively about systems and make informed design decisions based on scale.

Why this matters:

  • Validates design decisions (“Do we need caching? Let’s calculate…”)
  • Shows you understand scale
  • Helps identify bottlenecks before building
  • Impresses interviewers (many candidates skip this!)

Always do calculations in interviews! Even rough estimates are better than none.

Power of Two

Memorize this table:

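| Power | Exact Value | Approximate | Bytes | Name | Example |
|---|---|---|---|---|---|
| 2^10 | 1,024 | 1 thousand | 1 KB | Kilobyte | Small text file |
| 2^20 | 1,048,576 | 1 million | 1 MB | Megabyte | Small photo |
| 2^30 | 1,073,741,824 | 1 billion | 1 GB | Gigabyte | HD movie |
| 2^40 | ~1.1 trillion | 1 trillion | 1 TB | Terabyte | Company database |
| 2^50 | ~1.1 quadrillion | 1 quadrillion | 1 PB | Petabyte | Google/Facebook scale |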

Quick tip: For interviews, use 1000 instead of 1024. Close enough and easier math!

Key conversions:

  • 1 byte = 8 bits
  • 1 KB = 1,000 bytes = 10^3 bytes (thousand)
  • 1 MB = 1,000 KB = 10^6 bytes (million)
  • 1 GB = 1,000 MB = 10^9 bytes (billion)
  • 1 TB = 1,000 GB = 10^12 bytes (trillion)
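To see what the 1000-for-1024 shortcut actually costs, this small sketch compares each exact power of two with its decimal stand-in. The function name `approximation_error` is illustrative, not from any library.

```python
# Compare exact powers of two with the decimal (power-of-1000) shortcut.
UNIT_NAMES = ["KB", "MB", "GB", "TB", "PB"]


def approximation_error(n: int) -> float:
    """Relative error of using 1000**n in place of 2**(10*n).

    n is the unit index: 1 = KB, 2 = MB, ..., 5 = PB.
    """
    exact = 2 ** (10 * n)
    approx = 1000 ** n
    return (exact - approx) / exact


for n, name in enumerate(UNIT_NAMES, start=1):
    print(f"1 {name}: 2^{10 * n} = {2 ** (10 * n):,} vs 10^{3 * n} "
          f"-> error {approximation_error(n):.1%}")
```

The error grows from about 2% at KB to about 11% at PB, which is still well within the tolerance of an interview estimate.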

Latency Numbers Every Programmer Should Know

Memorize these orders of magnitude (exact values vary with hardware; the ratios are what matter):

| Operation | Latency | Comparison | Notes |
|---|---|---|---|
| L1 cache reference | 0.5 ns | - | On-CPU cache |
| L2 cache reference | 7 ns | 14x L1 | |
| Main memory reference | 100 ns | ~200x L1 | RAM access |
| Compress 1 KB with Snappy | 10 μs | 10,000 ns | |
| Send 1 KB over 1 Gbps network | 10 μs | 10,000 ns | |
| Read 1 MB sequentially from memory | 250 μs | 250,000 ns | |
| Round trip within datacenter | 0.5 ms | 500,000 ns | |
| Read 1 MB sequentially from SSD | 1 ms | 4x memory | |
| Disk seek | 10 ms | 20x datacenter round trip | Mechanical disk |
| Read 1 MB sequentially from disk | 30 ms | 30x SSD | |
| Send packet CA → Netherlands → CA | 150 ms | 300x datacenter round trip | Intercontinental |

Key takeaways:

  • Memory is fast: 100 ns
  • SSD is okay: 1 ms for 1 MB
  • Disk is slow: 10-30 ms
  • Network within datacenter: ~0.5 ms
  • Cross-continent: ~150 ms

Mnemonic: “Memory, SSD, Disk: Fast, Slower, Slowest”

  • Memory: 100 ns per reference
  • SSD: 1 ms to read 1 MB = 10,000x a memory reference
  • Disk: 10 ms per seek = 100,000x a memory reference

Why this matters:

  • Tells you the speed ordering: cache > database > disk
  • Network calls are expensive (batch them)
  • Reading 1 MB from memory is 4x faster than from SSD
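To turn the table into design decisions, it helps to scale the per-MB figures up. This sketch hard-codes the approximate latencies from the table (the dictionary keys and helper names are illustrative) and estimates reading 1 GB from each medium, plus the cost of unbatched network calls.

```python
# Approximate latencies from the table, in nanoseconds (hardware-dependent).
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "read_1mb_memory": 250 * US,
    "read_1mb_ssd": 1 * MS,
    "read_1mb_disk": 30 * MS,
    "datacenter_round_trip": 0.5 * MS,
}


def seconds_to_read(megabytes: float, medium: str) -> float:
    """Estimated time to read `megabytes` sequentially from memory, ssd, or disk."""
    return megabytes * LATENCY_NS[f"read_1mb_{medium}"] / 1e9


def sequential_calls_ms(n_calls: int) -> float:
    """Time for n back-to-back datacenter round trips (no batching)."""
    return n_calls * LATENCY_NS["datacenter_round_trip"] / 1e6


for medium in ("memory", "ssd", "disk"):
    print(f"1 GB from {medium}: {seconds_to_read(1_000, medium):.2f} s")
print(f"100 unbatched calls: {sequential_calls_ms(100):.0f} ms")
```

With these figures, 1 GB takes roughly 0.25 s from memory, 1 s from SSD, and 30 s from disk, and 100 sequential round trips add about 50 ms, which is why batching network calls matters.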

Availability Numbers

The “Nines” of Availability:

| Availability % | Downtime/Day | Downtime/Year | Name |
|---|---|---|---|
| 90% (one nine) | 2.4 hours | 36.5 days | Not acceptable |
| 99% (two nines) | 14.4 minutes | 3.65 days | Basic |
| 99.9% (three nines) | 1.4 minutes | 8.76 hours | Standard |
| 99.99% (four nines) | 8.6 seconds | 52.6 minutes | High availability |
| 99.999% (five nines) | 864 ms | 5.26 minutes | Mission critical |

Quick formula:

Downtime/year = (1 - availability) × 365 days × 24 hours

Example:

99.9% availability:
= (1 - 0.999) × 365 × 24
= 0.001 × 8,760 hours
= 8.76 hours/year
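The formula is easy to script. This minimal sketch (helper names are hypothetical) reproduces the downtime columns of the availability table, ignoring leap years just as the formula does.

```python
HOURS_PER_YEAR = 365 * 24       # 8,760 hours; leap years ignored
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def downtime_hours_per_year(availability: float) -> float:
    """Allowed downtime per year, in hours, for an availability such as 0.999."""
    return (1 - availability) * HOURS_PER_YEAR


def downtime_seconds_per_day(availability: float) -> float:
    """Allowed downtime per day, in seconds."""
    return (1 - availability) * SECONDS_PER_DAY


for a in (0.9, 0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.3%}: {downtime_hours_per_year(a):8.2f} h/year, "
          f"{downtime_seconds_per_day(a):8.1f} s/day")
```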

In interviews:

  • Most web services: 99.9% - 99.99%
  • Financial systems: 99.99% - 99.999%
  • Internal tools: 99% is often OK

Cost vs Availability: Each additional “nine” is exponentially more expensive!

Estimation Process (Step-by-Step)

Step 1: Understand the Problem

Ask clarifying questions:

  • How many users?
  • What’s the usage pattern? (requests per user per day)
  • Read-heavy or write-heavy?
  • Any specific requirements?

Don’t assume silently! Get these numbers from the interviewer.

Step 2: Write Down Assumptions

Always write down assumptions clearly:

  • “Assume 100 million DAU”
  • “Assume each user posts 2 tweets per day”
  • “Assume 10:1 read/write ratio”
  • “Assume each tweet is 280 characters = ~300 bytes”

Tip: Round numbers for easier math!

Step 3: Do the Math

Show your work! Interviewers want to see your thought process.

Common calculations:

  1. Traffic estimation (QPS)
  2. Storage estimation
  3. Bandwidth estimation
  4. Memory/cache estimation

Let’s walk through each…

Traffic Estimation (QPS)

Formula:

QPS = (Total operations per day) / (Seconds per day)

Seconds per day:

  • Exact: 86,400 seconds
  • For interviews: Use 100,000 (easier math, close enough!)

Example 1: Twitter Write Traffic

Given:
- 300 million DAU (Daily Active Users)
- Average user posts 2 tweets per day

Step 1: Total tweets per day
= 300M users × 2 tweets
= 600M tweets/day

Step 2: Write QPS (average)
= 600M / 100K seconds
= 6,000 tweets/second

Step 3: Peak QPS (2x average)
= 6,000 × 2
= 12,000 tweets/second at peak

Example 2: Twitter Read Traffic

Given:
- 300M DAU
- Each user views timeline 10 times per day
- Each timeline view loads 20 tweets

Step 1: Timeline views per day
= 300M users × 10 views
= 3 billion views/day

Step 2: Read QPS (average)
= 3B / 100K
= 30,000 QPS

Step 3: Peak QPS
= 30,000 × 2
= 60,000 QPS at peak

Key insight: Twitter is read-heavy (60K reads vs 12K writes = 5:1 ratio)
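Both traffic examples follow the same three steps, so they can share one helper. A minimal sketch, assuming the 100K-seconds/day rounding and a default 2x peak factor (the `qps` function is illustrative):

```python
SECONDS_PER_DAY = 100_000  # rounded from 86,400 to keep mental math easy


def qps(dau: int, actions_per_user_per_day: float,
        peak_factor: float = 2.0,
        seconds_per_day: int = SECONDS_PER_DAY) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a per-user daily action count."""
    average = dau * actions_per_user_per_day / seconds_per_day
    return average, average * peak_factor


write_avg, write_peak = qps(300_000_000, 2)   # Example 1: tweets posted
read_avg, read_peak = qps(300_000_000, 10)    # Example 2: timeline views
print(f"writes: {write_avg:,.0f} avg / {write_peak:,.0f} peak")
print(f"reads:  {read_avg:,.0f} avg / {read_peak:,.0f} peak")
print(f"read:write = {read_peak / write_peak:.0f}:1")
```

Passing `seconds_per_day=86_400` gives about 6,944 average write QPS instead of 6,000, roughly 16% higher; the rounded figure is fine for an interview but slightly understates the true average.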

Storage Estimation

Formula:

Storage = (Number of items) × (Size per item) × (Time period)

Example 1: Twitter Storage

Given:
- 600M tweets/day (from above)
- Each tweet = 280 characters = ~300 bytes
- Also need metadata (userId, timestamp, etc.) = ~200 bytes
- Total per tweet = 500 bytes

Step 1: Daily storage (text only)
= 600M × 500 bytes
= 300 GB/day

Step 2: With media (assume 10% have images, 5 MB avg)
= 600M × 0.1 × 5 MB
= 300M MB
= 300 TB/day

Step 3: Yearly storage
= 300 TB × 365
= 109,500 TB
≈ 110 PB/year

Step 4: With replication (3x for redundancy)
= 110 PB × 3
= 330 PB/year
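Replaying the Twitter storage steps in code makes the unit conversions explicit. A sketch using decimal units and the assumptions listed above (600M tweets/day, 500 bytes each, 10% with a 5 MB image, 3x replication); the variable names are illustrative.

```python
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

TWEETS_PER_DAY = 600 * 10**6
BYTES_PER_TWEET = 500          # ~300 bytes text + ~200 bytes metadata
MEDIA_FRACTION = 0.10          # assumed share of tweets with an image
BYTES_PER_IMAGE = 5 * MB
REPLICATION = 3

text_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
media_per_day = TWEETS_PER_DAY * MEDIA_FRACTION * BYTES_PER_IMAGE
yearly = (text_per_day + media_per_day) * 365
replicated = yearly * REPLICATION

print(f"text/day:   {text_per_day / GB:,.0f} GB")
print(f"media/day:  {media_per_day / TB:,.0f} TB")
print(f"per year:   {yearly / PB:,.0f} PB")
print(f"replicated: {replicated / PB:,.0f} PB")
```

Text and metadata contribute only about 0.1% of the daily total, so the media assumption drives the whole estimate.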

Example 2: YouTube Storage

Given:
- 500M video uploads/day
- Average video size = 100 MB

Step 1: Daily storage
= 500M × 100 MB
= 50,000,000,000 MB
= 50,000 TB
= 50 PB/day (!!)

Step 2: Yearly storage
= 50 PB × 365
= 18,250 PB/year
≈ 18 EB (exabytes) per year

Step 3: With different resolutions (360p, 720p, 1080p, 4K)
= 50 PB × 4 formats
= 200 PB/day
= 73,000 PB/year
≈ 73 EB/year

Key insight: Video is HUGE! This is why CDN and compression are critical.
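The YouTube numbers span many orders of magnitude, so a small formatter helps keep the units straight. A sketch (the `human` helper is hypothetical) that converts raw byte counts to decimal units, assuming, as the example does, that each resolution copy is as large as the original upload.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human(num_bytes: float) -> str:
    """Format a byte count using decimal units (each step is x1000)."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} {UNITS[-1]}"  # anything larger stays in EB


UPLOADS_PER_DAY = 500 * 10**6
AVG_VIDEO_BYTES = 100 * 10**6
RESOLUTIONS = 4  # 360p, 720p, 1080p, 4K (simplified: each copy as large as the original)

daily = UPLOADS_PER_DAY * AVG_VIDEO_BYTES
print("per day:            ", human(daily))
print("per year:           ", human(daily * 365))
print("per year, 4 formats:", human(daily * 365 * RESOLUTIONS))
```

This reports roughly 50 PB/day, 18 EB/year, and 73 EB/year with four resolutions, matching the hand calculation.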

Bandwidth Estimation

Formula:

Bandwidth = Data Size / Time

Convert to MB/s or GB/s for easier understanding

Example: Twitter Bandwidth

Upload (Write):
- 300 TB/day (text + media from storage calc)
- Bandwidth = 300 TB / 86,400 seconds
            = 300 × 10^12 bytes / 86,400
            ≈ 3.5 GB/second upload

Download (Read, 5:1 ratio):
- Bandwidth = 3.5 GB/s × 5
            = 17.5 GB/second download

Network capacity check:

  • 1 Gbps = 125 MB/s
  • 10 Gbps = 1.25 GB/s
  • 100 Gbps = 12.5 GB/s

For 17.5 GB/s, need multiple 100 Gbps connections or CDN!
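The capacity check mixes bytes (storage) and bits (network links), which is an easy place to slip. This sketch converts the Twitter figures and counts 100 Gbps links; the helper names are illustrative and the result includes no headroom for peaks.

```python
import math

SECONDS_PER_DAY = 86_400  # the bandwidth example uses the exact value


def bytes_per_second(bytes_per_day: float) -> float:
    """Average throughput needed to move bytes_per_day evenly over a day."""
    return bytes_per_day / SECONDS_PER_DAY


def links_needed(bytes_per_sec: float, link_gbps: float) -> int:
    """Links of link_gbps capacity needed; link speeds are in bits (8 bits = 1 byte)."""
    link_bytes_per_sec = link_gbps * 10**9 / 8
    return math.ceil(bytes_per_sec / link_bytes_per_sec)


upload = bytes_per_second(300 * 10**12)  # 300 TB/day of writes
download = upload * 5                    # 5:1 read:write
print(f"upload:   {upload / 1e9:.1f} GB/s")
print(f"download: {download / 1e9:.1f} GB/s = {download * 8 / 1e9:.0f} Gbps")
print(f"100 Gbps links needed (no headroom): {links_needed(download, 100)}")
```

Using the exact 86,400 seconds, download comes to about 17.4 GB/s (roughly 139 Gbps), so even before peak headroom it needs at least two 100 Gbps links.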

Memory/Cache Estimation

Follow the 80/20 rule: roughly 20% of the data receives 80% of the requests, so caching that hot 20% can give a hit rate near 80%.

Example: Twitter Cache

Given:
- 30K average read QPS (60K peak, from earlier)
- Want to cache hot tweets
- 80/20 rule: cache the tweets behind the hottest 20% of daily read requests
- Simplification: each read request fetches one tweet (500 bytes)

Step 1: Read requests per day (use average QPS, not peak, for daily totals)
= 30K QPS × 86,400 seconds
≈ 2.6 billion requests/day

Step 2: Hot 20% of requests
= 2.6B × 0.2
≈ 520 million cached tweets

Step 3: Cache size
= 520M × 500 bytes (per tweet)
= 260 GB

Sanity check: 520M is below the 600M tweets written per day, and repeat requests for the same tweet share one cache entry, so 260 GB is an upper bound.

Conclusion: Need ~200-300 GB Redis cache

Cache size rule of thumb: 10-20% of daily active data
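One common interview convention reads the 80/20 rule as "cache the items behind 20% of a day's read requests". A minimal sketch of that convention (the function name is hypothetical), using the 30K average read QPS from the traffic section and 500 bytes per tweet:

```python
SECONDS_PER_DAY = 86_400


def cache_bytes(avg_read_qps: float, hot_fraction: float, bytes_per_item: float) -> float:
    """Upper-bound cache size: hold the items behind hot_fraction of a day's reads.

    Repeated requests for the same item share one entry, so the real
    footprint is usually smaller than this figure.
    """
    requests_per_day = avg_read_qps * SECONDS_PER_DAY
    return requests_per_day * hot_fraction * bytes_per_item


size = cache_bytes(avg_read_qps=30_000, hot_fraction=0.2, bytes_per_item=500)
print(f"cache size ~ {size / 1e9:.0f} GB")
```

This yields about 259 GB. Using the peak QPS here would double the figure, which is why daily totals should start from the average.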

Putting It All Together: Complete Example

Problem: Estimate requirements for Instagram

Given:

  • 500 million DAU
  • Each user uploads 2 photos/day
  • Each photo = 2 MB
  • Each user views 50 photos/day
  • Need to store for 5 years

Step 1: Traffic

Uploads (Write):
= 500M users × 2 photos
= 1 billion photos/day
= 1B / 100K seconds
= 10K writes/second

Peak write QPS = 10K × 2 = 20K QPS

Views (Read):
= 500M users × 50 views
= 25 billion views/day
= 25B / 100K
= 250K reads/second

Peak read QPS = 250K × 2 = 500K QPS

Read:Write ratio = 250K / 10K = 25:1 (very read-heavy!)

Step 2: Storage

Daily storage:
= 1B photos × 2 MB
= 2,000,000,000 MB
= 2 million GB
= 2,000 TB
= 2 PB/day

Yearly storage:
= 2 PB × 365
= 730 PB/year

5-year storage:
= 730 PB × 5
= 3,650 PB
≈ 3.7 EB (exabytes)

With replication (3x):
= 3.7 EB × 3
= 11 EB total

Step 3: Bandwidth

Upload:
= 2 PB/day
= 2 × 10^15 bytes / 86,400 sec
≈ 23 GB/second upload

Download (25:1 ratio):
= 23 GB/s × 25
= 575 GB/second download

Conclusion: Need CDN! Can't serve 575 GB/s from origin servers.

Step 4: Cache

Cache hot photos (recent + popular):
- 250K read requests/sec
- Assume 20% of photos get 80% of traffic
- Cache only the top 5% of each day's photos (selective, to keep memory manageable)

Photos to cache:
= 1B photos/day × 0.05
= 50M photos

Cache size:
= 50M × 2 MB
= 100,000,000 MB
= 100 TB cache

Practical: Distribute across 100 servers with 1 TB RAM each

Summary

| Metric | Value |
|---|---|
| Write QPS | 10K (peak 20K) |
| Read QPS | 250K (peak 500K) |
| Storage/day | 2 PB |
| Storage/5-year | 11 EB (with replication) |
| Upload bandwidth | 23 GB/s |
| Download bandwidth | 575 GB/s |
| Cache size | 100 TB (100 servers) |
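The whole Instagram estimate fits in a few lines, which makes it easy to rerun with different assumptions. A sketch using the inputs from the problem statement (constant and function names are illustrative); QPS uses the 100K-seconds/day rounding while bandwidth uses 86,400, as in the worked example.

```python
# Inputs from the problem statement.
DAU = 500 * 10**6
UPLOADS_PER_USER = 2
VIEWS_PER_USER = 50
PHOTO_BYTES = 2 * 10**6
YEARS = 5
REPLICATION = 3
CACHED_FRACTION = 0.05     # top 5% of each day's photos

SECONDS_ROUNDED = 100_000  # used for QPS
SECONDS_EXACT = 86_400     # used for bandwidth


def instagram_estimate() -> dict[str, float]:
    uploads = DAU * UPLOADS_PER_USER          # photos/day
    views = DAU * VIEWS_PER_USER              # photo views/day
    storage_day = uploads * PHOTO_BYTES       # bytes/day
    upload_bw = storage_day / SECONDS_EXACT   # bytes/s
    return {
        "write_qps": uploads / SECONDS_ROUNDED,
        "read_qps": views / SECONDS_ROUNDED,
        "storage_per_day_bytes": storage_day,
        "storage_total_bytes": storage_day * 365 * YEARS * REPLICATION,
        "upload_bytes_per_s": upload_bw,
        # each view downloads one full photo, so bytes scale with views/uploads
        "download_bytes_per_s": upload_bw * views / uploads,
        "cache_bytes": uploads * CACHED_FRACTION * PHOTO_BYTES,
    }


for name, value in instagram_estimate().items():
    print(f"{name:>22}: {value:.3g}")
```

The results line up with the summary table: about 11 EB replicated, 23 GB/s up, and about 579 GB/s down before rounding.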

Design implications:

  • Need CDN (575 GB/s download!)
  • Cache heavily (25:1 read ratio)
  • Shard database (10K write QPS on one DB is tough)
  • Object storage (S3) for photos, not database

Common Estimation Patterns

Pattern 1: DAU → QPS

Template:
1. DAU × actions per user = total actions/day
2. Total actions / 100K seconds = QPS
3. QPS × 2 or 3 = peak QPS

Pattern 2: Social Media

Users post content → Calculate posts/day → Storage
Users view content → Calculate views/day → QPS & Bandwidth
Read:Write ratio typically 10:1 to 100:1

Pattern 3: Video/Streaming

Uploads are large (GB) → Massive storage
Views even larger → CDN mandatory
Different resolutions → Multiply storage by 3-5x

Pattern 4: Messaging

Each message small (few KB) → Storage manageable
But there are LOTS of them → High QPS
Need message queue for reliability

Tips & Tricks

1. Use round numbers:

  • 100M users (not 97.3M)
  • 100K seconds/day (not 86,400)
  • 1 year = 365 days (ignore leap year)

2. Use scientific notation:

  • 1,000,000,000 → 10^9 or “1 billion”
  • Makes math easier

3. Show your work:

Bad: "We need 500 TB"
Good: "100M users × 2 photos/day × 2 MB × 365 days = 146 PB/year"

4. State assumptions clearly:

  • “Assuming 10:1 read/write ratio”
  • “Assuming 80% cache hit rate”
  • “Assuming 3x replication”

5. Sanity check your answer:

  • Does 10 PB/day make sense?
  • Is 1M QPS realistic?
  • Compare to known systems

6. It’s OK to be approximate:

  • “Between 100K and 200K QPS” is fine
  • Ballpark is enough for interviews

Red Flags to Avoid

Skipping estimation - “Let’s just design it”
Being too precise - 86,431.2 seconds
No assumptions - How can you calculate without input?
Forgetting peak traffic - Always mention 2-3x average
Not showing work - “Storage is 1 PB” (how did you get that?)

Green Flags Interviewers Love

Doing calculation unprompted - Shows initiative
Writing down assumptions - Shows clarity of thought
Showing work step-by-step - Easy to follow
Discussing implications - “This means we need CDN”
Sanity checking - “Does this number make sense?”

Practice Problems

Problem 1: Design URL Shortener

  • 100M URLs shortened/day
  • Each URL: 500 bytes
  • Read:Write = 100:1
  • Keep URLs for 5 years

Calculate: Write QPS, Read QPS, Storage, Bandwidth

Problem 2: Design WhatsApp

  • 2B users
  • Each user sends 50 messages/day
  • Each message: 100 bytes
  • Read = 2× Write (send + receive)

Calculate: Write QPS, Read QPS, Daily storage

Problem 3: Design Dropbox

  • 500M users
  • Each user uploads 100 MB/day
  • Each user downloads 500 MB/day (sharing)

Calculate: Upload bandwidth, Download bandwidth, Storage/year

Key Takeaways

  1. Always do back-of-envelope - Even rough estimates > no estimates
  2. Memorize key numbers - Power of 2, latencies, time conversions
  3. Use 100K seconds/day - Makes math easy in interviews
  4. Show your work - Process matters more than exact answer
  5. State assumptions - Get interviewer buy-in
  6. Discuss implications - “This means we need caching”
  7. Round numbers - 100M is better than 97.3M
  8. Peak traffic is 2-3x - Don’t forget to mention it!

This is a critical skill! Practice 2-3 calculations daily for different systems until it becomes automatic.

Before every interview: Review latency numbers and time conversions.


Last Updated: 2026-04-08
Status: Essential - Practice daily