Chapter 1: Scale From Zero to Millions of Users

volume1 scaling architecture fundamentals

Status: 🟩 Essential - Must master
Difficulty: Beginner-friendly
Time to complete: 45 min read

Overview

This chapter walks through the evolution of a system’s architecture as it scales from a single user to millions. It’s a fundamental chapter that shows the thought process behind adding components at each stage.

Why this matters: Shows the incremental scaling approach. Interviewers love to ask “How would you scale this to 10x? 100x?” This chapter gives you the roadmap.

The Scaling Journey

Stage 0: Single Server (1-1,000 users)

Architecture:

[Users] → [Web Server + Database on same machine]

Components:

Everything on one server: web app, database, cache
Simple to set up and deploy
Good for MVP/prototype

Limitations:

Single point of failure
Can’t scale components independently
Limited by single machine resources

When to use: Starting out, MVP, proof of concept

Stage 1: Separate Database (1K-10K users)

Architecture:

[Users] → [Web Server]
              ↓
         [Database Server]

Why separate:

Scale web tier and data tier independently
Web servers are usually CPU/network bound
Databases are usually memory/disk bound
Better resource utilization

Database choice:

SQL (MySQL, PostgreSQL):
- Structured data
- ACID transactions
- Good for most use cases
NoSQL (MongoDB, Cassandra):
- Flexible schema
- Horizontal scaling
- High write throughput

Which to choose: Start with SQL unless you have specific NoSQL needs (super high scale, flexible schema, etc.)

Stage 2: Load Balancer + Multiple Servers (10K-100K users)

Architecture:

[Users] → [Load Balancer]
              ↓
         ┌────┴────┐
         ↓         ↓
    [Web Server 1] [Web Server 2]
         ↓         ↓
         └────┬────┘
              ↓
         [Database]

Load Balancer:

Distributes traffic across servers
Provides redundancy (failover)
Can detect unhealthy servers
Commonly used: NGINX, HAProxy, AWS ELB

Benefits:

No single point of failure for web tier
Can handle more traffic
Can do rolling deployments (zero downtime)

Key consideration: Servers must be stateless

Don’t store user session in server memory
Store sessions in shared cache/database
Any server can handle any request

Stage 3: Database Replication (50K-500K users)

Architecture:

         [Load Balancer]
              ↓
         [Web Servers]
              ↓
         [Primary DB] ← Writes
              ↓
    ┌─────────┼─────────┐
    ↓         ↓         ↓
[Replica 1] [Replica 2] [Replica 3] ← Reads

Primary-Replica Setup:

Primary (Master): Handles all writes
Replicas (Slaves): Handle reads
Replication is usually asynchronous (slight lag OK)

Benefits:

Better read performance (distribute reads)
Redundancy (if primary fails, promote replica)
Can have geographic replicas (lower latency)

Trade-offs:

Replication lag (eventual consistency)
More complex to manage
Need to handle failover logic

When primary fails:

Promote a replica to primary
Update DNS/connection strings
Some data loss possible (last few writes)

Stage 4: Cache Layer (100K-1M users)

Architecture:

[Web Servers] → [Cache] → [Database]
              (Redis/Memcached)

Why cache:

Database queries are expensive
Many queries are repeated (hot data)
Cache is in-memory (100x faster than disk)

What to cache:

User profiles (frequently accessed)
Popular posts/content
Session data
Computation results

Cache strategies (see key-patterns > 1. Caching Strategies):

Cache-Aside (most common):
- Check cache first
- If miss → query DB → store in cache
- If hit → return from cache
Write-Through:
- Write to cache
- Cache writes to DB synchronously
Write-Behind:
- Write to cache
- Cache writes to DB asynchronously

Cache invalidation (hard problem!):

TTL (Time-To-Live): Expire after X seconds
Write invalidation: Delete cache entry on update
Hybrid: TTL + invalidation on write

Common caches:

Redis: In-memory, persistent, rich data structures
Memcached: Simple, fast, volatile

When to add: Database is bottleneck, read-heavy workload (10:1 ratio or more)

Stage 5: CDN (Content Delivery Network) (500K+ users)

Architecture:

[Users] → [CDN] → [Load Balancer] → [Web Servers]
         (Static)     (Dynamic)

What is CDN:

Geographically distributed servers
Cache static content close to users
Reduces latency and server load

What to serve from CDN:

Images, videos
JavaScript, CSS files
Static HTML pages
Downloadable files

How it works:

User requests image.jpg
CDN checks if it has cached copy
If yes → serve from CDN (fast!)
If no → fetch from origin server → cache → serve

Push vs Pull:

Push: Upload content to CDN manually (good for rarely changing content)
Pull: CDN fetches on first request (good for frequently changing content)

Benefits:

Lower latency (closer to users)
Reduced server load
Better availability (CDN = highly available)

Considerations:

Cost (CDN bandwidth not free)
Cache invalidation (how to update content?)
Fallback if CDN fails

Popular CDNs: Cloudflare, AWS CloudFront, Akamai, Fastly

Stage 6: Stateless Web Tier (1M+ users)

Problem: Session data in web servers prevents scaling

Architecture:

[Load Balancer]
    ↓
[Web Servers] → [Session Store]
  (Stateless)    (Redis/DB)

Why stateless:

Any server can handle any request
Easy to add/remove servers (auto-scaling)
No sticky sessions needed

Where to store session data:

Redis: Fast, in-memory (recommended)
Database: Persistent but slower
JWT tokens: Client-side (no server state)

Benefits:

Easy horizontal scaling
Better load distribution
Simplified deployment

Stage 7: Data Centers (Multi-region) (5M+ users)

Architecture:

Region 1 (US)              Region 2 (EU)
[LB] → [Servers] → [DB]    [LB] → [Servers] → [DB]
         ↓                          ↓
    [Cache + CDN]              [Cache + CDN]
         ↕ ─────────────────────── ↕
           (Cross-region sync)

Why multi-region:

Lower latency for global users
High availability (disaster recovery)
Compliance (data residency laws)

Challenges:

Traffic routing: GeoDNS routes users to nearest region
Data synchronization: Replicate across regions (eventual consistency)
Test and deployment: Deploy to all regions

Considerations:

Cost (multiple regions expensive)
Complexity (data sync, deployment)
Trade-off: Availability vs Consistency

Stage 8: Message Queue (Decouple Components) (10M+ users)

Architecture:

[Web Servers] → [Message Queue] → [Workers]
                  (Kafka/RabbitMQ)

Why message queue:

Decouple components
Handle async tasks (email, notifications)
Better fault tolerance
Handle traffic spikes (queue buffers)

Use cases:

Send email after signup
Process video encoding
Generate reports
Send push notifications

Benefits:

Web servers respond faster (offload work)
Workers can scale independently
Retry failed tasks
Better reliability

Popular options: Apache Kafka, RabbitMQ, AWS SQS

Stage 9: Logging, Metrics, Automation (10M+ users)

Not architectural change, but critical at scale:

1. Logging:

Centralized logging (ELK stack, Splunk)
Track errors and debug issues
Aggregate logs from all servers

2. Metrics:

Monitor system health (CPU, memory, disk)
Application metrics (QPS, latency, error rate)
Business metrics (signups, revenue)
Tools: Prometheus, Grafana, Datadog

3. Automation:

Auto-scaling based on load
Automated deployment (CI/CD)
Automated testing
Infrastructure as Code (Terraform)

Why it matters:

Can’t manually check 1000 servers
Detect issues before users notice
Faster deployment and recovery

Stage 10: Database Scaling (50M+ users)

Problem: Single database can’t handle load

Solution 1: Vertical Scaling (Scale Up)

Add more CPU, RAM, disk to database
✅ Simple, no code changes
❌ Hardware limits, expensive, single point of failure
Limit: Can scale to ~100K QPS

Solution 2: Horizontal Scaling (Scale Out)

A. Sharding (Partitioning):

User IDs 1-1M    → [Shard 1]
User IDs 1M-2M   → [Shard 2]
User IDs 2M-3M   → [Shard 3]
User IDs 3M-4M   → [Shard 4]

Sharding key: How to distribute data (userId, geographyId, etc.)

Benefits:

Unlimited scaling (add more shards)
Each shard handles subset of data

Challenges:

Resharding (when shard outgrows capacity)
Celebrity problem (one shard hot)
Cross-shard queries (joins are hard)

Sharding strategies:

Hash-based: hash(userId) % num_shards
Range-based: User IDs 1-1M, 1M-2M, etc.
Geographic: US users → US shard, EU → EU shard
Consistent hashing: Better for adding/removing shards (see Ch 5)

B. Read Replicas (covered earlier):

Already implemented in Stage 3
Good for read-heavy workloads
Doesn’t help with write load

Summary: Architecture Evolution

Stage 0: Single server
    ↓
Stage 1: Separate database
    ↓
Stage 2: Load balancer + multiple web servers
    ↓
Stage 3: Database replication (primary-replica)
    ↓
Stage 4: Cache layer (Redis)
    ↓
Stage 5: CDN for static content
    ↓
Stage 6: Stateless web tier (session in Redis)
    ↓
Stage 7: Multiple data centers
    ↓
Stage 8: Message queue for async tasks
    ↓
Stage 9: Logging, metrics, automation
    ↓
Stage 10: Database sharding

Key principle: Scale incrementally. Add complexity only when needed!

Key Takeaways

Start simple: Single server is fine for MVP
Separate concerns: Web tier, data tier, cache tier
Horizontal scaling: Add more machines, not bigger machines
Stateless servers: Store state in cache/database, not in-memory
Replication: For redundancy and read performance
Caching: Dramatically improves performance
CDN: Serve static content from edge
Message queues: Decouple and handle async work
Sharding: Last resort for database scaling
Monitor everything: Can’t manage what you don’t measure

Common Interview Questions

Q: How do you scale from 1K to 1M users?
A: Follow the stages: Separate database → Load balancer + servers → Database replication → Cache → CDN → Stateless tier → Message queue → Sharding if needed.

Q: When would you add a cache?
A: When database is bottleneck, read-heavy workload (10:1 ratio or higher), hot data accessed frequently.

Q: SQL vs NoSQL?
A: SQL for structured data, ACID, relationships. NoSQL for flexible schema, horizontal scaling, super high write throughput. Start with SQL unless specific NoSQL needs.

Q: How do you handle stateful servers?
A: Move state to shared store (Redis for sessions, DB for user data). Make servers stateless so any server can handle any request.

Q: When do you shard the database?
A: Last resort! When vertical scaling exhausted, replication not enough, and single database can’t handle load (usually > 100K QPS).

Patterns Used

Chapter 2 ch02-back-of-envelope-estimation: Calculate QPS and storage at each stage
Chapter 3 ch03-framework-for-system-design: Apply framework to scaling decisions
Chapter 5 ch05-consistent-hashing: Better sharding strategy
Chapter 6 ch06-key-value-store: Deep dive into distributed storage

External Resources

estimation-cheatsheet: Numbers for calculating when to scale
key-patterns: Detailed pattern explanations

This is a foundational chapter! The scaling journey appears in many interview questions. Master this progression.

Practice: Pick any system (Twitter, Instagram, Uber) and describe how it would scale through these stages.

Last Updated: 2026-04-08
Status: Essential - Review before every interview

Study Notes by Niladri & AI

Explorer

ch01-scale-from-zero-to-millions

Chapter 1: Scale From Zero to Millions of Users

Overview

The Scaling Journey

Stage 0: Single Server (1-1,000 users)

Stage 1: Separate Database (1K-10K users)

Stage 2: Load Balancer + Multiple Servers (10K-100K users)

Stage 3: Database Replication (50K-500K users)

Stage 4: Cache Layer (100K-1M users)

Stage 5: CDN (Content Delivery Network) (500K+ users)

Stage 6: Stateless Web Tier (1M+ users)

Stage 7: Data Centers (Multi-region) (5M+ users)

Stage 8: Message Queue (Decouple Components) (10M+ users)

Stage 9: Logging, Metrics, Automation (10M+ users)

Stage 10: Database Scaling (50M+ users)

Summary: Architecture Evolution

Key Takeaways

Common Interview Questions

Patterns Used

External Resources

Graph View

Table of Contents

Backlinks

Study Notes by Niladri & AI

Explorer

ch01-scale-from-zero-to-millions

Chapter 1: Scale From Zero to Millions of Users

Overview

The Scaling Journey

Stage 0: Single Server (1-1,000 users)

Stage 1: Separate Database (1K-10K users)

Stage 2: Load Balancer + Multiple Servers (10K-100K users)

Stage 3: Database Replication (50K-500K users)

Stage 4: Cache Layer (100K-1M users)

Stage 5: CDN (Content Delivery Network) (500K+ users)

Stage 6: Stateless Web Tier (1M+ users)

Stage 7: Data Centers (Multi-region) (5M+ users)

Stage 8: Message Queue (Decouple Components) (10M+ users)

Stage 9: Logging, Metrics, Automation (10M+ users)

Stage 10: Database Scaling (50M+ users)

Summary: Architecture Evolution

Key Takeaways

Common Interview Questions

Patterns Used

Related Chapters

External Resources

Graph View

Table of Contents

Backlinks