Chapter 1: Scale From Zero to Millions of Users

volume1 scaling architecture fundamentals

Status: 🟩 Essential - Must master
Difficulty: Beginner-friendly
Time to complete: 45 min read


Overview

This chapter walks through the evolution of a system’s architecture as it scales from a single user to millions. It’s a fundamental chapter that shows the thought process behind adding components at each stage.

Why this matters: Shows the incremental scaling approach. Interviewers love to ask β€œHow would you scale this to 10x? 100x?” This chapter gives you the roadmap.

The Scaling Journey

Stage 0: Single Server (1-1,000 users)

Architecture:

[Users] β†’ [Web Server + Database on same machine]

Components:

  • Everything on one server: web app, database, cache
  • Simple to set up and deploy
  • Good for MVP/prototype

Limitations:

  • Single point of failure
  • Can’t scale components independently
  • Limited by single machine resources

When to use: Starting out, MVP, proof of concept


Stage 1: Separate Database (1K-10K users)

Architecture:

[Users] β†’ [Web Server]
              ↓
         [Database Server]

Why separate:

  • Scale web tier and data tier independently
  • Web servers are usually CPU/network bound
  • Databases are usually memory/disk bound
  • Better resource utilization

Database choice:

  • SQL (MySQL, PostgreSQL):

    • Structured data
    • ACID transactions
    • Good for most use cases
  • NoSQL (MongoDB, Cassandra):

    • Flexible schema
    • Horizontal scaling
    • High write throughput

Which to choose: Start with SQL unless you have specific NoSQL needs (super high scale, flexible schema, etc.)


Stage 2: Load Balancer + Multiple Servers (10K-100K users)

Architecture:

[Users] β†’ [Load Balancer]
              ↓
         β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
         ↓         ↓
    [Web Server 1] [Web Server 2]
         ↓         ↓
         β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
              ↓
         [Database]

Load Balancer:

  • Distributes traffic across servers
  • Provides redundancy (failover)
  • Can detect unhealthy servers
  • Commonly used: NGINX, HAProxy, AWS ELB

Benefits:

  • No single point of failure for web tier
  • Can handle more traffic
  • Can do rolling deployments (zero downtime)

Key consideration: Servers must be stateless

  • Don’t store user session in server memory
  • Store sessions in shared cache/database
  • Any server can handle any request

Stage 3: Database Replication (50K-500K users)

Architecture:

         [Load Balancer]
              ↓
         [Web Servers]
              ↓
         [Primary DB] ← Writes
              ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    ↓         ↓         ↓
[Replica 1] [Replica 2] [Replica 3] ← Reads

Primary-Replica Setup:

  • Primary (Master): Handles all writes
  • Replicas (Slaves): Handle reads
  • Replication is usually asynchronous (slight lag OK)

Benefits:

  • Better read performance (distribute reads)
  • Redundancy (if primary fails, promote replica)
  • Can have geographic replicas (lower latency)

Trade-offs:

  • Replication lag (eventual consistency)
  • More complex to manage
  • Need to handle failover logic

When primary fails:

  • Promote a replica to primary
  • Update DNS/connection strings
  • Some data loss possible (last few writes)

Stage 4: Cache Layer (100K-1M users)

Architecture:

[Web Servers] β†’ [Cache] β†’ [Database]
              (Redis/Memcached)

Why cache:

  • Database queries are expensive
  • Many queries are repeated (hot data)
  • Cache is in-memory (100x faster than disk)

What to cache:

  • User profiles (frequently accessed)
  • Popular posts/content
  • Session data
  • Computation results

Cache strategies (see key-patterns > 1. Caching Strategies):

  1. Cache-Aside (most common):

    • Check cache first
    • If miss β†’ query DB β†’ store in cache
    • If hit β†’ return from cache
  2. Write-Through:

    • Write to cache
    • Cache writes to DB synchronously
  3. Write-Behind:

    • Write to cache
    • Cache writes to DB asynchronously

Cache invalidation (hard problem!):

  • TTL (Time-To-Live): Expire after X seconds
  • Write invalidation: Delete cache entry on update
  • Hybrid: TTL + invalidation on write

Common caches:

  • Redis: In-memory, persistent, rich data structures
  • Memcached: Simple, fast, volatile

When to add: Database is bottleneck, read-heavy workload (10:1 ratio or more)


Stage 5: CDN (Content Delivery Network) (500K+ users)

Architecture:

[Users] β†’ [CDN] β†’ [Load Balancer] β†’ [Web Servers]
         (Static)     (Dynamic)

What is CDN:

  • Geographically distributed servers
  • Cache static content close to users
  • Reduces latency and server load

What to serve from CDN:

  • Images, videos
  • JavaScript, CSS files
  • Static HTML pages
  • Downloadable files

How it works:

  • User requests image.jpg
  • CDN checks if it has cached copy
  • If yes β†’ serve from CDN (fast!)
  • If no β†’ fetch from origin server β†’ cache β†’ serve

Push vs Pull:

  • Push: Upload content to CDN manually (good for rarely changing content)
  • Pull: CDN fetches on first request (good for frequently changing content)

Benefits:

  • Lower latency (closer to users)
  • Reduced server load
  • Better availability (CDN = highly available)

Considerations:

  • Cost (CDN bandwidth not free)
  • Cache invalidation (how to update content?)
  • Fallback if CDN fails

Popular CDNs: Cloudflare, AWS CloudFront, Akamai, Fastly


Stage 6: Stateless Web Tier (1M+ users)

Problem: Session data in web servers prevents scaling

Architecture:

[Load Balancer]
    ↓
[Web Servers] β†’ [Session Store]
  (Stateless)    (Redis/DB)

Why stateless:

  • Any server can handle any request
  • Easy to add/remove servers (auto-scaling)
  • No sticky sessions needed

Where to store session data:

  • Redis: Fast, in-memory (recommended)
  • Database: Persistent but slower
  • JWT tokens: Client-side (no server state)

Benefits:

  • Easy horizontal scaling
  • Better load distribution
  • Simplified deployment

Stage 7: Data Centers (Multi-region) (5M+ users)

Architecture:

Region 1 (US)              Region 2 (EU)
[LB] β†’ [Servers] β†’ [DB]    [LB] β†’ [Servers] β†’ [DB]
         ↓                          ↓
    [Cache + CDN]              [Cache + CDN]
         ↕ ─────────────────────── ↕
           (Cross-region sync)

Why multi-region:

  • Lower latency for global users
  • High availability (disaster recovery)
  • Compliance (data residency laws)

Challenges:

  • Traffic routing: GeoDNS routes users to nearest region
  • Data synchronization: Replicate across regions (eventual consistency)
  • Test and deployment: Deploy to all regions

Considerations:

  • Cost (multiple regions expensive)
  • Complexity (data sync, deployment)
  • Trade-off: Availability vs Consistency

Stage 8: Message Queue (Decouple Components) (10M+ users)

Architecture:

[Web Servers] β†’ [Message Queue] β†’ [Workers]
                  (Kafka/RabbitMQ)

Why message queue:

  • Decouple components
  • Handle async tasks (email, notifications)
  • Better fault tolerance
  • Handle traffic spikes (queue buffers)

Use cases:

  • Send email after signup
  • Process video encoding
  • Generate reports
  • Send push notifications

Benefits:

  • Web servers respond faster (offload work)
  • Workers can scale independently
  • Retry failed tasks
  • Better reliability

Popular options: Apache Kafka, RabbitMQ, AWS SQS


Stage 9: Logging, Metrics, Automation (10M+ users)

Not architectural change, but critical at scale:

1. Logging:

  • Centralized logging (ELK stack, Splunk)
  • Track errors and debug issues
  • Aggregate logs from all servers

2. Metrics:

  • Monitor system health (CPU, memory, disk)
  • Application metrics (QPS, latency, error rate)
  • Business metrics (signups, revenue)
  • Tools: Prometheus, Grafana, Datadog

3. Automation:

  • Auto-scaling based on load
  • Automated deployment (CI/CD)
  • Automated testing
  • Infrastructure as Code (Terraform)

Why it matters:

  • Can’t manually check 1000 servers
  • Detect issues before users notice
  • Faster deployment and recovery

Stage 10: Database Scaling (50M+ users)

Problem: Single database can’t handle load

Solution 1: Vertical Scaling (Scale Up)

  • Add more CPU, RAM, disk to database
  • βœ… Simple, no code changes
  • ❌ Hardware limits, expensive, single point of failure
  • Limit: Can scale to ~100K QPS

Solution 2: Horizontal Scaling (Scale Out)

A. Sharding (Partitioning):

User IDs 1-1M    β†’ [Shard 1]
User IDs 1M-2M   β†’ [Shard 2]
User IDs 2M-3M   β†’ [Shard 3]
User IDs 3M-4M   β†’ [Shard 4]

Sharding key: How to distribute data (userId, geographyId, etc.)

Benefits:

  • Unlimited scaling (add more shards)
  • Each shard handles subset of data

Challenges:

  • Resharding (when shard outgrows capacity)
  • Celebrity problem (one shard hot)
  • Cross-shard queries (joins are hard)

Sharding strategies:

  • Hash-based: hash(userId) % num_shards
  • Range-based: User IDs 1-1M, 1M-2M, etc.
  • Geographic: US users β†’ US shard, EU β†’ EU shard
  • Consistent hashing: Better for adding/removing shards (see Ch 5)

B. Read Replicas (covered earlier):

  • Already implemented in Stage 3
  • Good for read-heavy workloads
  • Doesn’t help with write load

Summary: Architecture Evolution

Stage 0: Single server
    ↓
Stage 1: Separate database
    ↓
Stage 2: Load balancer + multiple web servers
    ↓
Stage 3: Database replication (primary-replica)
    ↓
Stage 4: Cache layer (Redis)
    ↓
Stage 5: CDN for static content
    ↓
Stage 6: Stateless web tier (session in Redis)
    ↓
Stage 7: Multiple data centers
    ↓
Stage 8: Message queue for async tasks
    ↓
Stage 9: Logging, metrics, automation
    ↓
Stage 10: Database sharding

Key principle: Scale incrementally. Add complexity only when needed!

Key Takeaways

  1. Start simple: Single server is fine for MVP
  2. Separate concerns: Web tier, data tier, cache tier
  3. Horizontal scaling: Add more machines, not bigger machines
  4. Stateless servers: Store state in cache/database, not in-memory
  5. Replication: For redundancy and read performance
  6. Caching: Dramatically improves performance
  7. CDN: Serve static content from edge
  8. Message queues: Decouple and handle async work
  9. Sharding: Last resort for database scaling
  10. Monitor everything: Can’t manage what you don’t measure

Common Interview Questions

Q: How do you scale from 1K to 1M users?
A: Follow the stages: Separate database β†’ Load balancer + servers β†’ Database replication β†’ Cache β†’ CDN β†’ Stateless tier β†’ Message queue β†’ Sharding if needed.

Q: When would you add a cache?
A: When database is bottleneck, read-heavy workload (10:1 ratio or higher), hot data accessed frequently.

Q: SQL vs NoSQL?
A: SQL for structured data, ACID, relationships. NoSQL for flexible schema, horizontal scaling, super high write throughput. Start with SQL unless specific NoSQL needs.

Q: How do you handle stateful servers?
A: Move state to shared store (Redis for sessions, DB for user data). Make servers stateless so any server can handle any request.

Q: When do you shard the database?
A: Last resort! When vertical scaling exhausted, replication not enough, and single database can’t handle load (usually > 100K QPS).

Patterns Used

External Resources


This is a foundational chapter! The scaling journey appears in many interview questions. Master this progression.

Practice: Pick any system (Twitter, Instagram, Uber) and describe how it would scale through these stages.


Last Updated: 2026-04-08
Status: Essential - Review before every interview