Chapter 1: Scale From Zero to Millions of Users
volume1 scaling architecture fundamentals
Status: π© Essential - Must master
Difficulty: Beginner-friendly
Time to complete: 45 min read
Overview
This chapter walks through the evolution of a systemβs architecture as it scales from a single user to millions. Itβs a fundamental chapter that shows the thought process behind adding components at each stage.
Why this matters: Shows the incremental scaling approach. Interviewers love to ask βHow would you scale this to 10x? 100x?β This chapter gives you the roadmap.
The Scaling Journey
Stage 0: Single Server (1-1,000 users)
Architecture:
[Users] β [Web Server + Database on same machine]
Components:
- Everything on one server: web app, database, cache
- Simple to set up and deploy
- Good for MVP/prototype
Limitations:
- Single point of failure
- Canβt scale components independently
- Limited by single machine resources
When to use: Starting out, MVP, proof of concept
Stage 1: Separate Database (1K-10K users)
Architecture:
[Users] β [Web Server]
β
[Database Server]
Why separate:
- Scale web tier and data tier independently
- Web servers are usually CPU/network bound
- Databases are usually memory/disk bound
- Better resource utilization
Database choice:
-
SQL (MySQL, PostgreSQL):
- Structured data
- ACID transactions
- Good for most use cases
-
NoSQL (MongoDB, Cassandra):
- Flexible schema
- Horizontal scaling
- High write throughput
Which to choose: Start with SQL unless you have specific NoSQL needs (super high scale, flexible schema, etc.)
Stage 2: Load Balancer + Multiple Servers (10K-100K users)
Architecture:
[Users] β [Load Balancer]
β
ββββββ΄βββββ
β β
[Web Server 1] [Web Server 2]
β β
ββββββ¬βββββ
β
[Database]
Load Balancer:
- Distributes traffic across servers
- Provides redundancy (failover)
- Can detect unhealthy servers
- Commonly used: NGINX, HAProxy, AWS ELB
Benefits:
- No single point of failure for web tier
- Can handle more traffic
- Can do rolling deployments (zero downtime)
Key consideration: Servers must be stateless
- Donβt store user session in server memory
- Store sessions in shared cache/database
- Any server can handle any request
Stage 3: Database Replication (50K-500K users)
Architecture:
[Load Balancer]
β
[Web Servers]
β
[Primary DB] β Writes
β
βββββββββββΌββββββββββ
β β β
[Replica 1] [Replica 2] [Replica 3] β Reads
Primary-Replica Setup:
- Primary (Master): Handles all writes
- Replicas (Slaves): Handle reads
- Replication is usually asynchronous (slight lag OK)
Benefits:
- Better read performance (distribute reads)
- Redundancy (if primary fails, promote replica)
- Can have geographic replicas (lower latency)
Trade-offs:
- Replication lag (eventual consistency)
- More complex to manage
- Need to handle failover logic
When primary fails:
- Promote a replica to primary
- Update DNS/connection strings
- Some data loss possible (last few writes)
Stage 4: Cache Layer (100K-1M users)
Architecture:
[Web Servers] β [Cache] β [Database]
(Redis/Memcached)
Why cache:
- Database queries are expensive
- Many queries are repeated (hot data)
- Cache is in-memory (100x faster than disk)
What to cache:
- User profiles (frequently accessed)
- Popular posts/content
- Session data
- Computation results
Cache strategies (see key-patterns > 1. Caching Strategies):
-
Cache-Aside (most common):
- Check cache first
- If miss β query DB β store in cache
- If hit β return from cache
-
Write-Through:
- Write to cache
- Cache writes to DB synchronously
-
Write-Behind:
- Write to cache
- Cache writes to DB asynchronously
Cache invalidation (hard problem!):
- TTL (Time-To-Live): Expire after X seconds
- Write invalidation: Delete cache entry on update
- Hybrid: TTL + invalidation on write
Common caches:
- Redis: In-memory, persistent, rich data structures
- Memcached: Simple, fast, volatile
When to add: Database is bottleneck, read-heavy workload (10:1 ratio or more)
Stage 5: CDN (Content Delivery Network) (500K+ users)
Architecture:
[Users] β [CDN] β [Load Balancer] β [Web Servers]
(Static) (Dynamic)
What is CDN:
- Geographically distributed servers
- Cache static content close to users
- Reduces latency and server load
What to serve from CDN:
- Images, videos
- JavaScript, CSS files
- Static HTML pages
- Downloadable files
How it works:
- User requests image.jpg
- CDN checks if it has cached copy
- If yes β serve from CDN (fast!)
- If no β fetch from origin server β cache β serve
Push vs Pull:
- Push: Upload content to CDN manually (good for rarely changing content)
- Pull: CDN fetches on first request (good for frequently changing content)
Benefits:
- Lower latency (closer to users)
- Reduced server load
- Better availability (CDN = highly available)
Considerations:
- Cost (CDN bandwidth not free)
- Cache invalidation (how to update content?)
- Fallback if CDN fails
Popular CDNs: Cloudflare, AWS CloudFront, Akamai, Fastly
Stage 6: Stateless Web Tier (1M+ users)
Problem: Session data in web servers prevents scaling
Architecture:
[Load Balancer]
β
[Web Servers] β [Session Store]
(Stateless) (Redis/DB)
Why stateless:
- Any server can handle any request
- Easy to add/remove servers (auto-scaling)
- No sticky sessions needed
Where to store session data:
- Redis: Fast, in-memory (recommended)
- Database: Persistent but slower
- JWT tokens: Client-side (no server state)
Benefits:
- Easy horizontal scaling
- Better load distribution
- Simplified deployment
Stage 7: Data Centers (Multi-region) (5M+ users)
Architecture:
Region 1 (US) Region 2 (EU)
[LB] β [Servers] β [DB] [LB] β [Servers] β [DB]
β β
[Cache + CDN] [Cache + CDN]
β βββββββββββββββββββββββ β
(Cross-region sync)
Why multi-region:
- Lower latency for global users
- High availability (disaster recovery)
- Compliance (data residency laws)
Challenges:
- Traffic routing: GeoDNS routes users to nearest region
- Data synchronization: Replicate across regions (eventual consistency)
- Test and deployment: Deploy to all regions
Considerations:
- Cost (multiple regions expensive)
- Complexity (data sync, deployment)
- Trade-off: Availability vs Consistency
Stage 8: Message Queue (Decouple Components) (10M+ users)
Architecture:
[Web Servers] β [Message Queue] β [Workers]
(Kafka/RabbitMQ)
Why message queue:
- Decouple components
- Handle async tasks (email, notifications)
- Better fault tolerance
- Handle traffic spikes (queue buffers)
Use cases:
- Send email after signup
- Process video encoding
- Generate reports
- Send push notifications
Benefits:
- Web servers respond faster (offload work)
- Workers can scale independently
- Retry failed tasks
- Better reliability
Popular options: Apache Kafka, RabbitMQ, AWS SQS
Stage 9: Logging, Metrics, Automation (10M+ users)
Not architectural change, but critical at scale:
1. Logging:
- Centralized logging (ELK stack, Splunk)
- Track errors and debug issues
- Aggregate logs from all servers
2. Metrics:
- Monitor system health (CPU, memory, disk)
- Application metrics (QPS, latency, error rate)
- Business metrics (signups, revenue)
- Tools: Prometheus, Grafana, Datadog
3. Automation:
- Auto-scaling based on load
- Automated deployment (CI/CD)
- Automated testing
- Infrastructure as Code (Terraform)
Why it matters:
- Canβt manually check 1000 servers
- Detect issues before users notice
- Faster deployment and recovery
Stage 10: Database Scaling (50M+ users)
Problem: Single database canβt handle load
Solution 1: Vertical Scaling (Scale Up)
- Add more CPU, RAM, disk to database
- β Simple, no code changes
- β Hardware limits, expensive, single point of failure
- Limit: Can scale to ~100K QPS
Solution 2: Horizontal Scaling (Scale Out)
A. Sharding (Partitioning):
User IDs 1-1M β [Shard 1]
User IDs 1M-2M β [Shard 2]
User IDs 2M-3M β [Shard 3]
User IDs 3M-4M β [Shard 4]
Sharding key: How to distribute data (userId, geographyId, etc.)
Benefits:
- Unlimited scaling (add more shards)
- Each shard handles subset of data
Challenges:
- Resharding (when shard outgrows capacity)
- Celebrity problem (one shard hot)
- Cross-shard queries (joins are hard)
Sharding strategies:
- Hash-based:
hash(userId) % num_shards - Range-based: User IDs 1-1M, 1M-2M, etc.
- Geographic: US users β US shard, EU β EU shard
- Consistent hashing: Better for adding/removing shards (see Ch 5)
B. Read Replicas (covered earlier):
- Already implemented in Stage 3
- Good for read-heavy workloads
- Doesnβt help with write load
Summary: Architecture Evolution
Stage 0: Single server
β
Stage 1: Separate database
β
Stage 2: Load balancer + multiple web servers
β
Stage 3: Database replication (primary-replica)
β
Stage 4: Cache layer (Redis)
β
Stage 5: CDN for static content
β
Stage 6: Stateless web tier (session in Redis)
β
Stage 7: Multiple data centers
β
Stage 8: Message queue for async tasks
β
Stage 9: Logging, metrics, automation
β
Stage 10: Database sharding
Key principle: Scale incrementally. Add complexity only when needed!
Key Takeaways
- Start simple: Single server is fine for MVP
- Separate concerns: Web tier, data tier, cache tier
- Horizontal scaling: Add more machines, not bigger machines
- Stateless servers: Store state in cache/database, not in-memory
- Replication: For redundancy and read performance
- Caching: Dramatically improves performance
- CDN: Serve static content from edge
- Message queues: Decouple and handle async work
- Sharding: Last resort for database scaling
- Monitor everything: Canβt manage what you donβt measure
Common Interview Questions
Q: How do you scale from 1K to 1M users?
A: Follow the stages: Separate database β Load balancer + servers β Database replication β Cache β CDN β Stateless tier β Message queue β Sharding if needed.
Q: When would you add a cache?
A: When database is bottleneck, read-heavy workload (10:1 ratio or higher), hot data accessed frequently.
Q: SQL vs NoSQL?
A: SQL for structured data, ACID, relationships. NoSQL for flexible schema, horizontal scaling, super high write throughput. Start with SQL unless specific NoSQL needs.
Q: How do you handle stateful servers?
A: Move state to shared store (Redis for sessions, DB for user data). Make servers stateless so any server can handle any request.
Q: When do you shard the database?
A: Last resort! When vertical scaling exhausted, replication not enough, and single database canβt handle load (usually > 100K QPS).
Patterns Used
- key-patterns > 1. Client-Server Architecture
- key-patterns > 1. Database Replication
- key-patterns > 2. Database Sharding (Partitioning)
- key-patterns > 1. Caching Strategies
- key-patterns > 3. CDN (Content Delivery Network)
- key-patterns > 4. Load Balancing Algorithms
- Sub
Related Chapters
- Chapter 2 ch02-back-of-envelope-estimation: Calculate QPS and storage at each stage
- Chapter 3 ch03-framework-for-system-design: Apply framework to scaling decisions
- Chapter 5 ch05-consistent-hashing: Better sharding strategy
- Chapter 6 ch06-key-value-store: Deep dive into distributed storage
External Resources
- estimation-cheatsheet: Numbers for calculating when to scale
- key-patterns: Detailed pattern explanations
This is a foundational chapter! The scaling journey appears in many interview questions. Master this progression.
Practice: Pick any system (Twitter, Instagram, Uber) and describe how it would scale through these stages.
Last Updated: 2026-04-08
Status: Essential - Review before every interview