Chapter 1 Flashcards - Scale From Zero to Millions

flashcards volume1 scaling fundamentals

What are the stages (Stage 0 through Stage 10) of scaling from zero to millions of users?
?
Stage 0: Single server. Stage 1: Separate database. Stage 2: Load balancer + multiple servers. Stage 3: Database replication. Stage 4: Cache layer. Stage 5: CDN. Stage 6: Stateless web tier. Stage 7: Multiple data centers. Stage 8: Message queue. Stage 9: Logging/metrics/automation. Stage 10: Database sharding.

Why separate the database from the web server (Stage 1)?
?
Scale web tier and data tier independently. Web servers are CPU/network bound, databases are memory/disk bound. Better resource utilization. Allows specialization of hardware for each tier.

What is the purpose of a load balancer (Stage 2)?
?
Distributes traffic across multiple servers, provides redundancy (failover if server fails), can detect unhealthy servers (health checks), enables zero-downtime deployments (rolling updates). Common: NGINX, HAProxy, AWS ELB.
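As a rough illustration of the distribution and failover behavior described above, here is a minimal round-robin sketch. The class and method names (`RoundRobinBalancer`, `mark_down`, `mark_up`) are hypothetical, and the health state is set by hand; a real load balancer such as NGINX or HAProxy probes servers itself.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._index = 0

    def mark_down(self, server):
        # In a real setup a periodic health check would call this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each server at most once per call, starting where
        # the previous call left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._index]
            self._index = (self._index + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
first_round = [lb.next_server() for _ in range(3)]
lb.mark_down("web2")  # simulate a failed health check
after_failure = [lb.next_server() for _ in range(4)]
```

Once `web2` is marked down, requests rotate between the remaining two servers, which is the failover property the card refers to.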

Why must web servers be stateless when using a load balancer?
?
Any server must be able to handle any request. If session data is in server memory, user must always go to same server (sticky sessions). Solution: Store session data in shared cache (Redis) or database, not in server memory.

What is database replication and why use it (Stage 3)?
?
Primary-Replica setup: Primary handles writes, replicas handle reads. Benefits: Better read performance (distribute reads across replicas), redundancy (if primary fails, promote replica), can have geographic replicas (lower latency). Trade-off: Replication lag (eventual consistency).

What happens when the primary database fails in a replication setup?
?
Promote a replica to primary, update DNS/connection strings to point to the new primary, account for replication lag (the last few writes might be lost), and set up new replicas from the promoted primary. Some data loss is possible with asynchronous replication.

When should you add a cache layer (Stage 4)?
?
When the database is the bottleneck, the workload is read-heavy (10:1 read/write ratio or more), many queries repeat (hot data), or users are experiencing slow response times. Cache reads are served from memory, so they are typically orders of magnitude faster than queries that hit disk. Common: Redis, Memcached.

What are the three cache strategies?
?
Cache-Aside (most common): Check cache first, if miss then query DB then store in cache. Write-Through: Write to cache, cache writes to DB synchronously (consistent but slower writes). Write-Behind: Write to cache, cache writes to DB asynchronously (fast writes but risk of data loss).

What is cache invalidation and why is it hard?
?
Removing or updating stale data in cache. Hard because: Multiple cache servers need coordination, deciding when to invalidate (on write? timeout?), cache stampede (all clients miss at once and hit DB). Strategies: TTL (expire after X seconds), write invalidation (delete on update), hybrid.
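The cache-aside read path, TTL expiry, and delete-on-update invalidation can be sketched together in a few lines. This is an in-process illustration under stated assumptions: the `CacheAside` class, its `load_from_db` and `write_to_db` callbacks, and the injectable `clock` are all hypothetical names, and a plain dict stands in for Redis or Memcached.

```python
import time


class CacheAside:
    """Toy cache-aside wrapper with TTL and write invalidation."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}               # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit, still fresh
        value = self._load(key)        # miss or expired: go to the DB
        self._store[key] = (value, self._clock() + self._ttl)
        return value

    def update(self, key, value, write_to_db):
        write_to_db(key, value)        # write to the source of truth first
        self._store.pop(key, None)     # then invalidate the cached copy


# Usage with a dict standing in for the database.
db = {"user:1": "Ada"}
db_reads = []


def load(key):
    db_reads.append(key)
    return db[key]


cache = CacheAside(load, ttl_seconds=30.0)
cache.get("user:1")                    # miss: reads the DB
cache.get("user:1")                    # hit: served from the cache
cache.update("user:1", "Grace", db.__setitem__)
fresh = cache.get("user:1")            # miss again after invalidation
```

This sketch does not address the stampede problem: if many callers miss on the same key at once, each of them calls `load_from_db`.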

What is a CDN and when should you use it (Stage 5)?
?
Content Delivery Network - geographically distributed servers that cache static content close to users. Use when: Have static content (images, videos, JS, CSS), geographically distributed users, want to reduce origin server load, need lower latency. Benefits: Lower latency (closer to users), reduced server load, better availability.

What is the difference between Push CDN and Pull CDN?
?
Push CDN: Upload content to CDN manually. Pros: Full control, good for rarely changing content. Cons: Manual management, storage costs. Pull CDN: CDN fetches from origin on cache miss. Pros: Automatic, good for frequently changing content. Cons: First request is slow (cache miss).

Why make the web tier stateless (Stage 6)?
?
Enables easy horizontal scaling (add/remove servers), no sticky sessions needed (any server can handle any request), better load distribution, simplified deployment. Move session data to Redis or database, not in-memory on web servers.

What are the benefits of multiple data centers (Stage 7)?
?
Lower latency for global users (closer datacenter), high availability (disaster recovery), compliance with data residency laws (GDPR). Challenges: Traffic routing (GeoDNS), data synchronization (eventual consistency), deploy to all regions, higher cost.

What is a message queue and when to use it (Stage 8)?
?
Decouples components - producers send messages, consumers process them asynchronously. Use for: Send email after signup, process video encoding, generate reports, send push notifications. Benefits: Web servers respond faster (offload work), workers scale independently, retry failed tasks, handle traffic spikes (queue buffers). Popular: Kafka, RabbitMQ, AWS SQS.
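The producer/consumer decoupling can be illustrated in a single process with Python's standard `queue` and `threading` modules. This is only a sketch: `process_jobs` and the `send_email` handler are hypothetical names, and an in-memory queue has none of the durability, acknowledgement, or retry features of a real broker such as Kafka, RabbitMQ, or SQS.

```python
import queue
import threading


def process_jobs(jobs, handler, num_workers=2):
    """Enqueue jobs (producer) and run handler on worker threads (consumers)."""
    q = queue.Queue()                  # stands in for the message broker
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            if job is None:            # sentinel: no more work for this worker
                q.task_done()
                return
            try:
                outcome = ("ok", handler(job))
            except Exception:
                # A real system would retry or dead-letter the message.
                outcome = ("failed", job)
            with lock:
                results.append(outcome)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:                   # the producer returns right after put()
        q.put(job)
    for _ in threads:                  # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return results


def send_email(address):
    return f"sent:{address}"


outcomes = sorted(process_jobs(["a@example.com", "b@example.com"], send_email))
```

The producer only pays the cost of `q.put`, which is why the web tier can respond before the slow work finishes; the number of workers can be changed without touching the producer.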

Why are logging, metrics, and automation critical at scale (Stage 9)?
?
Can’t manually check 1000 servers. Logging: Centralized logging (ELK, Splunk) for debugging. Metrics: Monitor health (CPU, memory), application metrics (QPS, latency, error rate), business metrics (signups, revenue). Tools: Prometheus, Grafana, Datadog. Automation: Auto-scaling, CI/CD, Infrastructure as Code (Terraform).

When should you shard (partition) the database (Stage 10)?
?
Last resort! When: Vertical scaling is exhausted (can’t add more RAM/CPU to a single machine), replication is not enough (write load too high), or a single database can’t handle the load (as a rough figure, above ~100K QPS; the real limit depends on hardware and workload). Sharding distributes data across multiple databases.

What are the three main database sharding strategies?
?
Hash-based: shard = hash(userId) % num_shards. Pros: Even distribution. Cons: Hard to add shards (resharding). Range-based: Shard 1: A-F, Shard 2: G-M, etc. Pros: Easy range queries. Cons: Uneven distribution (hotspots). Geographic: US users to US shard, EU to EU shard. Pros: Low latency, compliance. Cons: Uneven distribution.
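The hash-based formula, and the reason resharding is painful, can be shown with a short sketch. The function names `shard_for` and `moved_fraction` are hypothetical. SHA-256 is used here instead of Python's built-in `hash()` because string hashes from `hash()` are randomized per process, which would make shard assignments unstable across restarts.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """shard = hash(userId) % num_shards, with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(user_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for u in user_ids
        if shard_for(u, old_shards) != shard_for(u, new_shards)
    )
    return moved / len(user_ids)


users = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(users, old_shards=4, new_shards=5)
```

With plain modulo hashing, going from 4 to 5 shards is expected to move roughly four out of five keys, since a key stays put only when its hash gives the same remainder under both shard counts. Consistent hashing is the usual technique for reducing that movement, though it is not covered in these cards.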

What are the challenges of database sharding?
?
Cross-shard queries (joins are hard, need application-level joins), resharding (when shard outgrows capacity, need to redistribute data), celebrity problem (one user has millions of followers, creates hotspot on one shard), increased complexity (multiple databases to manage).

What is SQL vs NoSQL and when to use each?
?
SQL: Structured data, ACID transactions, easy joins, complex queries. Use for: Structured data with relationships, need transactions, consistency critical. NoSQL: Flexible schema, horizontal scaling, high write throughput, eventual consistency. Use for: Flexible schema, need horizontal scaling, simple queries (key-value), high writes. Start with SQL unless specific NoSQL needs.

What is vertical scaling vs horizontal scaling?
?
Vertical (Scale Up): Add more CPU, RAM, disk to a single machine. Pros: Simple, no code changes. Cons: Hardware limits, expensive at the high end, single point of failure. Horizontal (Scale Out): Add more machines. Pros: Scales well beyond a single machine's limits, cost-effective, redundancy. Cons: More complex, needs load balancing. At large scale, horizontal scaling is usually the only practical option.

What is the key principle of the scaling journey?
?
Scale incrementally - add complexity only when needed! Start with single server (MVP), add components as you hit limits. Don’t over-engineer from the start. Every component adds operational complexity, so only add when there’s a clear need.

At what scale would you typically add each component?
?
Load balancer: 1K-10K users. Database replication: 10K-100K users. Cache layer: 100K-1M users. CDN: 500K+ users. Stateless web tier: 1M+ users. Multiple data centers: 5M+ users. Message queue: 10M+ users. Database sharding: 50M+ users. These are rough guidelines!

What makes a good sharding key?
?
Evenly distributes data (no hotspots), supports common query patterns (avoids cross-shard queries), immutable (if a row's key value changes, the row has to move to a different shard), high cardinality (many unique values). Common: userId, geographyId. Avoid: Status, category (low cardinality, creates hotspots).

How do you handle a celebrity problem in sharding?
?
One user (celebrity) with millions of followers creates hotspot on one shard. Solutions: Shard celebrities separately (manual partitioning), use hybrid approach (fan-out on read for celebrities instead of fan-out on write), cache celebrity content aggressively, rate limiting on celebrity actions.
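The hybrid approach mentioned above (fan-out on write for ordinary users, fan-out on read for celebrities) might look roughly like the following. Everything here is an illustrative assumption: the `Feed` class, the follower-count `threshold`, and the in-memory dicts that stand in for sharded storage.

```python
from collections import defaultdict


class Feed:
    """Toy hybrid timeline: push for normal authors, pull for celebrities."""

    def __init__(self, followers, threshold):
        self.followers = followers                 # author -> set of follower ids
        self.threshold = threshold                 # follower count that marks a celebrity
        self.inbox = defaultdict(list)             # per-user precomputed timeline
        self.posts_by_author = defaultdict(list)   # author's own post list

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= self.threshold

    def post(self, author, text):
        self.posts_by_author[author].append(text)
        if not self.is_celebrity(author):
            # Fan-out on write: copy the post into each follower's inbox.
            for follower in self.followers.get(author, ()):
                self.inbox[follower].append((author, text))
        # Celebrity posts are NOT copied; followers pull them at read time.

    def timeline(self, user):
        items = list(self.inbox[user])
        # Fan-out on read: merge in posts from followed celebrities.
        for author, fans in self.followers.items():
            if user in fans and self.is_celebrity(author):
                items.extend((author, t) for t in self.posts_by_author[author])
        return items


followers = {"star": {"u1", "u2", "u3"}, "bob": {"u1"}}
feed = Feed(followers, threshold=3)
feed.post("bob", "hi")
feed.post("star", "hello")
u1_timeline = sorted(feed.timeline("u1"))
```

A celebrity post costs one write regardless of follower count, at the price of extra work on every follower's read, which is where aggressive caching of celebrity content helps.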

What is the difference between replication and sharding?
?
Replication: Copy same data to multiple servers. Purpose: Redundancy, read performance. All replicas have full dataset. Sharding: Distribute different data to different servers. Purpose: Scale write load, handle more data than fits on one server. Each shard has subset of data. Often use both together!

Total Cards: 25
Review Time: 15-20 minutes
Priority: HIGH - Fundamental scaling concepts
Last Updated: 2026-04-08