Storage Systems: Cross-Chapter Comparison Reference
comparisons storage database sql nosql
Type: Cross-chapter reference — NOT a chapter notes file
Covers: Vol 1 Ch01, Ch06, Ch08, Ch12, Ch13, Ch14, Ch15 · Vol 2 Ch05, Ch06, Ch07, Ch09, Ch10, Ch11, Ch13
Last Updated: 2026-04-13
1. Storage Type Overview
| Type | Representative systems | 1-line description | Best use case |
|---|---|---|---|
| Relational DB | MySQL, PostgreSQL, Oracle | Structured tables with ACID transactions and joins | User accounts, orders, financial ledgers, anything needing strong consistency + complex queries |
| Document DB | MongoDB, Couchbase, DynamoDB (doc mode) | Schema-flexible JSON documents, nested structures | User profiles, product catalogs, content management — flexible evolving schema |
| Wide-column DB | Cassandra, HBase, ScyllaDB | Rows with dynamic columns; optimized for time-series and high write throughput | Chat messages, activity feeds, time-series metrics, IoT events |
| Key-value store | Redis, DynamoDB (KV mode), Memcached | Pure key → value lookups, sub-millisecond latency | Sessions, caches, rate limit counters, feature flags |
| Time-series DB | InfluxDB, TimescaleDB, Prometheus | Optimized for timestamped data; built-in downsampling | Infrastructure metrics, IoT sensors, financial tick data |
| Graph DB | Neo4j, Amazon Neptune, JanusGraph | Nodes and edges; optimized for traversal queries | Social graphs, fraud detection, recommendation engines, knowledge graphs |
| Object storage | AWS S3, Google Cloud Storage, Azure Blob | Flat key-based storage for large blobs, accessed via HTTP | Videos, images, backups, data lake, large file downloads |
| Block storage | AWS EBS, GCP Persistent Disk | Raw blocks attached to VMs, like a virtual disk | OS volumes, database data files, low-latency VM storage |
| Search engine | Elasticsearch, OpenSearch, Solr | Inverted index for full-text search; faceting and ranking | Product search, log analytics, autocomplete, content discovery |
| Data warehouse | Snowflake, BigQuery, Redshift, ClickHouse | Columnar OLAP; optimized for analytical queries on large historical datasets | BI dashboards, ad hoc analytics, business reporting |
2. SQL vs NoSQL Decision Guide
Choose SQL (Relational) when:
- Data has clear relationships (foreign keys, joins matter)
- You need ACID transactions (bank transfers, e-commerce orders, hotel reservations)
- Schema is stable and well-defined upfront
- Complex queries (multi-table joins, aggregations, reporting)
- Data volume fits one powerful server or a few replicas
- Consistency is more important than availability
Typical SDI examples: User accounts (everywhere), hotel reservations (Vol 2 Ch07), payment ledger (Vol 2 Ch11), order history (Vol 2 Ch07)
Choose NoSQL when:
- Schema is flexible or evolving rapidly
- You need horizontal write scalability beyond what one DB can handle
- Queries are simple (key lookups, range scans by primary key)
- Eventual consistency is acceptable
- High write throughput is required (>10K writes/sec)
- Time-series or append-only data patterns
Typical SDI examples: Chat messages (Cassandra — Vol 1 Ch12), metrics (InfluxDB/Cassandra — Vol 2 Ch05), video metadata analytics (Cassandra — Vol 1 Ch14)
Hybrid approaches (most large systems):
- SQL for user accounts + metadata (strong consistency needed)
- NoSQL for activity data + events (high volume, eventual consistency OK)
- Redis as cache layer in front of both
- Object storage for binary blobs (videos, files)
3. NoSQL Options Comparison Table
| Feature | Cassandra | DynamoDB | MongoDB | Redis | HBase |
|---|---|---|---|---|---|
| Data model | Wide-column (partition key + clustering columns) | Key-value / document hybrid | JSON documents | Key-value + rich structures (lists, sets, sorted sets) | Wide-column (Hadoop-backed) |
| Query flexibility | Low — must design table for query patterns | Low — query only by PK or GSI | High — rich query language, aggregation pipeline | Low — key lookups + range within sorted sets | Low — row key + column family scans |
| Consistency model | Tunable (ONE, QUORUM, ALL) | Strong or eventual (configurable per op) | Tunable (read concern) | Strong on single node; eventual on cluster | Strong (via HBase MVCC) |
| Write throughput | Very high (LSM tree, no leader) | Very high (managed, serverless scale) | Medium-high | Extremely high (in-memory) | Very high |
| Read latency | ~1–5ms (disk-backed) | ~1–5ms (managed SSD) | ~1–10ms | Sub-millisecond (in-memory) | ~1–10ms |
| Horizontal scale | Yes — consistent hashing, no master | Yes — fully managed partitioning | Yes (sharding) | Yes (Redis Cluster) | Yes (HDFS-backed) |
| Replication | Configurable replication factor (RF=3 typical) | Managed multi-AZ | Replica sets | Master-replica + Sentinel / Cluster | Managed by HBase |
| ACID transactions | Lightweight transactions (LWT) only | Single-item; multi-item transactions added | Multi-doc transactions (4.0+) | Single-command atomic; Lua scripts for multi-step | No |
| Best for | Time-ordered data, chat history, events, metrics | Serverless / AWS-native, user sessions, game state | Flexible schema, content, catalogs | Caching, leaderboards, session store, counters | Hadoop ecosystem, analytics over HDFS |
| SDI chapters | Chat (Ch12), Metrics (V2 Ch05), YouTube analytics | Hotel booking (V2 Ch07), Gaming leaderboard (V2 Ch10) | Google Maps POI (V2 Ch03) | Everywhere as cache; leaderboard (V2 Ch10) | Chat (Ch12 — HBase as alternative to Cassandra) |
4. Which Storage Does Each SDI System Use and Why
| Chapter | Primary DB | Secondary storage | Cache | Object storage | Key design decision |
|---|---|---|---|---|---|
| Vol 1 Ch08 — URL Shortener | MySQL (shortCode → longURL) | — | Redis (hot redirects) | — | Relational for uniqueness constraints + simple schema; Redis absorbs 80% of reads |
| Vol 1 Ch10 — Notification System | MySQL (user prefs, device tokens) | — | Redis (device token cache) | — | Relational for structured user data; notification state in DB for deduplication |
| Vol 1 Ch11 — News Feed | MySQL (users, friendships) | Cassandra / Redis (post storage, timelines) | Redis (prebuilt feed per user) | S3 (images, videos in posts) | Social graph in SQL; high-volume post fan-out in Redis/Cassandra |
| Vol 1 Ch12 — Chat System | MySQL / PostgreSQL (user accounts, groups) | Cassandra / HBase (message history) | Redis (presence, last-seen, recent messages) | — | Time-ordered messages → wide-column DB; presence ephemeral in Redis |
| Vol 1 Ch13 — Search Autocomplete | MySQL (search log analytics) | Trie / prefix cache (in-memory or Redis) | Redis (top-K suggestions per prefix) | — | Trie structure for prefix matching; Redis caches hot prefixes |
| Vol 1 Ch14 — YouTube | MySQL (video metadata, user data) | Cassandra (view counts, analytics) | Redis (hot metadata) CDN (video segments) | S3 (raw + transcoded video files) | Binary blobs to S3; structured metadata to MySQL; high-write analytics to Cassandra |
| Vol 1 Ch15 — Google Drive | MySQL (file metadata, user data, permissions) | Block storage (file chunks) | Redis (metadata cache, upload session state) | S3 / GCS (file chunks / blocks) | Files split into blocks; metadata relational; strong consistency for metadata |
| Vol 2 Ch03 — Google Maps | PostgreSQL + PostGIS (map geometry) | MongoDB / custom tile store | CDN (map tiles) | S3 (satellite imagery, tiles) | Geospatial queries need PostGIS; tiles are static blobs for CDN |
| Vol 2 Ch05 — Metrics Monitoring | InfluxDB / TimescaleDB / Cassandra (metric time-series) | — | Redis (recent aggregates for dashboards) | S3 (cold metric archive) | Write-heavy time-series → dedicated TSDB; downsampling over time |
| Vol 2 Ch06 — Ad Click Aggregation | MySQL (ad definitions, billing records) | Cassandra / ClickHouse (aggregated click data) | Redis (real-time counters, top-100 sorted set) | S3 (raw click event archive) | Raw events in Kafka; streaming aggregation to Redis; batch to ClickHouse for accuracy |
| Vol 2 Ch07 — Hotel Reservation | MySQL (hotels, rooms, reservations) | Redis (room inventory cache) | Redis (availability cache) | — | ACID critical (double booking = disaster); relational with optimistic locking |
| Vol 2 Ch09 — S3 Object Storage | SQL (metadata: bucket/object manifest) | Custom distributed object store (the chapter IS about this) | — | Itself (the system being designed) | Separates metadata (relational) from data (custom shard-based blob store) |
| Vol 2 Ch10 — Gaming Leaderboard | MySQL (user profiles, game records) | Redis Sorted Set (live leaderboard) | Redis is both cache and primary store for leaderboard | — | Redis ZADD/ZRANK are O(log N); Redis IS the leaderboard store |
| Vol 2 Ch11 — Payment System | MySQL / PostgreSQL (accounts, transactions) | — | Redis (idempotency key store) | — | ACID mandatory; double-entry ledger in relational DB; idempotency keys in Redis |
| Vol 2 Ch12 — Digital Wallet | MySQL / PostgreSQL (wallet balances, transaction log) | — | Redis (balance cache, idempotency) | — | Strong consistency for balance; event sourcing for audit trail |
| Vol 2 Ch13 — Stock Exchange | MySQL (user accounts, order records) | Kafka (event sourcing / audit log) | Redis (order book snapshots, market data) | — | Matching engine state in Redis for microsecond access; Kafka as immutable event log |
5. Database Scaling Patterns
Read Replicas
- Route all reads to replicas, writes to primary
- Replication lag = eventual consistency on reads
- SDI use: Almost every chapter (URL shortener, news feed, notifications)
- Rule of thumb: Use when read:write ratio > 5:1
Sharding Strategies
| Strategy | How it works | Pros | Cons | SDI chapters |
|---|---|---|---|---|
| Range-based | Shard by value range (user IDs 0-1M → shard 1) | Easy range queries | Hotspots if data is skewed | Web crawler (crawl by URL lexicographic range) |
| Hash-based | shard = hash(key) % N | Even distribution | No range queries; resharding requires moving all data | URL shortener (by hash of shortCode) |
| Consistent hashing | Hash ring; each server owns an arc | Adding/removing servers moves minimal data | More complex; needs virtual nodes for balance | Key-value store (Ch06), distributed cache (Ch05) |
| Geographic | Route users to region-local shard | Low latency; data residency compliance | Cross-region queries fail; uneven distribution | Google Maps (region-based tiles), Hotel reservation (region-based hotels) |
| Directory-based | Lookup service maps key → shard | Flexible, no re-hashing | Lookup service is a bottleneck and SPOF | Large-scale systems with complex routing |
Vertical vs Horizontal Scaling
| Aspect | Vertical (scale up) | Horizontal (scale out) |
|---|---|---|
| Mechanism | Bigger machine (more CPU, RAM, disk) | More machines (add nodes) |
| Limit | Hard ceiling (~several hundred GB RAM) | Virtually unlimited |
| Cost | Expensive per unit | Cheaper at scale |
| Complexity | Simple (no distributed problems) | High (sharding, consistency, resharding) |
| When to use | First step — always exhaust vertical first | When vertical is maxed or cost-prohibitive |
| SDI threshold | Start vertical up to ~10K QPS | Shard when single DB can’t keep up |
6. Consistency Models Recap
| Model | What it guarantees | Latency trade-off | When it appears in SDI |
|---|---|---|---|
| Strong consistency | Read always returns the latest write | Higher (must coordinate across replicas) | Payment (Ch11), Hotel reservation (Ch07), Stock exchange (Ch13) — financial data cannot be stale |
| Sequential consistency | All operations appear to happen in some sequential order consistent across all nodes | Medium | Distributed KV stores (Ch06) — order matters for correctness |
| Causal consistency | Causally related operations seen in causal order; unrelated operations may differ | Medium | Chat systems — message B sent in reply to A must arrive after A |
| Read-your-writes | After a write, same client’s reads reflect that write | Medium | User profile updates — user expects to see their own edits immediately |
| Eventual consistency | All replicas converge eventually if writes stop | Low | News feed (Ch11), metrics (V2 Ch05), URL shortener (Ch08) — slight staleness acceptable |
| Monotonic reads | Once a value is read, subsequent reads never return an older value | Medium | Feed loading — don’t let users see “older” posts appear after newer ones |
7. Storage Estimation Quick Reference
| Data type | Typical size | Notes |
|---|---|---|
| UUID / GUID | 16 bytes | Preferred over auto-increment for distributed systems |
| Int (32-bit) | 4 bytes | User ID, counter |
| Long (64-bit) | 8 bytes | Timestamp, large ID |
| URL (average) | ~100 bytes | Short URL ~20 bytes, long URL ~200 bytes |
| Tweet / short message | ~280 bytes (chars) | ~500 bytes with metadata |
| Chat message | ~100–500 bytes | Text + metadata |
| User profile row | ~1 KB | All fields combined |
| Small image (thumbnail) | ~100 KB | Profile photo, thumbnail |
| Image (full resolution) | ~1–5 MB | Photos |
| Video (1 min, 720p) | ~100 MB | H.264 encoded |
| Video (1 hr, 1080p) | ~5–10 GB | Before CDN chunking |
| Metric data point | ~100 bytes | Timestamp + value + labels |
| Log entry | ~200–500 bytes | Level + message + context |
Back-of-envelope formula:
Storage needed = daily_writes × avg_entry_size × retention_days × replication_factor
Example — Chat system (Vol 1 Ch12):
50M DAU × 40 messages/day × 200 bytes × 365 days × 3 replicas
= 50M × 40 × 200 × 365 × 3
≈ 438 TB / year
→ Need sharded Cassandra cluster
8. CAP Theorem and SDI Chapter Mapping
CAP theorem: A distributed data store can only guarantee two of three properties simultaneously:
- Consistency: Every read returns the latest write (or an error)
- Availability: Every request returns a response (may be stale)
- Partition tolerance: System continues operating despite network partitions
Network partitions are unavoidable in distributed systems → you always have P. The real choice is CP vs AP.
| System | CAP choice | Reasoning | SDI chapter |
|---|---|---|---|
| MySQL (relational DB) | CP | Returns error rather than stale data; failover can cause brief unavailability | Payment (Ch11), Hotel reservation (V2 Ch07), stock exchange (V2 Ch13) |
| Cassandra | AP | Tunable; default is AP (eventual consistency); can configure CP with QUORUM | Chat history (Ch12), metrics (V2 Ch05) |
| DynamoDB | AP (default) / CP (optional) | Eventual consistency reads by default; strong consistency available | Hotel booking (V2 Ch07), gaming leaderboard (V2 Ch10) |
| HBase | CP | Strong consistency for row-level reads/writes | Chat system (Ch12 — HBase alternative) |
| Redis | CP (single node) / AP (Cluster) | Single node: strong; Cluster in split-brain: may serve stale | Rate limiter (Ch04), leaderboard (V2 Ch10) |
| ZooKeeper | CP | Used for leader election, config; guarantees linearizability | Key-value store (Ch06), service registry |
Practical SDI interview guidance:
- Financial systems (payment, stock exchange, hotel reservation): Always choose CP. “I would rather fail the request than process a stale balance.”
- Social/analytics systems (news feed, metrics, leaderboard): AP is fine. “A slightly stale feed is acceptable; availability matters more.”
- The real world uses tunable consistency (Cassandra
QUORUM, DynamoDB strong reads) to get the level needed per operation.
9. Storage System Selection Flowchart
What kind of data are you storing?
│
├── Binary blobs (video, images, files > 1 MB)
│ → Object storage (S3, GCS)
│ + CDN in front for reads
│
├── Structured relational data (users, orders, reservations)
│ → Relational DB (MySQL / PostgreSQL)
│ + Read replicas for scale
│ + Redis cache for hot reads
│
├── Time-ordered events (messages, metrics, logs, clicks)
│ → Wide-column DB (Cassandra / HBase)
│ or Time-series DB (InfluxDB / TimescaleDB) for metrics
│
├── Simple key-value (sessions, tokens, counters, feature flags)
│ → Redis (in-memory, sub-ms)
│ or DynamoDB (persistent, serverless)
│
├── Full-text searchable content (products, articles, logs)
│ → Elasticsearch / OpenSearch
│ (sync from primary DB via CDC or dual-write)
│
├── Graph relationships (social network, fraud detection)
│ → Graph DB (Neo4j, Neptune)
│ or simulate with relational adjacency table at small scale
│
└── Flexible / evolving schema (catalog, CMS, user-generated content)
→ Document DB (MongoDB / DynamoDB document mode)
Layered storage architecture (appears in most complex SDI chapters):
[Hot path — sub-ms] Redis (in-memory cache)
↑ cache miss
[Warm path — 1-10ms] Primary DB (MySQL / Cassandra)
↑ cold data
[Cold path — 10-100ms] Object storage / Data warehouse (S3 / ClickHouse)
↑ batch analytics
[Archive — minutes] Glacier / cheap object storage (7-year retention)
See also:
- key-patterns > 4. SQL vs NoSQL — Decision guide for SQL vs NoSQL
- key-patterns > 2. Database Sharding — Sharding strategies detail
- key-patterns > 3. Consistent Hashing — Hash ring pattern
- distributed-system-components > 13. Databases — SQL and NoSQL overview
- distributed-system-components > 16. Object Storage — S3 and blob storage
- estimation-cheatsheet — Storage and throughput numbers
- ch04-distributed-message-queue — Storage for message persistence
- ch09-s3-object-storage — Object storage internals