Key Concepts - Quick Reference
All 12 chapters complete. Use this as a cross-chapter index.
Chapter 1: Reliable, Scalable, Maintainable Applications
→ Full notes: chapters/ch01-reliable-scalable-maintainable.md
→ Cheatsheet: chapters/ch01-cheatsheet.md
→ Flashcards: flashcards/ch01-flashcards.md
| Concept | One-Liner |
|---|---|
| Reliability | Work correctly despite faults (hardware, software, human) |
| Scalability | Cope with increased load; use percentiles (p99) not averages |
| Maintainability | Operability + Simplicity + Evolvability |
| Fault vs Failure | Component deviation vs system-wide breakdown |
| SLO/SLA | Define expected performance; p99 < 500ms etc. |
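
The percentile point in the Scalability row can be illustrated with a minimal sketch. The `percentile` helper (nearest-rank method) and the latency figures are made up for illustration, not taken from the chapter notes.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)  # pulled up by the two outliers
p50_ms = percentile(latencies_ms, 50)            # what a typical request sees
p99_ms = percentile(latencies_ms, 99)            # what the slowest 1% of requests see
```

With this data the mean sits between the two groups and describes no real request, while p50 and p99 each describe an actual user experience.
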
Chapter 2: Data Models and Query Languages
→ Full notes: chapters/ch02-data-models-query-languages.md
→ Cheatsheet: chapters/ch02-cheatsheet.md
→ Flashcards: flashcards/ch02-flashcards.md
| Concept | One-Liner |
|---|---|
| Relational model | Tables + rows + foreign keys; joins handle relationships |
| Document model | Self-contained JSON/BSON; great for tree-shaped data |
| Graph model | Vertices + edges; ideal for highly connected data |
| Schema-on-write | DB enforces structure at write time |
| Schema-on-read | Structure interpreted at read time (document DBs) |
| Impedance mismatch | Gap between OOP objects and relational tables |
Chapter 3: Storage and Retrieval
→ Full notes: chapters/ch03-storage-and-retrieval.md
→ Cheatsheet: chapters/ch03-cheatsheet.md
→ Flashcards: flashcards/ch03-flashcards.md
| Concept | One-Liner |
|---|---|
| LSM-Tree | Writes go to an in-memory memtable → flushed to disk as sorted SSTables → merged by compaction |
| B-Tree | Balanced page tree; update in-place; standard for OLTP |
| Bloom filter | Probabilistic skip of SSTables that don’t contain a key |
| OLTP | Transactional: small fast reads/writes by key |
| OLAP | Analytical: aggregate over millions of rows |
| Column store | Data stored column-by-column; optimal for analytics |
| Write amplification | One logical write → multiple physical writes |
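
A rough sketch of the Bloom filter row, assuming a hypothetical `BloomFilter` class with arbitrary sizing (1024 bits, 3 hash functions). Real storage engines tune these parameters and use faster hash functions than SHA-256.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, chosen for readability

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent", so the SSTable can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
for key in ("user:1", "user:2"):
    sstable_filter.add(key)
```

A `False` answer lets a read skip the SSTable entirely; a `True` answer only means the SSTable has to be checked.
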
Chapter 4: Encoding and Evolution
→ Full notes: chapters/ch04-encoding-and-evolution.md
→ Cheatsheet: chapters/ch04-cheatsheet.md
→ Flashcards: flashcards/ch04-flashcards.md
| Concept | One-Liner |
|---|---|
| Backward compatibility | New code reads data written by old code |
| Forward compatibility | Old code reads data written by new code |
| Protobuf field tag | Numeric ID for a field; must never change |
| Avro | No field tags; schema resolution by field name |
| Schema registry | Central store of versioned schemas |
| gRPC | RPC framework using Protocol Buffers over HTTP/2; common for internal service-to-service calls |
Chapter 5: Replication
→ Full notes: chapters/ch05-replication.md
→ Cheatsheet: chapters/ch05-cheatsheet.md
→ Flashcards: flashcards/ch05-flashcards.md
| Concept | One-Liner |
|---|---|
| Single-leader | One node accepts writes; followers replicate its change log |
| Multi-leader | Several leaders accept writes and replicate to each other asynchronously; write conflicts possible |
| Leaderless | All nodes accept writes; quorum for consistency |
| Replication lag | Delay between leader write and follower update |
| Read-your-writes | Guarantee user sees their own writes |
| Monotonic reads | No time reversal across replica reads |
| CRDT | Conflict-free Replicated Data Type; auto-merge |
| Quorum (w+r>n) | Read and write node sets must overlap, so at least one node read holds the latest acknowledged write |
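
The quorum condition can be checked by brute force for small clusters. This is a sketch with a hypothetical helper name, `quorums_always_overlap`; it enumerates every possible write set and read set and ignores edge cases such as sloppy quorums.

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """True if every set of w nodes shares at least one node with every set of r nodes."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

# n=3, w=2, r=2 satisfies w + r > n, so every read set includes a node that took the write.
overlap_ok = quorums_always_overlap(3, 2, 2)
# n=3, w=1, r=1 does not, so a read can land only on nodes that missed the write.
overlap_missing = quorums_always_overlap(3, 1, 1)
```
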
Chapter 6: Partitioning
→ Full notes: chapters/ch06-partitioning.md
→ Cheatsheet: chapters/ch06-cheatsheet.md
→ Flashcards: flashcards/ch06-flashcards.md
| Concept | One-Liner |
|---|---|
| Key-range partition | Adjacent keys together; range queries efficient; hot spot risk |
| Hash partition | Uniform distribution; no range queries |
| Hot spot | One partition receives disproportionate load |
| Consistent hashing | Ring-based; adding nodes only moves adjacent keys |
| Scatter/gather | Query all partitions; tail latency = slowest partition |
| Document-partitioned index | Local secondary index; fast writes, scatter/gather reads |
| Term-partitioned index | Global secondary index; efficient reads, cross-partition writes |
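
A minimal sketch of the consistent hashing row, assuming a hypothetical `HashRing` class with 64 virtual nodes per physical node. The node names and key format are illustrative only.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a position on the ring using a stable hash."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several points on the ring (virtual nodes).
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def owner(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # The first point clockwise from the key's position owns the key.
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(500)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; no key is reshuffled between the three original nodes, which is the property the table row describes.
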
Chapter 7: Transactions
→ Full notes: chapters/ch07-transactions.md
→ Cheatsheet: chapters/ch07-cheatsheet.md
→ Flashcards: flashcards/ch07-flashcards.md
| Concept | One-Liner |
|---|---|
| ACID | Atomicity, Consistency, Isolation, Durability |
| Read committed | Common default isolation level (e.g. PostgreSQL, Oracle); no dirty reads or dirty writes |
| Snapshot isolation | MVCC; each txn sees consistent snapshot |
| Serializable (2PL) | Pessimistic locking; readers and writers block each other; prevents all race conditions, including write skew |
| Serializable (SSI) | Optimistic; detects conflicts at commit and aborts; used by PostgreSQL's SERIALIZABLE level |
| Write skew | Two txns read same data, write to different rows, violate invariant |
| Lost update | Read-modify-write without atomic operation; one write lost |
| 2PC | Two-Phase Commit; atomic across multiple nodes |
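
The lost update row can be reproduced with a deterministic, single-threaded simulation. The `Register` class and its version-based `compare_and_set` are hypothetical stand-ins for a database row and an atomic compare-and-set, not any specific database API.

```python
class Register:
    """Stand-in for a stored counter with a version number."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, value):
        self.value = value
        self.version += 1

    def compare_and_set(self, expected_version, value):
        # Only write if nobody else has written since we read.
        if self.version != expected_version:
            return False
        self.write(value)
        return True

def unsafe_interleaving():
    counter = Register(0)
    a, _ = counter.read()   # txn A reads 0
    b, _ = counter.read()   # txn B reads 0
    counter.write(a + 1)    # A writes 1
    counter.write(b + 1)    # B also writes 1, so A's increment is lost
    return counter.value

def cas_interleaving():
    counter = Register(0)
    a, version_a = counter.read()
    b, version_b = counter.read()
    counter.compare_and_set(version_a, a + 1)          # A succeeds
    if not counter.compare_and_set(version_b, b + 1):  # B is rejected as stale
        b, version_b = counter.read()                  # so B re-reads and retries
        counter.compare_and_set(version_b, b + 1)
    return counter.value
```

Two increments yield 1 in the unsafe interleaving and 2 when the stale write is detected and retried.
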
Chapter 8: The Trouble with Distributed Systems
→ Full notes: chapters/ch08-trouble-with-distributed-systems.md
→ Cheatsheet: chapters/ch08-cheatsheet.md
→ Flashcards: flashcards/ch08-flashcards.md
| Concept | One-Liner |
|---|---|
| Partial failure | Some nodes work, some don’t; normal in distributed systems |
| Timeout ambiguity | After timeout, can’t know if request was received/processed |
| Wall clock danger | Can go backward (NTP sync); unsafe for ordering |
| Monotonic clock | Always forward; safe for elapsed time within a process |
| GC pause | Stop-the-world garbage collection (e.g. JVM) can freeze a process for an unpredictable time, sometimes seconds |
| Fencing token | Monotonically increasing number; rejects stale writes |
| FLP impossibility | No deterministic algorithm can guarantee consensus in a fully asynchronous system if even one node may crash |
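
A sketch of how a storage service might enforce fencing tokens. `FencedStorage` and its `write` signature are hypothetical; the token is assumed to come from a lock service that increases it each time the lease is granted.

```python
class FencedStorage:
    """Rejects writes that carry a token older than the newest one seen."""

    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False  # stale lease holder, e.g. a client resuming after a long pause
        self.highest_token = token
        self.data[key] = value
        return True

storage = FencedStorage()
first = storage.write(33, "file", "from client 1")   # client 1 holds the lease with token 33
second = storage.write(34, "file", "from client 2")  # lease expired; client 2 got token 34
stale = storage.write(33, "file", "late write")      # client 1 wakes up and is rejected
```

The check lives in the storage service rather than the client, because a paused client cannot know that its lease has expired.
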
Chapter 9: Consistency and Consensus
→ Full notes: chapters/ch09-consistency-and-consensus.md
→ Cheatsheet: chapters/ch09-cheatsheet.md
→ Flashcards: flashcards/ch09-flashcards.md
| Concept | One-Liner |
|---|---|
| Linearizability | Strongest; reads always see most recent write |
| Causal consistency | Causally related events ordered; concurrent may differ |
| CAP theorem | During partition: choose consistency OR availability |
| PACELC | CAP + latency vs consistency even without partitions |
| Consensus | Nodes agree on single value; irreversible |
| Raft | Leader-based consensus; majority quorum; designed for understandability |
| Total order broadcast | All nodes see all messages in same order; equivalent to consensus |
| ZooKeeper/etcd | Coordination services built on consensus |
Chapter 10: Batch Processing
→ Full notes: chapters/ch10-batch-processing.md
→ Cheatsheet: chapters/ch10-cheatsheet.md
→ Flashcards: flashcards/ch10-flashcards.md
| Concept | One-Liner |
|---|---|
| MapReduce | Map (KV pairs) + Shuffle (group by key) + Reduce (aggregate) |
| Shuffle | Redistributes mapper output by key; most expensive phase |
| Dataflow engine | DAG-based batch execution (Spark, Flink); avoids writing intermediate state to the distributed filesystem between stages |
| Broadcast hash join | Small table in memory; join without shuffle |
| Derived data | Any output recomputable from immutable source |
| BSP model | Bulk Synchronous Parallel; iterative graph algorithms (Pregel) |
| dbt | SQL-based batch transformation tool; standard for analytics engineering |
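
The MapReduce and Shuffle rows can be sketched in memory as a word count. The function names are illustrative; a real framework runs each phase across many machines and sorts mapper output on disk.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, the step that moves data between nodes in a real cluster."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Call the reducer once per key with all of that key's values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1

def sum_reducer(_key, values):
    return sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(lines, word_count_mapper)), sum_reducer)
```
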
Chapter 11: Stream Processing
→ Full notes: chapters/ch11-stream-processing.md
→ Cheatsheet: chapters/ch11-cheatsheet.md
→ Flashcards: flashcards/ch11-flashcards.md
| Concept | One-Liner |
|---|---|
| Stream processing | Batch processing over unbounded (infinite) data |
| Message log (Kafka) | Append-only; consumers track own offset; replay supported |
| CDC | Change Data Capture; stream DB changes from WAL/binlog |
| Event sourcing | Store events not state; state = fold over event log |
| Event time | When event occurred; use for business logic |
| Processing time | When processor received event; use for SLA monitoring |
| Watermark | Estimate of event time progress; closes time windows |
| Exactly-once | Effects appear as if each event were processed once, despite failures and retries (needs idempotence or atomic commit) |
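
The "state = fold over event log" idea from the Event sourcing row, sketched with a hypothetical shopping-cart event format of `(kind, item, quantity)` tuples.

```python
from functools import reduce

def apply_event(state, event):
    """Pure function: returns a new state and never mutates the old one."""
    kind, item, qty = event
    new_state = dict(state)
    if kind == "added":
        new_state[item] = new_state.get(item, 0) + qty
    elif kind == "removed":
        remaining = new_state.get(item, 0) - qty
        if remaining > 0:
            new_state[item] = remaining
        else:
            new_state.pop(item, None)
    return new_state

# The append-only log is the source of truth; the cart is derived from it.
event_log = [
    ("added", "book", 2),
    ("added", "pen", 1),
    ("removed", "book", 1),
    ("removed", "pen", 1),
]

cart = reduce(apply_event, event_log, {})
cart_after_two_events = reduce(apply_event, event_log[:2], {})
```

Folding over a prefix of the log reconstructs the state at any earlier point, which is what makes replay and backfill possible.
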
Chapter 12: The Future of Data Systems
→ Full notes: chapters/ch12-future-of-data-systems.md
→ Cheatsheet: chapters/ch12-cheatsheet.md
→ Flashcards: flashcards/ch12-flashcards.md
| Concept | One-Liner |
|---|---|
| Unbundling | Use specialized tools; compose via event log |
| Derived data | All state computable from event log; re-derivable if lost |
| Lambda architecture | Batch + speed layers; antipattern (dual code paths) |
| Kappa architecture | Single stream processor; replay log for backfill |
| Data mesh | Domain teams own data as products; federated governance |
| Data minimalism | Collect only what you need; less data = less risk |
| Cryptographic erasure | GDPR compliance: encrypt data, delete key = functionally erased |
Cross-Chapter Themes
Consistency Hierarchy (weakest → strongest)
Replica consistency: Eventual → Monotonic reads → Causal → Linearizable
Transaction isolation (a separate axis): Read committed → Snapshot → Serializable
Distributed System Design Principles
- Immutable inputs (Ch10, Ch11, Ch12): Never modify source data; derive all outputs
- Derived data (Ch10-12): Any derived view can be regenerated from the event log
- Consistency is a spectrum (Ch5, Ch9): Choose the weakest guarantee that still meets the application's needs
- Fencing tokens for leases (Ch8): Protect against stale lock holders
- Event log as backbone (Ch11, Ch12): Kafka as the integration nervous system
Key Trade-offs (for interviews)
| Trade-off | Option A | Option B |
|---|---|---|
| Storage engine: write vs read optimization | LSM-Tree: higher write throughput (Ch3) | B-Tree: faster, more predictable reads (Ch3) |
| Consistency vs Availability | CP (ZooKeeper) | AP (Cassandra) |
| Strong vs eventual consistency | Linearizable (cost) | Eventual (performance) |
| Replication: sync vs async | No data loss | High throughput |
| Secondary index: local vs global | Fast writes | Fast reads |
| Isolation: SSI vs 2PL | SSI: fast under low contention; abort rate rises with contention | 2PL: locks prevent conflicts up front; lower throughput, deadlock risk |
Status: All 12 chapters complete
Last Updated: 2026-04-13