Key Concepts - Quick Reference

All 12 chapters complete. Use this as a cross-chapter index.


Chapter 1: Reliable, Scalable, Maintainable Applications

→ Full notes: chapters/ch01-reliable-scalable-maintainable.md
→ Cheatsheet: chapters/ch01-cheatsheet.md
→ Flashcards: flashcards/ch01-flashcards.md

| Concept | One-Liner |
|---|---|
| Reliability | Work correctly despite faults (hardware, software, human) |
| Scalability | Cope with increased load; describe performance with percentiles (p99), not averages |
| Maintainability | Operability + Simplicity + Evolvability |
| Fault vs Failure | One component deviating from spec vs the whole system no longer providing service |
| SLO/SLA | Define expected performance, e.g. p99 < 500 ms |
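
Why percentiles rather than averages: a minimal sketch, assuming the nearest-rank percentile definition (one of several in use) and illustrative latency values. Two slow requests barely move the mean but dominate p99.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Illustrative response times in ms: 98 fast requests, 2 very slow ones
latencies_ms = [10] * 98 + [1000, 2000]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f} p50={percentile(latencies_ms, 50)} p99={percentile(latencies_ms, 99)}")
```

The mean (about 40 ms) suggests every request is fast; p99 shows that at least 1 in 100 requests takes a second or more.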

Chapter 2: Data Models and Query Languages

→ Full notes: chapters/ch02-data-models-query-languages.md
→ Cheatsheet: chapters/ch02-cheatsheet.md
→ Flashcards: flashcards/ch02-flashcards.md

| Concept | One-Liner |
|---|---|
| Relational model | Tables + rows + foreign keys; joins handle relationships |
| Document model | Self-contained JSON/BSON; great for tree-shaped data |
| Graph model | Vertices + edges; ideal for highly connected data |
| Schema-on-write | DB enforces structure at write time |
| Schema-on-read | Structure interpreted at read time (document DBs) |
| Impedance mismatch | Gap between OOP objects and relational tables |
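
Schema-on-read pushes format handling into the reading code. A minimal sketch, assuming plain Python dicts stand in for stored documents and a hypothetical change from a single `name` field to a `first_name` field: old documents are never rewritten, and the reader interprets both shapes.

```python
def first_name(doc):
    """Read-time interpretation of a user document whose shape changed over time."""
    if "first_name" in doc:
        return doc["first_name"]
    # Documents written before the change only have a full "name" field
    return doc["name"].split(" ")[0]

docs = [
    {"name": "Ada Lovelace"},                       # old shape
    {"first_name": "Alan", "last_name": "Turing"},  # new shape
]
print([first_name(d) for d in docs])
```

Under schema-on-write, the same change would typically be a schema migration (e.g. `ALTER TABLE` plus a backfill of the new column) instead.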

Chapter 3: Storage and Retrieval

→ Full notes: chapters/ch03-storage-and-retrieval.md
→ Cheatsheet: chapters/ch03-cheatsheet.md
→ Flashcards: flashcards/ch03-flashcards.md

| Concept | One-Liner |
|---|---|
| LSM-Tree | Writes go to an in-memory memtable → flushed to sorted SSTables → merged by compaction |
| B-Tree | Balanced tree of fixed-size pages; updates in place; standard for OLTP |
| Bloom filter | Probabilistic check that skips SSTables that can't contain a key (no false negatives) |
| OLTP | Transactional: small, fast reads/writes by key |
| OLAP | Analytical: aggregates over millions of rows |
| Column store | Data stored column-by-column; well suited to analytic scans |
| Write amplification | One logical write → multiple physical writes |
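
The LSM write and read paths can be sketched in a few lines. This is a toy in-memory model under simplifying assumptions: SSTables are plain sorted lists, and there is no write-ahead log, no Bloom filter, and no deletion tombstones. `TinyLSM` and its `memtable_limit` parameter are hypothetical names.

```python
import bisect

class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # recent writes, held in memory
        self.sstables = []      # immutable sorted runs of (key, value), oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable segment
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest segment wins
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # binary search within a sorted segment
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all segments; later (newer) values overwrite older ones
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [sorted(merged.items())]

db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
print(len(db.sstables), db.get("a"))  # two SSTables; newest value of "a" wins
db.compact()
print(len(db.sstables), db.get("a"), db.get("b"))
```

Compaction is where write amplification comes from: the same key/value pairs get rewritten each time segments are merged.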

Chapter 4: Encoding and Evolution

→ Full notes: chapters/ch04-encoding-and-evolution.md
→ Cheatsheet: chapters/ch04-cheatsheet.md
→ Flashcards: flashcards/ch04-flashcards.md

| Concept | One-Liner |
|---|---|
| Backward compatibility | New code reads data written by old code |
| Forward compatibility | Old code reads data written by new code |
| Protobuf field tag | Numeric ID for a field; must never change once in use |
| Avro | No field tags; writer's and reader's schemas resolved by field name |
| Schema registry | Central store of versioned schemas |
| gRPC | Protocol Buffers over HTTP/2; widely used for internal service-to-service RPC |
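
Field tags are what make both compatibility directions work. A minimal sketch, assuming a record is modelled as a dict from tag number to value and a schema as a dict from tag to (field name, default). This mimics the idea behind Protocol Buffers tags; it is not the Protobuf wire format or API.

```python
# Hypothetical schemas: version 2 adds an optional field under a new tag (3)
SCHEMA_V1 = {1: ("name", ""), 2: ("age", 0)}
SCHEMA_V2 = {1: ("name", ""), 2: ("age", 0), 3: ("email", "")}

def decode(record, schema):
    """Decode a tag->value record using the reader's schema."""
    # Tags the reader doesn't know are skipped (forward compatibility);
    # tags missing from the record fall back to defaults (backward compatibility).
    return {name: record.get(tag, default) for tag, (name, default) in schema.items()}

old_record = {1: "Ann", 2: 30}                        # written by v1 code
new_record = {1: "Bob", 2: 40, 3: "bob@example.com"}  # written by v2 code

print(decode(old_record, SCHEMA_V2))  # new code reading old data
print(decode(new_record, SCHEMA_V1))  # old code reading new data
```

This is also why a tag must never be reused: an old reader would decode the new field's value under the old field's name.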

Chapter 5: Replication

→ Full notes: chapters/ch05-replication.md
→ Cheatsheet: chapters/ch05-cheatsheet.md
→ Flashcards: flashcards/ch05-flashcards.md

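| Concept | One-Liner |
|---|---|
| Single-leader | One node (the leader) accepts writes; followers replicate from it |
| Multi-leader | Several nodes accept writes and replicate asynchronously to each other; write conflicts possible |
| Leaderless | All nodes accept writes; quorum reads/writes for consistency |
| Replication lag | Delay between a write on the leader and its appearance on a follower |
| Read-your-writes | Guarantee that a user always sees their own writes |
| Monotonic reads | A user never sees older data after having seen newer data |
| CRDT | Conflict-free Replicated Data Type; merges concurrent updates automatically |
| Quorum (w + r > n) | Read and write sets overlap, so at least one replica read has the latest write |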
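
The w + r > n rule is a pigeonhole argument. A small brute-force check, assuming n replicas labelled 0..n-1, confirms that every possible write set of size w intersects every possible read set of size r exactly when w + r > n.

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """True if every w-node write set shares at least one node with every r-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # a non-empty intersection is truthy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

print(quorums_always_overlap(3, 2, 2))  # common n=3, w=r=2 configuration
print(quorums_always_overlap(3, 1, 1))  # w + r = 2 <= 3: a read can miss the write
```

Overlap guarantees that a read reaches at least one up-to-date replica, but edge cases such as sloppy quorums or concurrent writes mean quorums alone do not make a system linearizable.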

Chapter 6: Partitioning

→ Full notes: chapters/ch06-partitioning.md
→ Cheatsheet: chapters/ch06-cheatsheet.md
→ Flashcards: flashcards/ch06-flashcards.md

| Concept | One-Liner |
|---|---|
| Key-range partition | Adjacent keys stored together; efficient range queries; hot-spot risk |
| Hash partition | Uniform distribution; no efficient range queries |
| Hot spot | One partition receives disproportionate load |
| Consistent hashing | Ring-based; adding a node only moves keys from its neighboring range |
| Scatter/gather | Query all partitions; tail latency = slowest partition |
| Document-partitioned index | Local secondary index; fast writes, scatter/gather reads |
| Term-partitioned index | Global secondary index; efficient reads, cross-partition writes |
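
A minimal consistent-hashing ring, assuming MD5 as the hash (chosen only because it is in the standard library) and one point per node; production systems usually add virtual nodes for balance. `HashRing` is a hypothetical name. The check at the end shows the property from the table: after adding a node, the only keys that move are the ones that now land on the new node.

```python
import bisect
import hashlib

def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) points around the ring
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self.points, (ring_hash(node), node))

    def node_for(self, key):
        # A key belongs to the first node clockwise from its position (wrapping around)
        positions = [pos for pos, _ in self.points]
        i = bisect.bisect_right(positions, ring_hash(key)) % len(self.points)
        return self.points[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```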

Chapter 7: Transactions

→ Full notes: chapters/ch07-transactions.md
→ Cheatsheet: chapters/ch07-cheatsheet.md
→ Flashcards: flashcards/ch07-flashcards.md

| Concept | One-Liner |
|---|---|
| ACID | Atomicity, Consistency, Isolation, Durability |
| Read committed | Common default (e.g. PostgreSQL); no dirty reads or dirty writes |
| Snapshot isolation | MVCC; each transaction sees a consistent snapshot |
| Serializable (2PL) | Pessimistic locking: readers and writers block each other; prevents all anomalies |
| Serializable (SSI) | Optimistic; detects conflicts at commit; used by PostgreSQL's SERIALIZABLE level |
| Write skew | Two transactions read the same data, write to different rows, and together violate an invariant |
| Lost update | Concurrent read-modify-write without an atomic operation; one write is lost |
| 2PC | Two-Phase Commit; atomic commit across multiple nodes |
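
Write skew is easiest to see with the on-call doctors example. This deterministic simulation assumes each transaction reads from its own snapshot and that writes to different rows raise no conflict (as under snapshot isolation). Two individually correct transactions break the invariant "at least one doctor on call"; running them serially preserves it.

```python
def go_off_call(snapshot, doctor):
    """Transaction logic: leave the rota only if someone else is still on call."""
    on_call = [name for name, is_on in snapshot.items() if is_on]
    if len(on_call) >= 2:
        return {doctor: False}  # write set: only this doctor's row
    return {}

# Concurrent run: both transactions read the same snapshot before either commits
db = {"alice": True, "bob": True}
snapshot = dict(db)
writes_alice = go_off_call(snapshot, "alice")
writes_bob = go_off_call(snapshot, "bob")
db.update(writes_alice)  # different rows, so no write-write conflict is detected
db.update(writes_bob)
print("concurrent:", sum(db.values()), "on call")

# Serial run: the second transaction sees the first one's committed write
serial_db = {"alice": True, "bob": True}
serial_db.update(go_off_call(serial_db, "alice"))
serial_db.update(go_off_call(serial_db, "bob"))
print("serial:", sum(serial_db.values()), "on call")
```

Serializable isolation (2PL or SSI) rules out the concurrent outcome; under snapshot isolation, a common workaround is to lock the rows that were read with `SELECT ... FOR UPDATE`.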

Chapter 8: The Trouble with Distributed Systems

→ Full notes: chapters/ch08-trouble-with-distributed-systems.md
→ Cheatsheet: chapters/ch08-cheatsheet.md
→ Flashcards: flashcards/ch08-flashcards.md

| Concept | One-Liner |
|---|---|
| Partial failure | Some nodes work, some don't; normal in distributed systems |
| Timeout ambiguity | After a timeout, you can't know whether the request was received or processed |
| Wall clock danger | Can jump backward (e.g. NTP sync); unsafe for ordering events across nodes |
| Monotonic clock | Always moves forward; safe for measuring elapsed time within a process |
| GC pause | Stop-the-world GC (e.g. in the JVM) can pause a process for seconds |
| Fencing token | Monotonically increasing number; storage rejects writes with stale tokens |
| FLP impossibility | No deterministic algorithm always reaches consensus in an async network if a node may crash |
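
The fencing check lives on the storage side. A minimal sketch, assuming a hypothetical `FencedStore` that remembers the highest token it has accepted; the token values follow the usual scenario in which client 1's lease expires during a GC pause.

```python
class FencedStore:
    """Storage service that rejects writes carrying an older fencing token."""

    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False  # stale lock holder: a newer token has already been seen
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStore()
# Client 1 got token 33, then paused; its lease expired and client 2 got token 34.
ok_client2 = store.write(34, "file", "written by client 2")
# Client 1 wakes up, still believing it holds the lock, and tries to write.
ok_client1 = store.write(33, "file", "written by client 1")
print(ok_client2, ok_client1, store.data["file"])
```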

Chapter 9: Consistency and Consensus

→ Full notes: chapters/ch09-consistency-and-consensus.md
→ Cheatsheet: chapters/ch09-cheatsheet.md
→ Flashcards: flashcards/ch09-flashcards.md

| Concept | One-Liner |
|---|---|
| Linearizability | Strongest single-object guarantee; reads always see the most recent write |
| Causal consistency | Causally related events are ordered; concurrent events may be seen in different orders |
| CAP theorem | During a network partition: choose consistency OR availability |
| PACELC | CAP + a latency vs consistency trade-off even without partitions |
| Consensus | Nodes agree on a single value; the decision is irreversible |
| Raft | Leader-based consensus; majority quorum; designed for understandability |
| Total order broadcast | All nodes deliver all messages in the same order; equivalent to consensus |
| ZooKeeper/etcd | Coordination services built on consensus |
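
Total order broadcast can implement a uniqueness constraint such as claiming a username: append the claim to the log, then let the first claim for that name in log order win. A minimal sketch, assuming a single in-process list stands in for the replicated, totally ordered log (in a real system, that log is the part that needs consensus); `claim_username` is a hypothetical helper.

```python
log = []  # stand-in for a totally ordered, replicated log

def claim_username(username, user_id):
    """Append a claim, then decide by replaying the log: the first claim in log order wins."""
    log.append((username, user_id))
    for name, claimant in log:
        if name == username:
            # Replicas replaying the same log in the same order all reach this decision
            return claimant == user_id
    return False  # unreachable: our own claim is in the log

print(claim_username("ada", "user-1"))   # first claim for "ada"
print(claim_username("ada", "user-2"))   # a later claim for the same name loses
print(claim_username("alan", "user-2"))
```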

Chapter 10: Batch Processing

→ Full notes: chapters/ch10-batch-processing.md
→ Cheatsheet: chapters/ch10-cheatsheet.md
→ Flashcards: flashcards/ch10-flashcards.md

| Concept | One-Liner |
|---|---|
| MapReduce | Map (emit key-value pairs) + Shuffle (group by key) + Reduce (aggregate) |
| Shuffle | Redistributes mapper output by key; often the most expensive phase |
| Dataflow engine | DAG-based batch execution (Spark, Flink); avoids writing intermediate state to distributed storage between stages |
| Broadcast hash join | Small table loaded into memory on every node; join without a shuffle |
| Derived data | Any output recomputable from immutable source data |
| BSP model | Bulk Synchronous Parallel; iterative graph algorithms (Pregel) |
| dbt | SQL-based batch transformation tool; widely used in analytics engineering |
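
The three MapReduce phases, sketched in a single process with word count as the classic example. Assumption: plain Python generators and dicts stand in for distributed mappers, the network shuffle, and reducers; there is no partitioning or fault tolerance here.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (key, value) pair for every word
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key (in a cluster this moves data between machines)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cat sat", "the dog sat down"]
print(reduce_phase(shuffle(map_phase(docs))))
```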

Chapter 11: Stream Processing

→ Full notes: chapters/ch11-stream-processing.md
→ Cheatsheet: chapters/ch11-cheatsheet.md
→ Flashcards: flashcards/ch11-flashcards.md

| Concept | One-Liner |
|---|---|
| Stream processing | Batch-style processing over unbounded (infinite) data, done incrementally as it arrives |
| Message log (Kafka) | Append-only; consumers track their own offsets; replay supported |
| CDC | Change Data Capture; stream DB changes from the WAL/binlog |
| Event sourcing | Store events, not state; state = fold over the event log |
| Event time | When the event occurred; use for business logic |
| Processing time | When the processor received the event; use for SLA monitoring |
| Watermark | Estimate of event-time progress; closes time windows |
| Exactly-once | Each event's effect is applied once despite failures and retries (effectively-once) |
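
Event-time windows and watermarks, sketched for tumbling windows. Assumptions: the watermark heuristic is simply "highest event time seen minus a fixed allowed lateness" (real engines offer several strategies), events are bare timestamps, and late events are dropped rather than sent to a side output. `tumbling_window_counts` is a hypothetical name.

```python
from collections import defaultdict

def tumbling_window_counts(event_times, size, allowed_lateness):
    """Count events per event-time window; emit a window once the watermark passes its end."""
    open_windows = defaultdict(int)  # window start -> count
    emitted, dropped = [], []
    watermark = None
    for t in event_times:  # iterate in arrival (processing-time) order
        start = t - t % size
        if watermark is not None and start + size <= watermark:
            dropped.append(t)  # its window has already closed
            continue
        open_windows[start] += 1
        candidate = t - allowed_lateness
        watermark = candidate if watermark is None else max(watermark, candidate)
        for s in sorted(open_windows):  # sorted() copies the keys, so popping is safe
            if s + size <= watermark:
                emitted.append((s, open_windows.pop(s)))
    return emitted, dropped

# Event times in arrival order: 3 arrives out of order but within the allowed
# lateness; 2 arrives after its window has closed
arrivals = [1, 4, 12, 3, 16, 2, 27]
print(tumbling_window_counts(arrivals, size=10, allowed_lateness=5))
```

The window starting at 20 is still open at the end because the watermark (22) has not yet passed its end (30).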

Chapter 12: The Future of Data Systems

→ Full notes: chapters/ch12-future-of-data-systems.md
→ Cheatsheet: chapters/ch12-cheatsheet.md
→ Flashcards: flashcards/ch12-flashcards.md

| Concept | One-Liner |
|---|---|
| Unbundling | Use specialized tools; compose them via an event log |
| Derived data | All state computable from the event log; re-derivable if lost |
| Lambda architecture | Batch + speed layers; widely seen as an antipattern (two code paths to maintain) |
| Kappa architecture | Single stream processor; replay the log for backfill |
| Data mesh | Domain teams own data as products; federated governance |
| Data minimalism | Collect only what you need; less data = less risk |
| Cryptographic erasure | GDPR compliance: encrypt the data, delete the key, and the data is functionally erased |
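
Treating the log as the source of truth means any view can be rebuilt by replaying it. A minimal sketch, assuming a hypothetical list of account events: the balance view is a left fold over the log (`functools.reduce`), so a lost or buggy view is recomputed rather than repaired by hand.

```python
from functools import reduce

# Hypothetical immutable event log (the source of truth)
event_log = [
    ("deposit", "alice", 100),
    ("withdraw", "alice", 30),
    ("deposit", "bob", 50),
]

def apply_event(balances, event):
    """Pure state transition: returns a new view without mutating the old one."""
    kind, account, amount = event
    delta = amount if kind == "deposit" else -amount
    updated = dict(balances)
    updated[account] = updated.get(account, 0) + delta
    return updated

def rebuild_view(log):
    # Replay the whole log from an empty state
    return reduce(apply_event, log, {})

balances = rebuild_view(event_log)
print(balances)
```

Kappa-style backfill applies the same idea at scale: deploy the new view logic and replay the log from the beginning.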

Cross-Chapter Themes

Consistency Hierarchy (weakest → strongest)

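Replica consistency: Eventual → Monotonic reads → Causal → Linearizable

Transaction isolation: Read committed → Snapshot isolation → Serializable

Linearizability (a recency guarantee on individual objects) and serializability (an isolation guarantee for multi-object transactions) are different properties, not rungs on one ladder; strict serializability provides both.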

Distributed System Design Principles

  1. Immutable inputs (Ch10, Ch11, Ch12): Never modify source data; derive all outputs
  2. Derived data (Ch10-12): Any derived view can be regenerated from the event log
  3. Consistency is a spectrum (Ch5, Ch9): Design for the level you actually need
  4. Fencing tokens for leases (Ch8): Protect against stale lock holders
  5. Event log as backbone (Ch11, Ch12): Kafka as the integration nervous system

Key Trade-offs (for interviews)

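| Trade-off | Option A | Option B |
|---|---|---|
| Storage engine (Ch3) | LSM-Tree: write-optimized | B-Tree: read-optimized |
| Consistency vs availability | CP (ZooKeeper) | AP (Cassandra) |
| Strong vs eventual consistency | Linearizable: coordination cost, higher latency | Eventual: lower latency, stale reads possible |
| Replication: sync vs async | Sync: no loss of acknowledged writes | Async: higher throughput, lower write latency |
| Secondary index: local vs global | Local: fast writes | Global: fast reads |
| Isolation: SSI vs 2PL (both serializable) | SSI: no blocking; fast under low contention, many aborts under high contention | 2PL: waits on locks instead of aborting (except on deadlock); slower, less predictable latency |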

Status: All 12 chapters complete
Last Updated: 2026-04-13