Key Concepts - Quick Reference

All 12 chapters complete. Use this as a cross-chapter index.

Chapter 1: Reliable, Scalable, Maintainable Applications

→ Full notes: chapters/ch01-reliable-scalable-maintainable.md
→ Cheatsheet: chapters/ch01-cheatsheet.md
→ Flashcards: flashcards/ch01-flashcards.md

Concept	One-Liner
Reliability	Work correctly despite faults (hardware, software, human)
Scalability	Cope with increased load; describe performance with percentiles (e.g. p99), not averages
Maintainability	Operability + Simplicity + Evolvability
Fault vs Failure	Component deviation vs system-wide breakdown
SLO/SLA	Targets for expected performance and availability (SLA = contractual), e.g. p99 < 500 ms
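
Why percentiles instead of averages: here is a minimal sketch using the nearest-rank percentile method. The latency values are hypothetical, and `percentile` is an illustrative helper, not a library function.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical response times (ms): 98 fast requests plus 2 slow outliers.
latencies_ms = [20] * 98 + [1500, 3000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean_ms:.1f}ms p50={percentile(latencies_ms, 50)}ms "
      f"p99={percentile(latencies_ms, 99)}ms")
# prints: mean=64.6ms p50=20ms p99=1500ms
```

The mean of 64.6 ms describes neither the typical request (20 ms) nor the slow tail (1.5 s and above). The median and p99 capture both.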

Chapter 2: Data Models and Query Languages

→ Full notes: chapters/ch02-data-models-query-languages.md
→ Cheatsheet: chapters/ch02-cheatsheet.md
→ Flashcards: flashcards/ch02-flashcards.md

Concept	One-Liner
Relational model	Tables + rows + foreign keys; joins handle relationships
Document model	Self-contained JSON/BSON; great for tree-shaped data
Graph model	Vertices + edges; ideal for highly connected data
Schema-on-write	DB enforces structure at write time
Schema-on-read	Structure interpreted at read time (document DBs)
Impedance mismatch	Gap between OOP objects and relational tables
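
Schema-on-read in practice: a minimal sketch, assuming hypothetical user documents in which older documents store a single `name` field and newer ones split it into `first_name`/`last_name`. The reader handles both shapes itself.

```python
def get_first_name(user_doc: dict) -> str:
    """Schema-on-read: the reading code copes with both old and new document shapes."""
    if "first_name" in user_doc:
        return user_doc["first_name"]            # newer documents
    return user_doc["name"].split(" ", 1)[0]      # older documents: one "name" field

old_doc = {"user_id": 251, "name": "Ada Lovelace"}
new_doc = {"user_id": 252, "first_name": "Grace", "last_name": "Hopper"}
print(get_first_name(old_doc), get_first_name(new_doc))  # prints: Ada Grace
```

Under schema-on-write, the same change would instead be a migration (add a column, backfill it) that the database enforces for every row.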

Chapter 3: Storage and Retrieval

→ Full notes: chapters/ch03-storage-and-retrieval.md
→ Cheatsheet: chapters/ch03-cheatsheet.md
→ Flashcards: flashcards/ch03-flashcards.md

Concept	One-Liner
LSM-Tree	Writes go to in-memory memtable → flushed to sorted SSTables → merged by compaction
B-Tree	Balanced page tree; update in-place; standard for OLTP
Bloom filter	Probabilistic skip of SSTables that don’t contain a key
OLTP	Transactional: small fast reads/writes by key
OLAP	Analytical: aggregate over millions of rows
Column store	Data stored column-by-column; optimal for analytics
Write amplification	One logical write → multiple physical writes
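
A Bloom filter, sketched minimally: k bit positions are derived by salting SHA-256 with the hash index. That is one simple choice among many, and `BloomFilter` is a hypothetical class rather than any storage engine's API. A `False` answer means the key is definitely absent, so a read can skip that SSTable.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives possible, false negatives impossible."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit: clarity over compactness

    def _positions(self, key: str):
        # Derive k positions by hashing the key with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False => key definitely not in this SSTable, so skip reading it.
        return all(self.bits[pos] for pos in self._positions(key))

sstable_keys = ["apple", "banana", "cherry"]  # hypothetical keys in one SSTable
bloom = BloomFilter()
for k in sstable_keys:
    bloom.add(k)
print(bloom.might_contain("banana"))  # prints: True (never a false negative)
```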

Chapter 4: Encoding and Evolution

→ Full notes: chapters/ch04-encoding-and-evolution.md
→ Cheatsheet: chapters/ch04-cheatsheet.md
→ Flashcards: flashcards/ch04-flashcards.md

Concept	One-Liner
Backward compatibility	New code reads data written by old code
Forward compatibility	Old code reads data written by new code
Protobuf field tag	Numeric ID for a field; must never change
Avro	No field tags; schema resolution by field name
Schema registry	Central store of versioned schemas
gRPC	Protocol Buffers over HTTP/2; common choice for internal service-to-service RPC
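
The following sketch shows why tag-based encodings get both compatibility directions. It is not the real Protobuf wire format: records are simply keyed by tag number, and the schemas and field names are hypothetical. An old reader skips unknown tags (forward compatibility). A new reader fills in defaults for fields the old writer never sent (backward compatibility).

```python
# Hypothetical schemas: field tag -> (field name, default value).
SCHEMA_V1 = {1: ("user_name", ""), 2: ("favorite_number", 0)}
SCHEMA_V2 = {1: ("user_name", ""), 2: ("favorite_number", 0), 3: ("interests", ())}

def encode(record: dict, schema: dict) -> dict:
    """Key each value by its tag number; field names are not stored."""
    tag_for = {name: tag for tag, (name, _default) in schema.items()}
    return {tag_for[name]: value for name, value in record.items()}

def decode(encoded: dict, schema: dict) -> dict:
    """Decode using the reader's schema."""
    record = {name: default for name, default in schema.values()}  # absent -> default
    for tag, value in encoded.items():
        if tag in schema:                  # unknown tags are skipped
            record[schema[tag][0]] = value
    return record

new_data = encode({"user_name": "Martin", "favorite_number": 1337,
                   "interests": ("daydreaming",)}, SCHEMA_V2)
old_data = encode({"user_name": "Martin", "favorite_number": 1337}, SCHEMA_V1)

print(decode(new_data, SCHEMA_V1))  # forward compat: old reader ignores tag 3
print(decode(old_data, SCHEMA_V2))  # backward compat: new reader defaults "interests"
```

Because only tags travel with the data, renaming a field is harmless. Changing or reusing a tag would make old data decode into the wrong field.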

Chapter 5: Replication

→ Full notes: chapters/ch05-replication.md
→ Cheatsheet: chapters/ch05-cheatsheet.md
→ Flashcards: flashcards/ch05-flashcards.md

Concept	One-Liner
Single-leader	One node writes; followers replicate
Multi-leader	Multiple leaders accept writes and replicate to each other asynchronously; write conflicts possible
Leaderless	All nodes accept writes; quorum for consistency
Replication lag	Delay between leader write and follower update
Read-your-writes	Guarantee user sees their own writes
Monotonic reads	Successive reads never return older data than an earlier read
CRDT	Conflict-free Replicated Data Type; auto-merge
Quorum (w+r>n)	Read and write sets overlap, so at least one node read has the latest write (edge cases apply)
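
The overlap argument behind w + r > n can be checked by brute force for small clusters. Here is a minimal sketch; `quorums_always_overlap` is an illustrative helper, and nodes are just numbered 0..n-1.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every w-node write set share a node with every r-node read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # non-empty intersection is truthy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

print(quorums_always_overlap(3, 2, 2))  # prints: True  (2 + 2 > 3)
print(quorums_always_overlap(3, 1, 2))  # prints: False (1 + 2 = 3, sets can be disjoint)
```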

Chapter 6: Partitioning

→ Full notes: chapters/ch06-partitioning.md
→ Cheatsheet: chapters/ch06-cheatsheet.md
→ Flashcards: flashcards/ch06-flashcards.md

Concept	One-Liner
Key-range partition	Adjacent keys together; range queries efficient; hot spot risk
Hash partition	Uniform distribution; no range queries
Hot spot	One partition receives disproportionate load
Consistent hashing	Ring-based; adding a node only moves the keys in the ring range it takes over
Scatter/gather	Query all partitions; tail latency = slowest partition
Document-partitioned index	Local secondary index; fast writes, scatter/gather reads
Term-partitioned index	Global secondary index; efficient reads, cross-partition writes
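
A minimal consistent-hashing ring follows. It assumes one point per node (no virtual nodes) and SHA-256 as the hash; `HashRing` is a hypothetical class, not a particular database's implementation. After a node is added, every key that moves lands on the new node.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    """Minimal consistent-hashing ring: one point per node, no virtual nodes."""

    def __init__(self, nodes):
        self._points = sorted((_hash(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._points, (_hash(node), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise: first node point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]

keys = [f"user:{i}" for i in range(1000)]  # hypothetical keys
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), "keys moved, all to", {ring.node_for(k) for k in moved})
```

For comparison, with naive `hash(key) % N`, going from 3 to 4 nodes would move about 3/4 of all keys.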

Chapter 7: Transactions

→ Full notes: chapters/ch07-transactions.md
→ Cheatsheet: chapters/ch07-cheatsheet.md
→ Flashcards: flashcards/ch07-flashcards.md

Concept	One-Liner
ACID	Atomicity, Consistency, Isolation, Durability
Read committed	Common default (e.g. PostgreSQL); no dirty reads or dirty writes
Snapshot isolation	MVCC; each txn sees consistent snapshot
Serializable (2PL)	Writers block readers and readers block writers; prevents all race conditions
Serializable (SSI)	Optimistic; detect conflicts at commit; used by PostgreSQL
Write skew	Two txns read same data, write to different rows, violate invariant
Lost update	Read-modify-write without atomic operation; one write lost
2PC	Two-Phase Commit; atomic across multiple nodes
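
The sketch below shows a lost update and one remedy, compare-and-set, in a hypothetical single-threaded key-value store. The steps are interleaved by hand so the race is deterministic. `VersionedStore` is illustrative, not a real database API.

```python
class VersionedStore:
    """Toy key-value store with compare-and-set on a per-key version number."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        return self._data.get(key, (0, 0))

    def put(self, key, value):
        _, version = self.get(key)
        self._data[key] = (value, version + 1)

    def compare_and_set(self, key, expected_version, new_value) -> bool:
        _, version = self.get(key)
        if version != expected_version:
            return False  # someone wrote since we read: caller must re-read and retry
        self._data[key] = (new_value, version + 1)
        return True

# Lost update: both clients read before either writes (read-modify-write race).
store = VersionedStore()
a_val, _ = store.get("counter")
b_val, _ = store.get("counter")
store.put("counter", a_val + 1)
store.put("counter", b_val + 1)
lost_update_result = store.get("counter")[0]  # 1, not 2

# Same interleaving with compare-and-set: client B's stale write is rejected.
store2 = VersionedStore()
a_val, a_ver = store2.get("counter")
b_val, b_ver = store2.get("counter")
store2.compare_and_set("counter", a_ver, a_val + 1)
if not store2.compare_and_set("counter", b_ver, b_val + 1):
    b_val, b_ver = store2.get("counter")  # re-read, then retry
    store2.compare_and_set("counter", b_ver, b_val + 1)
cas_result = store2.get("counter")[0]  # 2

print(lost_update_result, cas_result)  # prints: 1 2
```

In a real database, compare-and-set is only safe if the comparison and the write happen atomically. In this toy store, that holds trivially because everything runs on a single thread.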

Chapter 8: The Trouble with Distributed Systems

→ Full notes: chapters/ch08-trouble-with-distributed-systems.md
→ Cheatsheet: chapters/ch08-cheatsheet.md
→ Flashcards: flashcards/ch08-flashcards.md

Concept	One-Liner
Partial failure	Some nodes work, some don’t; normal in distributed systems
Timeout ambiguity	After timeout, can’t know if request was received/processed
Wall clock danger	Can go backward (NTP sync); unsafe for ordering
Monotonic clock	Always forward; safe for elapsed time within a process
GC pause	Stop-the-world GC (e.g. JVM) can pause a process for seconds
Fencing token	Monotonically increasing number; rejects stale writes
FLP impossibility	No deterministic algorithm can guarantee consensus in an asynchronous model if even one node may crash
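
Here is a fencing-token check, sketched with a hypothetical `FencedStorage` service. The key property is that the storage side remembers the highest token it has seen and rejects writes that carry a lower one. Equal tokens are accepted, so the current holder can write repeatedly.

```python
class FencedStorage:
    """Hypothetical storage service that checks fencing tokens on every write."""

    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, key: str, value: str, token: int) -> bool:
        if token < self.highest_token:
            return False                  # stale lock holder: reject the write
        self.highest_token = token
        self.data[key] = value
        return True

storage = FencedStorage()
# Client 1 got the lease with token 33, then stalled (e.g. a long GC pause).
# Its lease expired, and client 2 acquired the lease with token 34.
ok_new = storage.write("file", "written by client 2", token=34)
ok_stale = storage.write("file", "written by client 1", token=33)  # client 1 wakes up
print(ok_new, ok_stale, storage.data["file"])  # prints: True False written by client 2
```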

Chapter 9: Consistency and Consensus

→ Full notes: chapters/ch09-consistency-and-consensus.md
→ Cheatsheet: chapters/ch09-cheatsheet.md
→ Flashcards: flashcards/ch09-flashcards.md

Concept	One-Liner
Linearizability	Strongest recency guarantee; behaves like a single copy, so reads always see the most recent write
Causal consistency	Causally related events ordered; concurrent may differ
CAP theorem	During partition: choose consistency OR availability
PACELC	CAP + latency vs consistency even without partitions
Consensus	Nodes agree on single value; irreversible
Raft	Leader-based consensus; majority quorum; designed for understandability
Total order broadcast	All nodes see all messages in same order; equivalent to consensus
ZooKeeper/etcd	Coordination services built on consensus
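
Raft's majority quorum in numbers: the sketch below computes the smallest majority and the highest log index a leader could treat as committed. It takes each node's last replicated index as given (the "match index"). This is a simplification: real Raft only commits this way for entries from the leader's current term. The helper names are illustrative.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1

def commit_index(match_index: list[int]) -> int:
    """Highest log index replicated on a majority of nodes (leader's own log included).

    match_index[i] = last log index known to be stored on node i.
    Simplification: real Raft only advances the commit index this way for
    entries from the leader's current term.
    """
    ordered = sorted(match_index, reverse=True)
    return ordered[majority(len(match_index)) - 1]

cluster = [7, 7, 5, 4, 2]  # hypothetical match indexes for a 5-node cluster
print(majority(len(cluster)), commit_index(cluster))  # prints: 3 5
```

A 5-node cluster therefore keeps working with up to 2 failed nodes (5 - 3).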

Chapter 10: Batch Processing

→ Full notes: chapters/ch10-batch-processing.md
→ Cheatsheet: chapters/ch10-cheatsheet.md
→ Flashcards: flashcards/ch10-flashcards.md

Concept	One-Liner
MapReduce	Map (KV pairs) + Shuffle (group by key) + Reduce (aggregate)
Shuffle	Redistributes mapper output by key; often the most expensive phase
Dataflow engine	DAG-based batch execution; Spark/Flink; avoids disk between stages
Broadcast hash join	Small table in memory; join without shuffle
Derived data	Any output recomputable from immutable source
BSP model	Bulk Synchronous Parallel; iterative graph algorithms (Pregel)
dbt	SQL-based batch transformation tool; widely used in analytics engineering
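
The three MapReduce phases as plain in-process functions: a minimal single-machine word-count sketch. A real framework runs many mappers and reducers in parallel and performs the shuffle over the network. The function names are illustrative.

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word occurrence."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all values by key (in a cluster, this moves data between machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate each key's values independently."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["the cat sat", "the cat ran"]
print(reduce_phase(shuffle_phase(map_phase(lines))))
# prints: {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```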

Chapter 11: Stream Processing

→ Full notes: chapters/ch11-stream-processing.md
→ Cheatsheet: chapters/ch11-cheatsheet.md
→ Flashcards: flashcards/ch11-flashcards.md

Concept	One-Liner
Stream processing	Continuous, incremental processing of unbounded (never-ending) data
Message log (Kafka)	Append-only; consumers track own offset; replay supported
CDC	Change Data Capture; stream DB changes from WAL/binlog
Event sourcing	Store events not state; state = fold over event log
Event time	When event occurred; use for business logic
Processing time	When processor received event; use for SLA monitoring
Watermark	Estimate of event time progress; closes time windows
Exactly-once	Each event's effect applied once despite failures and retries (effectively-once)
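
This sketch shows event-time tumbling windows closed by a watermark. It assumes a simple watermark heuristic (maximum event time seen minus a fixed allowed lateness) and drops events that arrive after their window has closed. Real engines offer other policies, such as emitting corrections. Timestamps are hypothetical integer seconds, given in arrival (processing-time) order.

```python
from collections import defaultdict

def tumbling_window_counts(event_times, window_size, allowed_lateness):
    """Count events per event-time tumbling window, closing windows via a watermark.

    event_times: event timestamps in *arrival* (processing-time) order.
    Watermark heuristic (an assumption): max event time seen - allowed_lateness.
    Returns (closed_windows, dropped_late_events).
    """
    open_windows = defaultdict(int)        # window start -> count
    closed = {}
    dropped = []
    watermark = float("-inf")
    for t in event_times:
        start = t - t % window_size
        if start + window_size <= watermark:
            dropped.append(t)              # its window was already closed and emitted
            continue
        open_windows[start] += 1
        watermark = max(watermark, t - allowed_lateness)
        for s in sorted(open_windows):     # close every window the watermark has passed
            if s + window_size <= watermark:
                closed[s] = open_windows.pop(s)
    closed.update(open_windows)            # end of input: flush windows still open
    return closed, dropped

arrivals = [5, 30, 62, 55, 75, 50, 130, 58]  # hypothetical event times, out of order
print(tumbling_window_counts(arrivals, window_size=60, allowed_lateness=10))
# prints: ({0: 3, 60: 2, 120: 1}, [50, 58])
```

The event at 55 still counts because it arrives before the watermark passes 60. The events at 50 and 58 arrive too late.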

Chapter 12: The Future of Data Systems

→ Full notes: chapters/ch12-future-of-data-systems.md
→ Cheatsheet: chapters/ch12-cheatsheet.md
→ Flashcards: flashcards/ch12-flashcards.md

Concept	One-Liner
Unbundling	Use specialized tools; compose via event log
Derived data	All state computable from event log; re-derivable if lost
Lambda architecture	Batch + speed layers; criticized for maintaining two code paths
Kappa architecture	Single stream processor; replay log for backfill
Data mesh	Domain teams own data as products; federated governance
Data minimalism	Collect only what you need; less data = less risk
Cryptographic erasure	GDPR compliance: encrypt data, delete key = functionally erased
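
Derived data and Kappa-style backfill in a few lines: a minimal sketch in which a hypothetical account-balance view is rebuilt by folding a pure function over an immutable event log. The event shapes and names are illustrative assumptions. If the view is lost or its logic changes, replaying the log from the start regenerates it.

```python
from functools import reduce

# Hypothetical immutable event log: the source of truth.
EVENT_LOG = [
    {"type": "deposited", "account": "alice", "amount": 100},
    {"type": "deposited", "account": "bob", "amount": 50},
    {"type": "withdrew", "account": "alice", "amount": 30},
    {"type": "transferred", "from": "bob", "to": "alice", "amount": 20},
]

def apply_event(balances: dict, event: dict) -> dict:
    """Pure function: returns a new balances view with one event applied."""
    new = dict(balances)
    if event["type"] == "deposited":
        new[event["account"]] = new.get(event["account"], 0) + event["amount"]
    elif event["type"] == "withdrew":
        new[event["account"]] = new.get(event["account"], 0) - event["amount"]
    elif event["type"] == "transferred":
        new[event["from"]] = new.get(event["from"], 0) - event["amount"]
        new[event["to"]] = new.get(event["to"], 0) + event["amount"]
    return new

def rebuild_view(log) -> dict:
    # Replay the whole log from offset 0 into an empty view (state = fold over events).
    return reduce(apply_event, log, {})

print(rebuild_view(EVENT_LOG))  # prints: {'alice': 90, 'bob': 30}
```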

Cross-Chapter Themes

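Consistency and Isolation Hierarchies (weakest → strongest)

Replica consistency: Eventual → Monotonic reads → Causal → Linearizable
Transaction isolation: Read committed → Snapshot → Serializable
Serializable and linearizable are different guarantees; strict serializability provides both.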

Distributed System Design Principles

Immutable inputs (Ch10, Ch11, Ch12): Never modify source data; derive all outputs
Derived data (Ch10-12): Any derived view can be regenerated from the event log
Consistency is a spectrum (Ch5, Ch9): Design for the level you actually need
Fencing tokens for leases (Ch8): Protect against stale lock holders
Event log as backbone (Ch11, Ch12): Kafka as the integration nervous system

Key Trade-offs (for interviews)

Trade-off	Option A	Option B
Write vs read optimization	LSM-Tree (Ch3)	B-Tree (Ch3)
Consistency vs Availability	CP (ZooKeeper)	AP (Cassandra)
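Strong vs eventual consistency	Linearizable (higher latency, less available)	Eventual (lower latency, stays available)
Replication: sync vs async	No loss of acknowledged writes	Leader never waits on followers; recent writes may be lost on failover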
Secondary index: local vs global	Fast writes	Fast reads
Isolation: SSI vs 2PL	Optimistic; fast when contention is low (aborts rise with contention)	Pessimistic; blocks instead of aborting, limiting throughput

Status: All 12 chapters complete
Last Updated: 2026-04-13

Study Notes by Niladri & AI
