Designing Data-Intensive Applications (2nd Edition) — Study Notes
Comprehensive notes for Kleppmann & Riccomini’s DDIA 2nd Edition (2026) — the updated definitive guide to distributed systems theory, data engineering, and modern cloud-native architectures.
Book Details
- Title: Designing Data-Intensive Applications (2nd Edition)
- Author(s): Martin Kleppmann and Chris Riccomini
- Published: February 2026
- Focus: Theory — distributed systems principles, consensus, replication, transactions, and modern cloud/streaming architectures
- Style: Academic with practical depth
- Total Chapters: 14 (vs 12 in 1st edition)
- Notes Status: Complete as of 2026-05-29
What’s New in the 2nd Edition
Major changes from the 1st edition (2017):
| Change | Details |
|---|---|
| New co-author | Chris Riccomini joins Martin Kleppmann as co-author |
| 2 new chapters | Ch13: A Philosophy of Streaming Systems; Ch14: Doing the Right Thing (ethics) |
| Ch1 restructured | Now covers trade-offs framework (OLTP/OLAP, cloud vs self-hosting, distributed vs single-node) |
| Ch7 renamed | “Partitioning” → “Sharding” (reflects industry terminology shift) |
| New topics | Vector embeddings, GraphQL, CQRS/event sourcing, durable execution, formal methods |
| Updated examples | Reflects 2026 landscape: Kafka, Flink, Spark, Snowflake, DynamoDB, etc. |
Chapter List
| # | Chapter Title | Key Concepts | Status |
|---|---|---|---|
| 01 | Trade-Offs in Data Systems Architecture | OLTP/OLAP, Cloud vs Self-Hosting, Microservices | 🟩 |
| 02 | Defining Nonfunctional Requirements | Reliability, Scalability, Maintainability, Percentiles | 🟩 |
| 03 | Data Models and Query Languages | Relational, Document, Graph, GraphQL, Event Sourcing | 🟩 |
| 04 | Storage and Retrieval | LSM-trees, B-trees, Column stores, Vector embeddings | 🟩 |
| 05 | Encoding and Evolution | Protobuf, Avro, REST/RPC, Durable execution | 🟩 |
| 06 | Replication | Single-leader, Multi-leader, Leaderless, CDC | 🟩 |
| 07 | Sharding | Key-range, Hash sharding, Secondary indexes, Rebalancing | 🟩 |
| 08 | Transactions | ACID, Isolation levels, Serializability, 2PL | 🟩 |
| 09 | The Trouble with Distributed Systems | Networks, Clocks, Byzantine faults, Formal methods | 🟩 |
| 10 | Consistency and Consensus | Linearizability, Logical clocks, Paxos/Raft | 🟩 |
| 11 | Batch Processing | MapReduce, Dataflow, ETL, ML pipelines | 🟩 |
| 12 | Stream Processing | Kafka, CDC, Watermarks, Stream joins | 🟩 |
| 13 | A Philosophy of Streaming Systems | Stream-first design, correctness, exactly-once | 🟩 |
| 14 | Doing the Right Thing | Ethics, bias, privacy, surveillance, regulation | 🟩 |
Directory Structure
DDIA-2E-Notes/
├── README.md # This file
│
├── chapters/ # Chapter notes + cheatsheets
│ ├── ch01-tradeoffs-data-systems.md
│ ├── ch01-cheatsheet.md
│ ├── ch02-nonfunctional-requirements.md
│ ├── ch02-cheatsheet.md
│ ├── ch03-data-models-query-languages.md
│ ├── ch03-cheatsheet.md
│ ├── ch04-storage-and-retrieval.md
│ ├── ch04-cheatsheet.md
│ ├── ch05-encoding-and-evolution.md
│ ├── ch05-cheatsheet.md
│ ├── ch06-replication.md
│ ├── ch06-cheatsheet.md
│ ├── ch07-sharding.md
│ ├── ch07-cheatsheet.md
│ ├── ch08-transactions.md
│ ├── ch08-cheatsheet.md
│ ├── ch09-trouble-with-distributed-systems.md
│ ├── ch09-cheatsheet.md
│ ├── ch10-consistency-and-consensus.md
│ ├── ch10-cheatsheet.md
│ ├── ch11-batch-processing.md
│ ├── ch11-cheatsheet.md
│ ├── ch12-stream-processing.md
│ ├── ch12-cheatsheet.md
│ ├── ch13-philosophy-of-streaming.md
│ ├── ch13-cheatsheet.md
│ ├── ch14-doing-the-right-thing.md
│ └── ch14-cheatsheet.md
│
└── flashcards/ # Obsidian spaced repetition flashcards
  ├── ch01-flashcards.md
  ├── ch02-flashcards.md
  ├── ch03-flashcards.md
  ├── ch04-flashcards.md
  ├── ch05-flashcards.md
  ├── ch06-flashcards.md
  ├── ch07-flashcards.md
  ├── ch08-flashcards.md
  ├── ch09-flashcards.md
  ├── ch10-flashcards.md
  ├── ch11-flashcards.md
  ├── ch12-flashcards.md
  ├── ch13-flashcards.md
  └── ch14-flashcards.md
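The layout above follows a regular naming pattern: each chapter has one notes file, one cheatsheet, and one flashcard file. As a rough sketch, a small script along these lines could confirm a local copy is complete. The function name `missing_files` and the `<topic>` placeholder are illustrative assumptions, not part of the notes repository.

```python
from pathlib import Path

CHAPTERS = range(1, 15)  # ch01..ch14, matching the chapter list


def missing_files(root: Path) -> list[str]:
    """Return expected-but-absent files under root, as relative paths.

    Hypothetical helper: assumes the directory layout shown in this README.
    """
    missing = []
    chapters_dir = root / "chapters"
    for n in CHAPTERS:
        prefix = f"ch{n:02d}"
        # Notes files carry a topic slug (e.g. ch07-sharding.md), so match
        # on the prefix and exclude the cheatsheet.
        notes = [
            p for p in chapters_dir.glob(f"{prefix}-*.md")
            if p.name != f"{prefix}-cheatsheet.md"
        ]
        if not notes:
            missing.append(f"chapters/{prefix}-<topic>.md")
        for rel in (
            f"chapters/{prefix}-cheatsheet.md",
            f"flashcards/{prefix}-flashcards.md",
        ):
            if not (root / rel).is_file():
                missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_files(Path("DDIA-2E-Notes")):
        print("missing:", rel)
```

Run from the directory that contains `DDIA-2E-Notes/`; no output means every expected file was found.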
How to Use These Notes
- Read the chapter notes file (~45-60 min per chapter)
  - Understand the core concepts, mechanisms, and why they exist
  - Focus on trade-off tables; they summarize the key design decisions
  - Use the cheatsheet for a 5-minute review before interviews
- Review the flashcard file (~15-20 min per chapter)
  - Add to Obsidian with the Spaced Repetition plugin
  - Review daily; focus on HIGH priority chapters first
- Cross-reference with 1st edition notes
  - [[DDIA-Notes/chapters/...]] for 1st edition coverage
  - Compare what’s changed and what’s new
Study Path Recommendations
1-Week Sprint (Interview imminent)
- Day 1: Ch2 (Requirements), Ch8 (Transactions)
- Day 2: Ch6 (Replication), Ch7 (Sharding)
- Day 3: Ch10 (Consensus), Ch9 (Distributed Troubles)
- Day 4: Ch4 (Storage), Ch5 (Encoding)
- Day 5: Ch12 (Streams), Ch11 (Batch)
- Day 6-7: Ch1, Ch3, Ch13, flashcard review
1-Month Plan (Solid preparation)
- Week 1: Ch1-4 (Foundations)
- Week 2: Ch5-8 (Encoding, Replication, Sharding & Transactions)
- Week 3: Ch9-12 (Distributed Systems & Processing)
- Week 4: Ch13-14 + review all flashcards
Key Themes
- Trade-offs everywhere: No universal best solution — context determines the right choice
- Failure is normal: Design for failure at hardware, software, network, and human levels
- Consistency vs availability: CAP theorem trade-offs appear in replication, transactions, and consensus
- Data models shape everything: Choice of model (relational, document, graph) constrains what’s possible
- Streams as first-class citizens: The 2nd edition significantly elevates streaming as a fundamental paradigm
Comparison with 1st Edition Notes
| Aspect | DDIA 2E (2026) | DDIA 1E (2017) |
|---|---|---|
| Chapters | 14 | 12 |
| Cloud coverage | Extensive (cloud-native arch) | Limited |
| Streaming | 2 full chapters | 1 chapter |
| Ethics | Full chapter (Ch14) | Brief mention |
| Vector DBs | Covered (Ch4) | Not covered |
| Durable execution | Covered (Ch5) | Not covered |
Last Updated: 2026-05-29