Designing Data-Intensive Applications (2nd Edition) — Study Notes

Comprehensive notes for Kleppmann & Riccomini’s DDIA 2nd Edition (2026) — the updated definitive guide to distributed systems theory, data engineering, and modern cloud-native architectures.

Book Details

  • Title: Designing Data-Intensive Applications (2nd Edition)
  • Author(s): Martin Kleppmann and Chris Riccomini
  • Published: February 2026
  • Focus: Theory — distributed systems principles, consensus, replication, transactions, and modern cloud/streaming architectures
  • Style: Academic with practical depth
  • Total Chapters: 14 (vs 12 in 1st edition)
  • Notes Status: Complete as of 2026-05-29

What’s New in the 2nd Edition

Major changes from the 1st edition (2017):

| Change | Details |
|---|---|
| New authors | Chris Riccomini joins as co-author |
| 2 new chapters | Ch13: A Philosophy of Streaming Systems; Ch14: Doing the Right Thing (ethics) |
| Ch1 restructured | Now covers trade-offs framework (OLTP/OLAP, cloud vs self-hosting, distributed vs single-node) |
| Ch7 renamed | “Partitioning” → “Sharding” (reflects industry terminology shift) |
| New topics | Vector embeddings, GraphQL, CQRS/event sourcing, durable execution, formal methods |
| Updated examples | Reflects 2026 landscape: Kafka, Flink, Spark, Snowflake, DynamoDB, etc. |

Chapter List

| # | Chapter Title | Key Concepts | Status |
|---|---|---|---|
| 01 | Trade-Offs in Data Systems Architecture | OLTP/OLAP, Cloud vs Self-Hosting, Microservices | 🟩 |
| 02 | Defining Nonfunctional Requirements | Reliability, Scalability, Maintainability, Percentiles | 🟩 |
| 03 | Data Models and Query Languages | Relational, Document, Graph, GraphQL, Event Sourcing | 🟩 |
| 04 | Storage and Retrieval | LSM-trees, B-trees, Column stores, Vector embeddings | 🟩 |
| 05 | Encoding and Evolution | Protobuf, Avro, REST/RPC, Durable execution | 🟩 |
| 06 | Replication | Single-leader, Multi-leader, Leaderless, CDC | 🟩 |
| 07 | Sharding | Key-range, Hash sharding, Secondary indexes, Rebalancing | 🟩 |
| 08 | Transactions | ACID, Isolation levels, Serializability, 2PL | 🟩 |
| 09 | The Trouble with Distributed Systems | Networks, Clocks, Byzantine faults, Formal methods | 🟩 |
| 10 | Consistency and Consensus | Linearizability, Logical clocks, Paxos/Raft | 🟩 |
| 11 | Batch Processing | MapReduce, Dataflow, ETL, ML pipelines | 🟩 |
| 12 | Stream Processing | Kafka, CDC, Watermarks, Stream joins | 🟩 |
| 13 | A Philosophy of Streaming Systems | Stream-first design, correctness, exactly-once | 🟩 |
| 14 | Doing the Right Thing | Ethics, bias, privacy, surveillance, regulation | 🟩 |

Directory Structure

DDIA-2E-Notes/
├── README.md                    # This file
│
├── chapters/                    # Chapter notes + cheatsheets
│   ├── ch01-tradeoffs-data-systems.md
│   ├── ch01-cheatsheet.md
│   ├── ch02-nonfunctional-requirements.md
│   ├── ch02-cheatsheet.md
│   ├── ch03-data-models-query-languages.md
│   ├── ch03-cheatsheet.md
│   ├── ch04-storage-and-retrieval.md
│   ├── ch04-cheatsheet.md
│   ├── ch05-encoding-and-evolution.md
│   ├── ch05-cheatsheet.md
│   ├── ch06-replication.md
│   ├── ch06-cheatsheet.md
│   ├── ch07-sharding.md
│   ├── ch07-cheatsheet.md
│   ├── ch08-transactions.md
│   ├── ch08-cheatsheet.md
│   ├── ch09-trouble-with-distributed-systems.md
│   ├── ch09-cheatsheet.md
│   ├── ch10-consistency-and-consensus.md
│   ├── ch10-cheatsheet.md
│   ├── ch11-batch-processing.md
│   ├── ch11-cheatsheet.md
│   ├── ch12-stream-processing.md
│   ├── ch12-cheatsheet.md
│   ├── ch13-philosophy-of-streaming.md
│   ├── ch13-cheatsheet.md
│   ├── ch14-doing-the-right-thing.md
│   └── ch14-cheatsheet.md
│
└── flashcards/                  # Obsidian spaced repetition flashcards
    ├── ch01-flashcards.md
    ├── ch02-flashcards.md
    ├── ch03-flashcards.md
    ├── ch04-flashcards.md
    ├── ch05-flashcards.md
    ├── ch06-flashcards.md
    ├── ch07-flashcards.md
    ├── ch08-flashcards.md
    ├── ch09-flashcards.md
    ├── ch10-flashcards.md
    ├── ch11-flashcards.md
    ├── ch12-flashcards.md
    ├── ch13-flashcards.md
    └── ch14-flashcards.md
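
One way to confirm that a local copy matches the tree above is a small check script. The following is a minimal sketch in Python, assuming the layout shown above; `CHAPTER_SLUGS`, `expected_files`, and `missing_files` are hypothetical helper names and are not part of the notes themselves.

```python
from pathlib import Path

# Chapter number -> file slug, copied from the tree above.
CHAPTER_SLUGS = {
    1: "tradeoffs-data-systems",
    2: "nonfunctional-requirements",
    3: "data-models-query-languages",
    4: "storage-and-retrieval",
    5: "encoding-and-evolution",
    6: "replication",
    7: "sharding",
    8: "transactions",
    9: "trouble-with-distributed-systems",
    10: "consistency-and-consensus",
    11: "batch-processing",
    12: "stream-processing",
    13: "philosophy-of-streaming",
    14: "doing-the-right-thing",
}


def expected_files() -> list[Path]:
    """Relative paths of every file listed in the tree, README included."""
    paths = [Path("README.md")]
    for num, slug in CHAPTER_SLUGS.items():
        prefix = f"ch{num:02d}"
        paths.append(Path("chapters") / f"{prefix}-{slug}.md")
        paths.append(Path("chapters") / f"{prefix}-cheatsheet.md")
        paths.append(Path("flashcards") / f"{prefix}-flashcards.md")
    return paths


def missing_files(root: Path) -> list[Path]:
    """Expected paths that do not exist as files under `root`."""
    return [p for p in expected_files() if not (root / p).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the directory containing DDIA-2E-Notes/.
    for path in missing_files(Path("DDIA-2E-Notes")):
        print(f"missing: {path}")
```

The script prints nothing when all 43 files (the README plus three files per chapter) are present.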

How to Use These Notes

  1. Read the chapter notes file (~45-60 min per chapter)

    • Understand the core concepts, mechanisms, and why they exist
    • Focus on trade-off tables — they summarize the key design decisions
    • Use the chapter's cheatsheet for a 5-minute review before interviews
  2. Review the flashcard file (~15-20 min per chapter)

    • Import the file into Obsidian with the Spaced Repetition plugin enabled
    • Review daily; focus on HIGH priority chapters first
  3. Cross-reference with 1st edition notes

    • [[DDIA-Notes/chapters/...]] for 1st edition coverage
    • Compare what’s changed and what’s new
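
The per-chapter estimates above imply an overall time budget. As a rough worked calculation (a sketch, assuming all 14 chapters are studied at the stated pace and ignoring the daily flashcard reviews; the function name `total_hours` is hypothetical):

```python
CHAPTERS = 14               # total chapters in the 2nd edition
NOTES_MIN = (45, 60)        # minutes per chapter notes file (low, high)
FLASHCARDS_MIN = (15, 20)   # minutes per flashcard file (low, high)


def total_hours(chapters: int = CHAPTERS) -> tuple[float, float]:
    """Return the (low, high) total study time in hours."""
    low = chapters * (NOTES_MIN[0] + FLASHCARDS_MIN[0]) / 60
    high = chapters * (NOTES_MIN[1] + FLASHCARDS_MIN[1]) / 60
    return low, high


low, high = total_hours()
print(f"{low:.1f} to {high:.1f} hours")  # 14.0 to 18.7 hours
```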

Study Path Recommendations

1-Week Sprint (Interview imminent)

  • Day 1: Ch2 (Requirements), Ch8 (Transactions)
  • Day 2: Ch6 (Replication), Ch7 (Sharding)
  • Day 3: Ch10 (Consensus), Ch9 (Distributed Troubles)
  • Day 4: Ch4 (Storage), Ch5 (Encoding)
  • Day 5: Ch12 (Streams), Ch11 (Batch)
  • Day 6-7: Ch1, Ch3, Ch13, flashcard review

1-Month Plan (Solid preparation)

  • Week 1: Ch1-4 (Foundations)
  • Week 2: Ch5-8 (Encoding, Replication, Sharding & Transactions)
  • Week 3: Ch9-12 (Distributed Systems & Processing)
  • Week 4: Ch13-14 + review all flashcards
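
To see which chapters each plan actually touches, the two schedules above can be written down as data and compared against the full chapter list. This is an illustrative sketch only; `SPRINT`, `MONTH`, and `uncovered` are hypothetical names.

```python
ALL_CHAPTERS = set(range(1, 15))  # Ch1 through Ch14

# Chapters per day, copied from the 1-Week Sprint list.
SPRINT = {
    "Day 1": [2, 8],
    "Day 2": [6, 7],
    "Day 3": [10, 9],
    "Day 4": [4, 5],
    "Day 5": [12, 11],
    "Day 6-7": [1, 3, 13],
}

# Chapters per week, copied from the 1-Month Plan list
# (Python ranges exclude their end value, so range(1, 5) is Ch1-4).
MONTH = {
    "Week 1": range(1, 5),
    "Week 2": range(5, 9),
    "Week 3": range(9, 13),
    "Week 4": range(13, 15),
}


def uncovered(plan) -> list[int]:
    """Chapters that never appear in the given plan."""
    covered = {ch for chapters in plan.values() for ch in chapters}
    return sorted(ALL_CHAPTERS - covered)


print(uncovered(SPRINT))  # [14]
print(uncovered(MONTH))   # []
```

The sprint skips only Ch14 (Doing the Right Thing), while the month plan covers every chapter.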

Key Themes

  1. Trade-offs everywhere: No universal best solution — context determines the right choice
  2. Failure is normal: Design for failure at hardware, software, network, and human levels
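  3. Consistency vs availability: Stronger consistency guarantees cost availability or latency; the CAP theorem is the best-known statement of this trade-off, which recurs in replication, transactions, and consensus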
  4. Data models shape everything: Choice of model (relational, document, graph) constrains what’s possible
  5. Streams as first-class citizens: The 2nd edition significantly elevates streaming as a fundamental paradigm

Comparison with 1st Edition Notes

| Aspect | DDIA 2E (2026) | DDIA 1E (2017) |
|---|---|---|
| Chapters | 14 | 12 |
| Cloud coverage | Extensive (cloud-native arch) | Limited |
| Streaming | 2 full chapters | 1 chapter |
| Ethics | Full chapter (Ch14) | Brief mention |
| Vector DBs | Covered (Ch4) | Not covered |
| Durable execution | Covered (Ch5) | Not covered |

Last Updated: 2026-05-29