Chapter 13: A Philosophy of Streaming Systems
ddia-2e streaming philosophy exactly-once correctness kappa-architecture
Status: Notes complete
Overview
Chapter 13 is the most philosophically ambitious chapter in DDIA 2nd Edition. Where Chapter 12 teaches the mechanics of stream processing — how Kafka works, how watermarks are computed, how joins are implemented — this chapter asks the deeper question: why do we build systems this way, and what does it mean for a streaming system to be correct? Kleppmann and Riccomini argue that streaming is not merely an optimization over batch processing but a fundamentally different and more honest model of how data actually exists in the world: as an ordered sequence of events that unfold over time. This framing has profound implications for how we reason about correctness, state, fault tolerance, and the relationship between data and time. The chapter synthesizes the entire second half of the book into a coherent philosophy that guides system design decisions at the highest level.
Key Concepts
Streaming as a Fundamental Paradigm
The conventional mental model treats databases as the primary abstraction and streams as an optimization for low-latency updates. DDIA 2E inverts this: the stream is primary; the database is a derived artifact.
Consider how data actually originates: a user clicks a button, a sensor emits a reading, a payment is processed. Each of these is an event — a discrete occurrence at a specific moment in time. The traditional database forces us to discard the temporal history and store only the current state. The stream preserves the history.
The event log as ground truth: When a streaming system captures events in a durable, ordered log (Kafka, Kinesis, Pulsar), that log becomes the authoritative record of what happened and when. Everything else — database tables, search indexes, materialized views, ML feature stores — is derived from that log. If a derived artifact is lost or corrupted, you rebuild it by replaying the log.
This is not just an architectural pattern; it is a philosophical stance on data: facts are immutable; state is ephemeral. A fact (the event) cannot be un-true. A database row (state) is just the current summary of all facts. This inversion changes how you think about correctness, debugging, and recovery.
Why “everything is streaming”: Flink’s foundational insight, echoed throughout this chapter, is that batch processing is a special case of stream processing — a stream with a finite, known end. This unification eliminates the need for separate batch and streaming code paths (the Lambda architecture problem). A Flink job processing a bounded historical dataset uses the same operators as one processing an unbounded live stream. Time is the only difference: for batch, event time is fully known; for streaming, events are still arriving.
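The claim that batch is a bounded stream can be illustrated with a minimal sketch in plain Python (not Flink's API). The operator `running_counts` and the generator `endless_clicks` are hypothetical names chosen for illustration; the point is only that the same operator code runs unchanged over a finite list and over an endless generator.

```python
from collections import Counter
from itertools import count, islice
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, count_so_far) per event; never asks whether the input ends."""
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1
        yield key, counts[key]


# Bounded input ("batch"): a finite list, consumed to the end.
batch_result = list(running_counts(["a", "b", "a"]))


def endless_clicks() -> Iterator[str]:
    """Unbounded input ("streaming"): alternates a, b, a, b, ... forever."""
    for i in count():
        yield "a" if i % 2 == 0 else "b"


# Same operator over the unbounded source; we only look at a prefix of its output.
stream_prefix = list(islice(running_counts(endless_clicks()), 3))
```

The operator is written once; only the source differs, which is the property that removes the need for two code paths.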
The Lambda Architecture and Its Problems
Before understanding the Kappa architecture fully, it is worth revisiting why Lambda was invented and why it is now considered an antipattern.
Lambda architecture (Nathan Marz, 2011) was a pragmatic response to a real problem: the stream processors of the early 2010s (Storm, early Spark Streaming) could not guarantee exactly-once processing and had limited state management. The solution was to run two parallel pipelines:
- A batch layer (Hadoop/Spark) that recomputes aggregates over all historical data periodically, providing accurate but stale results
- A speed layer (Storm) that processes only recent events to fill in the gap until the next batch run, providing approximate but fresh results
- A serving layer that merges the two to answer queries
Why Lambda fails philosophically:
- Dual code paths for identical logic: The same computation (say, daily active users) must be implemented twice, once in Spark (batch) and once in Flink/Storm (streaming). These implementations inevitably diverge. A bug fixed in one is not fixed in the other. Testing becomes twice as hard.
- The serving layer complexity: Merging batch and speed results requires handling temporal overlap (some events processed by both), different precision, and different consistency guarantees. This logic is subtle and failure-prone.
- It enshrines a workaround as an architecture: Lambda acknowledges the inadequacy of early stream processors instead of fixing the stream processor. By 2026, modern stream processors (Flink, Kafka Streams, Spark Structured Streaming) have exactly-once semantics, rich state management, and the ability to process historical data, so there is no longer a reason for Lambda.
The philosophical problem with Lambda: It treats correctness as something you achieve by averaging out two imperfect systems rather than by building one correct system. This is epistemically unsatisfying and practically dangerous.
The Kappa Architecture
Kappa architecture (Jay Kreps, 2014) is the simplification that Lambda was waiting for: one processing layer, one code path, one system to understand.
Event Sources
│
▼
Durable Ordered Log (Kafka/Kinesis)
│
▼
Stream Processor (Flink / Kafka Streams)
├─ Real-time: process new events as they arrive
└─ Historical: replay log from offset 0 to reprocess
│
▼
Serving Layer
├─ Materialized views (RocksDB, Redis)
├─ Search index (Elasticsearch)
└─ OLAP store (ClickHouse, Snowflake)
Historical reprocessing in Kappa: When you need to reprocess historical data — because of a bug fix, a new derived view, or a schema change — you replay the Kafka log from the beginning through the same stream processor code. This requires:
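- Long enough Kafka retention, or archival of older data to object storage (compacted topics are not a substitute for full replay, because compaction keeps only the latest record per key)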
- The stream processor to handle “catching up” efficiently
- A mechanism to cut over from old to new materialized view without downtime
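The replay-and-cut-over requirements above can be sketched in a few lines of plain Python. This is a toy model, not a Kafka or Flink program: `LOG`, `apply_v1`, `apply_v2`, and the `serving` dictionary are all hypothetical stand-ins for the durable log, two versions of the processing logic, and the serving layer's pointer to the live view.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
View = Dict[str, int]

# Stand-in for the durable, ordered log (offset = list index).
LOG: List[Event] = [
    {"user": "ann", "amount_cents": 500},
    {"user": "bob", "amount_cents": 250},
    {"user": "ann", "amount_cents": -100},  # a refund
]


def build_view(log: List[Event], apply: Callable[[View, Event], None]) -> View:
    """Replay the log from offset 0 through a single processing function."""
    view: View = {}
    for event in log:
        apply(view, event)
    return view


def apply_v1(view: View, event: Event) -> None:
    # Buggy logic: refunds (negative amounts) are silently dropped.
    if event["amount_cents"] > 0:
        view[event["user"]] = view.get(event["user"], 0) + event["amount_cents"]


def apply_v2(view: View, event: Event) -> None:
    # Fixed logic: every amount, including refunds, is applied.
    view[event["user"]] = view.get(event["user"], 0) + event["amount_cents"]


serving = {"live": build_view(LOG, apply_v1)}  # old view keeps answering queries
old_view = dict(serving["live"])
candidate = build_view(LOG, apply_v2)          # new view built by full replay
serving["live"] = candidate                    # cut-over: one reference swap
```

The old view stays queryable until the replacement has caught up, and the switch is a single pointer change, which is one way to meet the no-downtime requirement.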
When Kappa is harder than Lambda: If you need very long-term historical reprocessing (years of data) and your stream processor cannot handle the full replay efficiently, Kappa’s catch-up time can be unacceptable. Some organizations use a hybrid: Kafka for recent data, Iceberg/Parquet on object storage for historical, with the same processing logic applied to both. This is “Kappa-with-cold-storage” rather than Lambda.
Kappa in 2026: The shift is complete. Apache Flink’s “unified batch and streaming” model, Kafka Streams, and Spark Structured Streaming all implement Kappa-style architectures. Lambda is now genuinely an antipattern except in legacy environments that cannot yet migrate.
Time in Streaming Systems
The treatment of time is where stream processing diverges most sharply from batch processing, and where the most subtle bugs arise. Understanding time thoroughly is a prerequisite for building correct streaming systems.
Event Time vs. Processing Time
Processing time: The wall clock time when the stream processor receives and processes an event. Simple to obtain (just System.currentTimeMillis()), but reflects system behavior, not the events themselves.
Event time: The timestamp embedded in the event itself, recording when the event actually occurred in the real world. Requires correct clocks on the originating device, which is not guaranteed.
Why the difference matters: In a mobile app, events generated offline are buffered and sent when connectivity is restored. A user who made three purchases at 11:58 PM, 11:59 PM, and 12:01 AM might not have those events arrive at the stream processor until 12:15 AM. If you compute hourly aggregates by processing time, those purchases are in the 12 AM bucket. If you compute by event time, the first two are in the 11 PM bucket, where they belong.
For business reporting — revenue per hour, active users per day — event time is almost always what you want. Processing time is a reasonable proxy only when event latency is small and predictable.
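The offline-purchases example can be worked through directly. The sketch below assumes concrete dates (1 and 2 January 2026) purely for illustration; the times of day and the 12:15 AM arrival come from the example above.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> str:
    """Label the hourly bucket a timestamp falls into."""
    return ts.strftime("%Y-%m-%d %H:00")


# All three buffered events reach the processor at 12:15 AM (dates are assumed).
arrival = datetime(2026, 1, 2, 0, 15)
purchases = [
    {"event_time": datetime(2026, 1, 1, 23, 58), "processing_time": arrival},
    {"event_time": datetime(2026, 1, 1, 23, 59), "processing_time": arrival},
    {"event_time": datetime(2026, 1, 2, 0, 1), "processing_time": arrival},
]

by_processing_time = dict(Counter(hour_bucket(p["processing_time"]) for p in purchases))
by_event_time = dict(Counter(hour_bucket(p["event_time"]) for p in purchases))
```

Bucketing by processing time puts all three purchases in the midnight hour; bucketing by event time puts two in the 11 PM hour and one in the midnight hour.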
Watermarks: Reasoning About Late Data
The fundamental challenge with event time is that you never know if you have received all the events for a given time window. You could wait forever. Watermarks are the pragmatic solution: a heuristic estimate of how far event time has progressed in the stream, which the processor then treats as if it were a guarantee that no earlier events are still outstanding.
A watermark W(t) asserts: “All events with event time ≤ t have been received.” When the watermark advances past the end of a window, the window can be closed and the result emitted.
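Generating watermarks: The simplest approach is max(event_time_seen) - allowed_lateness. If the latest event seen has timestamp 12:00:50 and you allow 10 seconds of lateness, the watermark is 12:00:40. Any event with event time ≤ 12:00:40 arriving now is “late.” (Flink calls this bound the maximum out-of-orderness and uses “allowed lateness” for a separate per-window grace period; these notes use “allowed lateness” for the watermark delay.)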
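The max-minus-lateness rule is small enough to write out. The following is a minimal sketch in plain Python, not any framework's watermark API; `WatermarkTracker` and `hms` are hypothetical names, and times are seconds since midnight.

```python
from dataclasses import dataclass
from typing import Optional


def hms(h: int, m: int, s: int) -> int:
    """Seconds since midnight, to keep the arithmetic readable."""
    return h * 3600 + m * 60 + s


@dataclass
class WatermarkTracker:
    allowed_lateness: int                  # seconds
    max_event_time: Optional[int] = None   # highest event time seen so far

    def current(self) -> Optional[int]:
        """Watermark = max event time seen minus the allowed lateness."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def observe(self, event_time: int) -> bool:
        """Record an event; return True if it is late (at or behind the watermark)."""
        watermark = self.current()
        is_late = watermark is not None and event_time <= watermark
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return is_late


tracker = WatermarkTracker(allowed_lateness=10)
tracker.observe(hms(12, 0, 50))
watermark = tracker.current()               # 12:00:40
on_time = tracker.observe(hms(12, 0, 45))   # ahead of the watermark
late = tracker.observe(hms(12, 0, 40))      # at the watermark, so late
```

Late events do not move the watermark backwards; it only advances when a newer maximum event time is observed.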
The watermark trade-off:
- Conservative watermarks (large allowed lateness): High completeness (few events treated as late), but high latency (windows close late)
- Aggressive watermarks (small allowed lateness): Low latency, but some events are treated as late and either discarded or trigger window retractions
Late events and retractions: Flink and Beam allow windows to emit a result when the watermark advances, then emit a correction if a late event arrives. This requires downstream systems to handle retractions — updates to previously emitted results. Not all sinks support this, which is a practical constraint.
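Emitting a result and later correcting it can be sketched as follows. This is a simplified model in plain Python, not Flink's or Beam's trigger API: the window size, the lateness bound, the `("result", ...)` and `("correction", ...)` tuples, and the sample events are all assumptions made for illustration, and window state is never cleaned up.

```python
from typing import Dict, List, Set, Tuple

WINDOW = 60     # tumbling window size in seconds (assumed)
LATENESS = 10   # watermark lag in seconds (assumed)


def process(events: List[Tuple[int, int]]) -> List[Tuple[str, int, int]]:
    """events are (event_time_s, amount); output is (kind, window_start, total)."""
    totals: Dict[int, int] = {}
    closed: Set[int] = set()
    output: List[Tuple[str, int, int]] = []
    max_seen = None
    for event_time, amount in events:
        start = event_time - event_time % WINDOW
        totals[start] = totals.get(start, 0) + amount
        if start in closed:
            # Late event for an already-emitted window: emit an updated total.
            output.append(("correction", start, totals[start]))
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - LATENESS
        for window_start in sorted(totals):
            if window_start not in closed and window_start + WINDOW <= watermark:
                closed.add(window_start)
                output.append(("result", window_start, totals[window_start]))
    return output


# The event at t=55 arrives after the watermark has already closed window [0, 60).
emitted = process([(10, 5), (50, 7), (75, 1), (55, 2)])
```

A sink that receives the correction must be able to overwrite the earlier total for window 0, which is the practical constraint noted above.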
Why this matters philosophically: The existence of watermarks reveals that correctness in streaming is inherently probabilistic and time-bounded. You cannot achieve perfect completeness and low latency simultaneously. You are always making a bet about what “complete enough” means for your use case.
Exactly-Once Semantics: What It Really Means
“Exactly-once processing” is one of the most misunderstood concepts in distributed systems. The chapter provides philosophical clarity.
What Exactly-Once Is Not
Exactly-once does not mean that a message is processed by the stream processor code exactly once. In any system with retries and failures, a message may be re-delivered and re-processed. What “exactly-once” actually means is:
The observable effect of processing the message appears exactly once in the output.
This is a semantic guarantee, not a physical one. The message may be processed two or three times internally, but the result is as if it were processed once.
Mechanisms for Exactly-Once
Idempotent producers: Kafka’s idempotent producer attaches a producer ID and a per-partition sequence number to what it sends. If the same data is re-sent (for example, because an acknowledgment was lost and the producer retried), the broker recognizes the duplicate and discards it. The message appears once in the log.
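The broker-side check can be sketched with a toy partition log. This is a deliberately simplified model, not Kafka's implementation (which also deals with batches and producer epochs); `PartitionLog` and its fields are hypothetical.

```python
from typing import Dict, List


class PartitionLog:
    """Toy partition that rejects re-sent records by (producer_id, sequence)."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.last_seq: Dict[int, int] = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Return True if appended, False if recognized as a duplicate."""
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return False  # already in the log; acknowledge without appending
        if seq != last + 1:
            raise ValueError("gap in sequence numbers")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


log = PartitionLog()
first = log.append(producer_id=7, seq=0, value="a")
second = log.append(producer_id=7, seq=1, value="b")
retry = log.append(producer_id=7, seq=1, value="b")  # retry after a lost ack
```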
Transactional writes: Kafka’s transaction API allows a stream processor to atomically commit both the offset advancement (I’ve processed up to this point) and the output records (here are the results) in a single transaction. If the processor crashes between reading input and writing output, the transaction is aborted and the input is reprocessed from the last committed offset. Consumers of the output topic that read with isolation.level=read_committed only see records from committed transactions.
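The essential property, that the offset and the output become visible together or not at all, can be shown with a single-process toy. This is not Kafka's transaction API: `Store`, `process_once`, and the simulated crash are assumptions, and the atomicity here is trivial because nothing is distributed.

```python
from typing import List

INPUT: List[str] = ["a", "b", "c"]


class Store:
    """Holds what downstream readers can see: committed offset and output."""

    def __init__(self) -> None:
        self.committed_offset = 0
        self.committed_output: List[str] = []


def process_once(store: Store, crash_before_commit: bool = False) -> None:
    """Read from the committed offset, transform, then commit output and offset."""
    staged_output = [record.upper() for record in INPUT[store.committed_offset:]]
    staged_offset = len(INPUT)
    if crash_before_commit:
        # Staged work lives only in local variables, so it is simply lost.
        raise RuntimeError("crashed before commit")
    # Commit point: both changes are applied together.
    store.committed_output.extend(staged_output)
    store.committed_offset = staged_offset


store = Store()
try:
    process_once(store, crash_before_commit=True)
except RuntimeError:
    pass
after_crash = (store.committed_offset, list(store.committed_output))
process_once(store)  # the retry starts again from the committed offset
```

The input is processed twice internally, but readers of the committed output see each result once.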
Idempotent sinks: For output to external systems (databases, APIs), the stream processor cannot use Kafka transactions. Instead, it must ensure idempotency through deduplication keys. Every output record gets a unique ID; the sink deduplicates on that ID. This requires sink cooperation.
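A deduplicating sink can be sketched as below. The choice of deriving the record ID from the input's topic, partition, and offset is an assumption made for illustration (any ID that is stable across retries would do), and `DedupSink` is a hypothetical in-memory stand-in for a database with a unique-key constraint.

```python
from typing import Dict


def record_id(topic: str, partition: int, offset: int) -> str:
    """Deterministic ID: a retry of the same input yields the same ID."""
    return f"{topic}/{partition}/{offset}"


class DedupSink:
    """Toy external store that ignores writes whose ID it has already seen."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def write(self, rid: str, payload: str) -> bool:
        if rid in self.rows:
            return False  # duplicate delivery; nothing changes
        self.rows[rid] = payload
        return True


sink = DedupSink()
first = sink.write(record_id("orders", 0, 41), "order 41 shipped")
second = sink.write(record_id("orders", 0, 42), "order 42 shipped")
redelivered = sink.write(record_id("orders", 0, 42), "order 42 shipped")
```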
Two-phase commit (2PC): The most general solution — an atomic protocol across Kafka and the external sink. However, 2PC is expensive (two network round trips, locking during the prepare phase) and fragile (coordinator failures). Kafka avoids it by design.
The End-to-End Argument for Exactly-Once
Even if the stream processor provides exactly-once guarantees, the application can still produce incorrect results:
- A payment service with exactly-once processing can still double-charge if the “charge” operation is not itself idempotent
- A notification service can still send duplicate emails if email sending is outside the transactional boundary
This is the end-to-end argument applied to streaming: exactly-once in the processing layer is necessary but not sufficient for exactly-once application behavior. You must design idempotency at every boundary where the application interacts with external systems.
Idempotency as an alternative philosophy: Rather than fighting for exactly-once delivery (which requires distributed coordination), you can design operations to be idempotent — processing the same event multiple times produces the same result as processing it once. This is often simpler and more robust.
Example: Instead of “increment user’s credit by X,” use “set user’s credit to X_final if event X has not been applied.” The second is idempotent; the first is not.
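The difference between a blind increment and an apply-if-not-yet-applied update shows up as soon as an event is delivered twice. The sketch below uses a hypothetical in-memory `Account`; in a real store, the credit update and the record of applied event IDs would have to be written in the same transaction.

```python
from typing import Set


class Account:
    def __init__(self) -> None:
        self.credit = 0
        self.applied: Set[str] = set()  # IDs of events already applied


def increment(account: Account, amount: int) -> None:
    """Not idempotent: every delivery changes the balance."""
    account.credit += amount


def apply_once(account: Account, event_id: str, amount: int) -> bool:
    """Idempotent: a repeated event ID leaves the balance unchanged."""
    if event_id in account.applied:
        return False
    account.credit += amount
    account.applied.add(event_id)
    return True


naive, safe = Account(), Account()
for _ in range(2):  # the same event is delivered twice
    increment(naive, 10)
    apply_once(safe, "evt-1", 10)
```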
State in Streaming Systems
State is what separates a simple per-event transformation from a meaningful stream computation. Counting events, joining streams, detecting fraud patterns — all require remembering previous events.
Types of State
Operator state: Bound to a specific operator instance (e.g., a Kafka consumer’s offset). Redistributed when the operator is scaled.
Keyed state: The most common form. The stream is partitioned by key (e.g., user_id), and each key maintains its own state independently. Scales horizontally: adding parallelism shards the key space.
Window state: State accumulated within a time window (tumbling, sliding, session). Cleared when the window closes.
Broadcast state: State distributed to all parallel instances of an operator (e.g., a lookup table replicated everywhere for fast joins).
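Keyed state, the most common of these, can be illustrated by routing events to parallel instances with a stable hash. This is a sketch of the idea only: the parallelism of 4, the use of CRC32 as the hash, and the sample events are assumptions, and real processors use their own partitioning schemes.

```python
import zlib
from typing import Dict, List

PARALLELISM = 4  # number of parallel operator instances (assumed)


def instance_for(key: str, parallelism: int = PARALLELISM) -> int:
    """Stable hash partitioning: the same key always maps to the same instance."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


# One private state map per instance; no instance reads another's state.
instances: List[Dict[str, int]] = [{} for _ in range(PARALLELISM)]

events = ["ann", "bob", "ann", "cat", "ann", "bob"]
for user_id in events:
    state = instances[instance_for(user_id)]
    state[user_id] = state.get(user_id, 0) + 1

# For inspection: which instances hold each key, and the combined counts.
owners = {k: [i for i, s in enumerate(instances) if k in s] for k in set(events)}
merged = {k: v for s in instances for k, v in s.items()}
```

Because every event for a key lands on one instance, each count is maintained without any coordination between instances.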
State Backends and Fault Tolerance
Stream processor state must survive failures. The standard approach is checkpointing: periodically taking a consistent snapshot of all operator state and writing it to durable storage (HDFS, S3). RocksDB holds an operator's working state on local disk; it is not itself the durable checkpoint store.
Flink’s distributed snapshot algorithm (Chandy-Lamport-inspired) takes consistent global snapshots without pausing processing:
- The checkpoint coordinator sends a “barrier” event to all source operators
- When an operator sees a barrier on all its inputs, it snapshots its state and forwards the barrier downstream
- The snapshot is complete when all sink operators have acknowledged it
- On failure: restore all operators to the last complete snapshot, replay events from that point
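The restore-and-replay step can be demonstrated with a heavily simplified model: one source and one counting operator, with a barrier after every second record. It omits barrier alignment across multiple inputs entirely, and `run`, `Crash`, and `LOG` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Checkpoint = Tuple[int, Dict[str, int]]  # (next offset to read, state snapshot)
LOG: List[str] = ["a", "b", "a", "c", "a"]


class Crash(Exception):
    """Simulated failure; carries the last completed checkpoint."""


def run(log: List[str], checkpoint: Checkpoint, barrier_every: int,
        crash_at: Optional[int] = None) -> Tuple[Dict[str, int], Checkpoint]:
    """Count keys starting from a checkpoint, snapshotting at each barrier."""
    offset, snapshot = checkpoint
    state = dict(snapshot)
    latest = checkpoint
    for pos in range(offset, len(log)):
        if crash_at is not None and pos == crash_at:
            raise Crash(latest)  # state since the last barrier is lost
        state[log[pos]] = state.get(log[pos], 0) + 1
        if (pos + 1) % barrier_every == 0:
            latest = (pos + 1, dict(state))  # barrier: snapshot offset and state
    return state, latest


expected, _ = run(LOG, (0, {}), barrier_every=2)

try:
    run(LOG, (0, {}), barrier_every=2, crash_at=3)
except Crash as crash:
    last_checkpoint = crash.args[0]

# Restore from the last completed snapshot and replay from its offset.
recovered, _ = run(LOG, last_checkpoint, barrier_every=2)
```

The record at offset 2 is processed twice (once before the crash, once after), yet the recovered state matches an uninterrupted run because the snapshot and the replay offset were captured together.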
RocksDB as incremental state backend: For large state (hundreds of GB), Flink uses RocksDB (an LSM-tree-based embedded key-value store). Checkpoints are incremental: only SST files that changed since the previous checkpoint are uploaded to S3. On recovery, an operator restores its RocksDB files from the last completed checkpoint (or from a local copy, if local recovery is enabled and the machine survived) and replays only the input that arrived after that checkpoint.
The state evolution problem: What happens when you change the schema of your state? Adding a field is straightforward (default value for existing keys). Removing a field requires migrating state, which Flink supports via state serializers with schema evolution. This is a real operational burden in long-running streaming jobs.
Materialized Views: Streaming as Continuous Query
A materialized view is a query result that is pre-computed and stored, updated as new data arrives. Traditional databases compute materialized views with batch refresh (recompute everything) or incremental update (apply only the delta). Stream processing is, conceptually, the application of this incremental update pattern to an unbounded stream of data.
The key insight: A Flink job that maintains a running count of events per user is indistinguishable in purpose from a database that maintains a COUNT(*) GROUP BY user_id materialized view. The difference is scale and latency:
- Database materialized views: updated synchronously, limited by the database engine
- Stream-maintained views: updated with sub-second latency, horizontally scalable, decoupled from the serving database
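The equivalence between a batch refresh and an incrementally maintained view can be checked directly. The sketch below is plain Python rather than Flink or SQL; `batch_refresh`, `CountByUser`, and the sample events are hypothetical.

```python
from collections import Counter
from typing import Dict, List

events: List[Dict[str, str]] = [
    {"user_id": "ann"}, {"user_id": "bob"}, {"user_id": "ann"},
]


def batch_refresh(all_events: List[Dict[str, str]]) -> Dict[str, int]:
    """Recompute COUNT(*) GROUP BY user_id from scratch."""
    return dict(Counter(e["user_id"] for e in all_events))


class CountByUser:
    """Incrementally maintained view: apply only the delta for each new event."""

    def __init__(self) -> None:
        self.view: Dict[str, int] = {}

    def apply(self, event: Dict[str, str]) -> None:
        user = event["user_id"]
        self.view[user] = self.view.get(user, 0) + 1


incremental = CountByUser()
matches_after_each_event = []
for i, event in enumerate(events, start=1):
    incremental.apply(event)
    # The incremental view should equal a full recompute over the prefix seen so far.
    matches_after_each_event.append(incremental.view == batch_refresh(events[:i]))
```

The incremental path does constant work per event, while the batch path rescans everything; both produce the same view.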
CQRS (Command Query Responsibility Segregation) extends this: separate the write model (commands that mutate state) from the read model (queries that read materialized views). The stream processor is the bridge: it consumes commands from Kafka and maintains the read-optimized materialized view in a serving store (Redis, RocksDB, Elasticsearch).
Write Model                  Stream Processor          Read Model
──────────────────           ──────────────────        ──────────────────────
User action (command) ─────▶ Process event      ─────▶ Materialized view
        │                    Maintain state            (Redis / RocksDB)
        ▼                    Emit derived events               │
Kafka (event log)                                      Query (fast read)
(source of truth)
Materialized views and the streaming database vision: Systems like Materialize, Apache Flink SQL, and Confluent Platform implement “streaming SQL” — SQL queries that continuously update their results as new events arrive. This blurs the boundary between databases and stream processors and points toward a future where all data queries are fundamentally streaming queries with configurable freshness.
The Relationship Between Streams and Databases
One of the chapter’s deepest arguments is that streams and databases are dual representations of the same underlying reality.
A database is a stream in disguise: Every change to a database can be captured as an event (via CDC — Change Data Capture). The database’s current state is the fold of all those events from the beginning of time. Therefore:
- Database = current state (the snapshot)
- Stream = history of changes (the log)
- Database state = fold(stream, initial_state)
This is the event sourcing pattern: instead of storing current state, store the sequence of events that led to that state, and derive the current state on demand.
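The identity state = fold(stream, initial_state) maps directly onto `functools.reduce`. The account events below are hypothetical sample data, and `apply` is an illustrative pure function that returns a new state rather than mutating the old one.

```python
from functools import reduce
from typing import Any, Dict, List

State = Dict[str, int]
Event = Dict[str, Any]

stream: List[Event] = [
    {"type": "deposited", "account": "ann", "amount": 100},
    {"type": "withdrawn", "account": "ann", "amount": 30},
    {"type": "deposited", "account": "bob", "amount": 50},
]


def apply(state: State, event: Event) -> State:
    """Pure step function: (state, event) -> new state."""
    new_state = dict(state)
    balance = new_state.get(event["account"], 0)
    if event["type"] == "deposited":
        new_state[event["account"]] = balance + event["amount"]
    elif event["type"] == "withdrawn":
        new_state[event["account"]] = balance - event["amount"]
    return new_state


current_state = reduce(apply, stream, {})
# Folding a prefix of the stream gives the state as of an earlier point in time.
state_after_two = reduce(apply, stream[:2], {})
```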
The stream table duality (Kafka Streams fundamental concept):
- A stream represents the changelog of a table over time
- A table represents the current materialized snapshot of a stream
These are interconvertible: you can create a table from a stream (by maintaining a KV store updated by the stream), or a stream from a table (by emitting CDC events on every update).
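Both directions of the conversion can be sketched with a toy table that records a changelog entry on every write. This is not the Kafka Streams API; `Table`, `materialize`, and the use of `None` as a delete marker are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None marks a delete


class Table:
    """Table -> stream: every mutation also appends a change record."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.changelog: List[Change] = []

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value
        self.changelog.append((key, value))

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)
        self.changelog.append((key, None))


def materialize(changelog: List[Change]) -> Dict[str, str]:
    """Stream -> table: keep the latest value per key, dropping deleted keys."""
    rows: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            rows.pop(key, None)
        else:
            rows[key] = value
    return rows


table = Table()
table.put("ann", "Berlin")
table.put("bob", "Oslo")
table.put("ann", "Lisbon")
table.delete("bob")
rebuilt = materialize(table.changelog)
```

Replaying the changelog reproduces the table exactly, which is why either representation can be derived from the other.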
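This duality has practical implications: in Kafka Streams, you can join a stream with a table (each stream event looks up the table’s current value for its key), join two tables (a key-based join over their latest values, updated whenever either side changes), or join two streams (a windowed join matching events with the same key within a time window).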
Fault Tolerance Through Replayability
The philosophical foundation of streaming system fault tolerance is replayability: the ability to replay any segment of the event history to reconstruct state.
This is why Kafka’s log retention is so critical. A Kafka topic with indefinite retention (or archival to object storage) is essentially an immutable ledger of all events. No matter what goes wrong with downstream processors — bugs, crashes, misconfiguration — the authoritative record is intact. Recovery is “replay from last checkpoint.”
Contrast with traditional queuing: In a traditional message queue (RabbitMQ, SQS without replay), messages are deleted once they have been acknowledged. A consumer that crashes before acknowledging gets the message redelivered, but after acknowledgment the message is gone: if the consumer’s logic was buggy, or its derived state is later lost, there is nothing to replay and no recovery path. Kafka’s log model eliminates this category of failure.
Immutability and audit trails: The immutable event log has a secondary benefit: perfect auditability. You can always answer “what did the system know, and when did it know it?” by replaying the log up to any point in time. This is invaluable for debugging (“why did this user get charged?”), compliance (audit trails for financial transactions), and data quality investigation.
The GDPR tension: The immutable log philosophy collides with GDPR’s right to erasure. The chapter references the cryptographic erasure solution: encrypt all personal data under a per-user key before writing to the log, then “erase” by deleting the key. The events remain in the log but are indecipherable. This requires designing personal data containment from the start — retrofitting it is difficult.
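The mechanics of cryptographic erasure can be shown with a toy. Important assumption: the XOR keystream below is built from SHA-256 only so that the example runs with the standard library. It is not a secure cipher, and a real system would use an authenticated cipher from a vetted cryptography library plus a properly managed key store. All names (`append_event`, `read_event`, `erase_user`) are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (NOT secure): SHA-256 over key, nonce, and a counter."""
    blocks = []
    counter = 0
    while 32 * len(blocks) < length:
        blocks.append(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]


def toy_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice returns the original bytes."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


user_keys: Dict[str, bytes] = {}            # mutable key store, one key per user
log: List[Tuple[str, bytes, bytes]] = []    # append-only: (user_id, nonce, ciphertext)


def append_event(user_id: str, personal_data: str) -> None:
    key = user_keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(12)
    log.append((user_id, nonce, toy_xor(key, nonce, personal_data.encode("utf-8"))))


def read_event(index: int) -> Optional[str]:
    user_id, nonce, ciphertext = log[index]
    key = user_keys.get(user_id)
    if key is None:
        return None  # key erased: the bytes remain but cannot be read
    return toy_xor(key, nonce, ciphertext).decode("utf-8")


def erase_user(user_id: str) -> None:
    """'Forget' a user by deleting their key; the log itself is untouched."""
    user_keys.pop(user_id, None)


append_event("ann", "ann@example.com")
append_event("bob", "bob@example.com")
before_erasure = read_event(0)
erase_user("ann")
```

The log keeps its length and ordering, so offsets and replay still work; only the erased user's payloads become unreadable.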
Correctness by Construction
The chapter’s culminating argument is that correctness should be a design property, not something tested for after the fact. The philosophical approach:
- Make state transitions explicit: Represent all state changes as events; never mutate state silently
- Make all derivations reproducible: Any derived view can be rebuilt from the source events
- Make side effects idempotent: Any external operation (API call, email, payment) is safe to repeat
- Make time explicit: Don’t rely on processing time for event semantics; embed event time in the event
- Make failures a first-class concern: Design for failure recovery from the beginning, not as an afterthought
This is related to the functional programming philosophy applied to data systems: pure functions (given the same input, always produce the same output) applied to events produce deterministic, reproducible results. Side effects are isolated to the output of the pipeline, where they can be controlled.
The contrast with imperative mutation: Traditional application code mutates shared state (UPDATE orders SET status = ‘shipped’). When a bug is found, you cannot “undo” the mutation; you can only apply another mutation to compensate. In an event-sourced system, you replay events with the corrected logic to rebuild state — bugs can be retroactively fixed by reprocessing.
Comparison Tables
Lambda vs. Kappa Architecture
| Dimension | Lambda | Kappa |
|---|---|---|
| Processing layers | Batch + Speed + Serving | Single stream processor |
| Code paths | Two (batch and streaming) | One |
| Historical reprocessing | Batch layer recomputes | Replay Kafka log |
| Operational complexity | High (two systems) | Low (one system) |
| Latency | High for batch results | Low throughout |
| Correctness | Harder (two impls to keep in sync) | Simpler (one impl) |
| Long-term retention | Hadoop/S3 (unlimited) | Kafka retention (cost constraint) |
| When to use | Legacy systems, pre-2018 tooling | Modern systems; default choice |
Exactly-Once Mechanisms
| Mechanism | Scope | Trade-off |
|---|---|---|
| Idempotent producer | Within Kafka log | Eliminates duplicate log entries from producer retries |
| Kafka transactions | Kafka to Kafka | Atomic read-process-write; overhead of distributed coord |
| Idempotent sink | Kafka to external store | Requires sink cooperation; dedup key overhead |
| At-least-once + idempotent ops | End-to-end | Simplest; requires all operations to be idempotent |
| 2PC | Cross-system atomicity | Expensive, fragile; avoid except where unavoidable |
Event Time vs. Processing Time
| Dimension | Event Time | Processing Time |
|---|---|---|
| What it measures | When the event occurred in the real world | When the processor received the event |
| Accuracy | High for business semantics | Low for business semantics |
| Difficulty | Hard (out-of-order, late events, watermarks) | Easy (wall clock) |
| Use case | Revenue reporting, user behavior analysis | Operational metrics, processing health |
| Late data handling | Watermarks, retractions, allowed lateness | N/A: events are bucketed by arrival time, so none are late by definition |
Important Points Summary
- Streaming is the fundamental paradigm; batch is a special case: A batch job is a stream with a known finite end. Flink’s “everything is streaming” unifies the two, eliminating the need for separate Lambda-style batch and streaming code paths.
- The Kappa architecture supersedes Lambda: One stream processor, one code path, historical reprocessing via log replay. Lambda should be considered legacy as of 2026.
- The event log (Kafka) is the source of truth: Current state in any database or derived view is just a materialized snapshot of the event history. If a derived view is corrupted, rebuild from the log.
- Event time is almost always what you want for business semantics: Processing time is a convenient approximation; event time is the truth. Watermarks are the pragmatic mechanism for reasoning about completeness.
- Exactly-once means “observable effects appear exactly once”: The stream processor may internally re-process a message; what matters is that the output reflects exactly-once semantics. Idempotency is the practical implementation strategy.
- Correctness by construction: Design state as explicit events, derivations as reproducible transformations, side effects as idempotent operations. Correctness is a design property, not a test result.
- Materialized views are the bridge between streams and databases: Stream processors are continuous query engines maintaining pre-computed views at sub-second freshness.
- Fault tolerance = replayability: The immutable log is the recovery mechanism. No message queue that deletes consumed messages can provide the same recovery guarantees.
- The stream-table duality: A stream is the changelog of a table; a table is a materialized snapshot of a stream. They are two views of the same underlying data.
- GDPR vs. immutability: Resolve with cryptographic erasure — encrypt personal data per-user, erase the key to “forget.” Must be designed in from the start.
Modern Context (2026)
Apache Flink has become the reference implementation: Flink’s unified batch-streaming model, exactly-once semantics, and rich stateful API are the benchmark against which other stream processors are measured. Flink on Kubernetes, often paired with Apache Paimon for lakehouse table storage (Paimon is a table format, not a Flink state backend), is a common enterprise deployment in 2026.
Streaming SQL is mainstream: Apache Flink SQL, Materialize, Confluent Platform, and RisingWave implement streaming SQL — SQL queries that continuously update as new data arrives. This dramatically lowers the barrier to stream processing: analysts who know SQL can now write streaming queries without learning a stream processing API.
The Iceberg/Delta Lake integration: The “Kappa with cold storage” pattern is now well-supported: Kafka for recent events, Apache Iceberg on S3 for historical archival, with Flink reading from both with the same SQL. This solves the retention cost problem that was Kappa’s main weakness.
Vector embeddings and streaming: ML inference (embedding generation, anomaly detection) is increasingly deployed as a stream processing operator. Flink jobs that compute embeddings for incoming documents and write to a vector database (Pinecone, Qdrant) are a standard 2026 pattern.
Real-time feature stores: Feast, Tecton, and Redis Enterprise provide streaming feature computation: raw events → Flink → feature store → ML model serving. The stream processor replaces the traditional batch feature computation job.
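Apache Kafka’s Kafka Raft (KRaft) mode: KRaft replaces the ZooKeeper dependency with a Raft-based metadata quorum inside Kafka itself, which simplifies operations and improves scalability. KRaft became production-ready during the 3.x line, and Kafka 4.0 removed ZooKeeper mode entirely, so by 2026 ZooKeeper-based clusters exist only on older, legacy versions.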
Questions for Reflection
- The chapter argues that “batch is a special case of streaming.” Does this inversion of the conventional view change how you would architect a new data system? What would you do differently?
- If exactly-once semantics only guarantees observable effects appear once, but does not prevent the processor from internally re-processing a message, what does this imply about the design of external systems (APIs, databases) that a stream processor interacts with?
- A team argues that Lambda architecture gives them more confidence because the batch layer provides a “ground truth” that the speed layer can be checked against. How would you counter this argument?
- Watermarks involve a trade-off between completeness (not missing late events) and latency (closing windows quickly). How would you determine the right watermark policy for a revenue reporting system? What data would you need?
- The event log as an immutable record of facts sounds philosophically appealing, but GDPR’s right to erasure requires deleting personal data. Is cryptographic erasure genuinely equivalent to deletion, or is it a legal fiction? What are the edge cases?
- If streaming systems can replace batch systems (Kappa), and streaming SQL can replace traditional databases for many read patterns, what role remains for traditional relational databases in a streaming-first architecture?
Related Resources
- ch12-stream-processing — The mechanical foundation: Kafka internals, watermarks, stream joins (read this chapter first)
- ch11-batch-processing — The Kappa architecture context: why batch still matters for cold storage and ML training
- ch10-consistency-and-consensus — Exactly-once semantics require consensus at the Kafka broker level (Raft/KRaft)
- ch08-transactions — Kafka transactions build on the same principles as database transactions
- ch06-replication — Kafka’s log replication model; fault tolerance via ISR
- ch12-future-of-data-systems — 1st edition equivalent: covers Kappa/Lambda and event logs (Ch12 in 1E)
Last Updated: 2026-05-29