Chapter 15: Event-Driven Architecture

fsa architecture-styles event-driven async messaging

Status: Notes complete

Overview

Event-Driven Architecture (EDA) is an asynchronous distributed architecture style that uses events to communicate between decoupled services. It is one of the highest-performing and most scalable architecture styles available, but also one of the most complex to design, test, and debug.

EDA is distinguished by loose coupling: producers emit events without knowledge of consumers, and consumers react to events without knowledge of producers. This decoupling enables extreme extensibility and elasticity.

Topology

EDA has two primary topologies — Broker and Mediator — which differ in how events are routed and whether there is a central coordinator.

Broker Topology

                       +------------------+
  [Producer A] ------> |                  | ------> [Consumer 1]
                       |   Message Broker | ------> [Consumer 2]
  [Producer B] ------> |  (Topics/Queues) | ------> [Consumer 3]
                       +------------------+
         (No central coordinator — events flow freely)

No central orchestrator. Events are published to a broker (Kafka, RabbitMQ, etc.) and consumers subscribe to topics of interest.
Consumers can themselves emit new events (chained/derived events), creating reactive chains.
High decoupling and high scalability, but end-to-end workflows are difficult to trace.
Best for: simple event processing, highly extensible systems, broadcasting.
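
A minimal sketch of the broker topology, assuming a toy in-memory broker rather than Kafka or RabbitMQ. The class name InMemoryBroker, the topic name, and the handlers are hypothetical; delivery is synchronous here only to keep the example short.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a real broker (hypothetical, not a library API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer names only a topic; it never references a consumer.
        for handler in list(self._subscribers[topic]):
            handler(event)


broker = InMemoryBroker()
received: List[str] = []

# Two independent consumers subscribe to the same topic.
broker.subscribe("order.placed", lambda e: received.append(f"inventory:{e['orderId']}"))
broker.subscribe("order.placed", lambda e: received.append(f"email:{e['orderId']}"))

broker.publish("order.placed", {"orderId": "o-1"})
```

Adding a third consumer is one more subscribe call; the publishing code does not change.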

Mediator Topology

  [Initiating Event]
         |
         v
  +------------------+
  |  Event Mediator  |  <-- knows the workflow
  +------------------+
     |       |      |
     v       v      v
  [Step1] [Step2] [Step3]   (event channels per processing step)
     |       |      |
     v       v      v
  [Handler] [Handler] [Handler]

A central Event Mediator receives initiating events and orchestrates subsequent processing steps.
The mediator knows the workflow; processors do not know about each other.
Supports complex workflows with conditional logic, parallel processing, and error handling at the mediator level.
Best for: complex multi-step workflows, when coordination and visibility matter.
Implementations: Apache Camel, Spring Integration, MuleSoft, AWS Step Functions.
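
The mediator idea can be sketched in a few lines, assuming a hand-rolled orchestrator rather than any of the products listed above. EventMediator, the step functions, and the state fields are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]


class EventMediator:
    """Holds the workflow definition; the steps know nothing of each other."""

    def __init__(self) -> None:
        self._workflows: Dict[str, List[Step]] = {}

    def register(self, event_type: str, steps: List[Step]) -> None:
        self._workflows[event_type] = steps

    def handle(self, event_type: str, payload: dict) -> dict:
        state = dict(payload)
        for step in self._workflows[event_type]:
            try:
                state = step(state)
            except Exception as exc:
                # Error handling lives in the mediator, not in the steps.
                state["failed_step"] = step.__name__
                state["error"] = str(exc)
                break
        return state


def reserve_inventory(state: dict) -> dict:
    return {**state, "reserved": True}


def charge_payment(state: dict) -> dict:
    if state["amount"] <= 0:
        raise ValueError("amount must be positive")
    return {**state, "charged": True}


mediator = EventMediator()
mediator.register("OrderPlaced", [reserve_inventory, charge_payment])
ok = mediator.handle("OrderPlaced", {"orderId": "o-1", "amount": 25})
bad = mediator.handle("OrderPlaced", {"orderId": "o-2", "amount": 0})
```

The workflow order and the failure policy sit in one place, which is the visibility benefit the mediator topology trades against extra coupling to the mediator.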

Mediated EDA (Hybrid)

In practice, architectures often combine both topologies: a mediator handles complex workflows internally while a broker handles broader event distribution between domains. The mediator processes its workflow and emits an event onto the broker when done.

Style Specifics

Events vs Messages

A critical conceptual distinction in EDA:

Concept	Nature	Semantics	Example
Event	Notification	Past-tense fact — something happened	`OrderPlaced`, `UserRegistered`
Message	Command	Imperative — do something	`ProcessPayment`, `SendEmail`

Events represent immutable facts about the world. They do not carry intent. A consumer decides what to do in response. Messages carry intent and are typically directed at a specific receiver.

Good EDA design uses true events (past-tense, factual notifications), not disguised commands.

Event Payload: Fat vs Thin Events

Fat Events carry all relevant data in the payload.

Advantages: consumers are self-sufficient, fewer round trips, lower latency.
Disadvantages: larger message size, data duplication, payload versioning complexity.

Thin Events carry only an identifier (e.g., orderId).

Advantages: smaller payload, single source of truth remains in the origin service.
Disadvantages: consumers must make synchronous callbacks to retrieve data, increasing coupling and latency.

In practice, a moderately fat event is usually preferred: include enough context that most consumers do not need an immediate callback, but leave out rarely needed data that would bloat the payload.
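
To make the payload trade-off concrete, here is a small sketch that models a thin and a fat variant of the same event as frozen dataclasses. The class and field names are assumptions for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: an event is an immutable fact
class OrderPlacedThin:
    order_id: str


@dataclass(frozen=True)
class OrderPlacedFat:
    order_id: str
    customer_id: str
    total_cents: int
    item_skus: tuple


thin = OrderPlacedThin(order_id="o-1")
fat = OrderPlacedFat(
    order_id="o-1",
    customer_id="c-9",
    total_cents=4200,
    item_skus=("sku-1", "sku-2"),
)

# Serialized size is the cost of the fat event; the benefit is that a
# consumer can act on it without calling back to the order service.
thin_bytes = len(json.dumps(asdict(thin)).encode("utf-8"))
fat_bytes = len(json.dumps(asdict(fat)).encode("utf-8"))
```

A consumer of the thin event would need a lookup by order_id before it could do anything with the customer or the total.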

Derived Events

A derived event is a new event emitted by a consumer as a result of processing an upstream event. This enables event chains — reactive pipelines where each processing step produces a new event for the next step.

OrderPlaced --> [Inventory Service] --> InventoryReserved
                                               |
                                               v
                                        [Shipping Service] --> ShipmentCreated

Derived events are a core mechanism for extensibility in the broker topology.
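
The chain in the diagram above can be sketched with a pending-event queue, assuming hypothetical handler functions standing in for the Inventory and Shipping services. Each handler reacts to one event and publishes the next.

```python
from collections import defaultdict, deque

subscribers = defaultdict(list)
pending = deque()
log = []


def publish(event_type, payload):
    pending.append((event_type, payload))


def on_order_placed(payload):
    # Inventory Service: reacts to OrderPlaced, emits a derived event.
    publish("InventoryReserved", payload)


def on_inventory_reserved(payload):
    # Shipping Service: reacts to the derived event, emits another one.
    publish("ShipmentCreated", payload)


subscribers["OrderPlaced"].append(on_order_placed)
subscribers["InventoryReserved"].append(on_inventory_reserved)

publish("OrderPlaced", {"orderId": "o-1"})
while pending:
    event_type, payload = pending.popleft()
    log.append(event_type)
    for handler in subscribers[event_type]:
        handler(payload)
```

Nothing subscribes to ShipmentCreated yet, so the chain ends there; a new consumer could extend it without touching the existing handlers.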

Triggering Events and Extensibility

Because producers emit events without knowledge of consumers, new consumers can be added at any time by subscribing to existing event topics. This makes EDA extremely extensible — adding behavior requires no changes to existing components.

This is the Open/Closed Principle applied at the architecture level: open for extension (add new consumers), closed for modification (existing producers/consumers unchanged).

Asynchronous Capabilities

EDA is inherently asynchronous. Producers fire and forget; they do not wait for consumers to process the event. This enables:

High throughput (producers are never blocked by downstream processing)
Temporal decoupling (producers and consumers do not need to be available simultaneously)
Natural buffering and backpressure management via the broker

Broadcast and Pub/Sub Capabilities

A single event can be consumed by multiple independent consumers simultaneously (fan-out). This is the publish-subscribe (pub/sub) pattern. The broker delivers a copy of the event to every subscriber.

Use cases: cache invalidation, audit logging, notification services, analytics — all consuming the same OrderPlaced event independently.

Error Handling

Error handling is one of the most challenging aspects of EDA due to asynchronous execution.

Dead-Letter Queues (DLQ)

When a consumer fails to process a message after a configured number of retries, the message is moved to a Dead-Letter Queue. This prevents poison messages (messages that always fail) from blocking the primary queue.

Operations teams monitor DLQs to detect systemic failures, investigate root causes, and replay messages after fixes.

Primary Queue --> [Consumer] --FAIL--> Retry Queue --> [Consumer] --FAIL--> Dead Letter Queue

Retry Queues and Exponential Backoff

Failed messages are typically retried with exponential backoff to avoid overwhelming a struggling downstream service. Retry queues hold messages temporarily between attempts.
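
A minimal sketch of retry with exponential backoff followed by dead-lettering. The function names, the base delay of 0.5 s, the 30 s cap, and the limit of four attempts are assumptions; in a real consumer the sleep argument would be time.sleep or a delayed requeue.

```python
from typing import Callable, List


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds after failed attempt number `attempt` (1-based)."""
    return min(cap, base * (2 ** (attempt - 1)))


def consume(
    message: dict,
    handler: Callable[[dict], None],
    dead_letters: List[dict],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> bool:
    """Try the handler up to max_attempts times, then dead-letter the message."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt < max_attempts:
                sleep(backoff_delay(attempt))
    dead_letters.append(message)  # poison message leaves the primary flow
    return False


def always_fails(message: dict) -> None:
    raise RuntimeError("poison message")


dlq: List[dict] = []
delays: List[float] = []
consume({"id": 1}, always_fails, dlq, sleep=delays.append)
```

Here delays ends up as [0.5, 1.0, 2.0] and the message lands in dlq. Production implementations often add random jitter to the delay so that many failing consumers do not retry in lockstep.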

Idempotency

Because most brokers default to at-least-once delivery, a consumer may receive the same event more than once, for example after a retry or after a redelivery caused by a missing acknowledgment. Consumer logic must therefore be idempotent: processing the same event multiple times must produce the same result as processing it once.

Common idempotency techniques:

Store processed event IDs in a deduplication table
Use natural idempotent operations (e.g., upsert instead of insert)
Version-based conditional updates
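
The first technique, a deduplication table, might look like the sketch below. It assumes SQLite as the consumer's store; the table names, the eventId field, and the deposit example are hypothetical. The key point is that the dedup record and the state change commit in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)")
conn.execute("INSERT INTO balances VALUES ('acct-1', 0)")
conn.commit()


def apply_deposit(event: dict) -> bool:
    """Apply the event once; return False if it was already processed."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO processed_events VALUES (?)", (event["eventId"],)
            )
            conn.execute(
                "UPDATE balances SET cents = cents + ? WHERE account = ?",
                (event["cents"], event["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation: this event ID was seen before.
        return False


event = {"eventId": "e-1", "account": "acct-1", "cents": 500}
first = apply_deposit(event)
second = apply_deposit(event)  # simulated redelivery of the same event
balance = conn.execute(
    "SELECT cents FROM balances WHERE account = 'acct-1'"
).fetchone()[0]
```

The redelivered event is rejected by the primary key, so the balance is credited once even though the handler ran twice.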

At-Least-Once vs Exactly-Once Delivery

Semantics	Guarantee	Risk	Cost
At-most-once	No duplicates	Data loss	Low
At-least-once	No data loss	Duplicate processing	Medium
Exactly-once	No loss, no duplicates	High complexity, performance cost	High

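Exactly-once processing requires transactional coordination between the broker and its clients, typically an idempotent producer plus transactions that commit consumed offsets atomically with the produced output (e.g., Kafka transactions). Side effects outside the broker, such as calls to external systems, still need idempotent handling. Use exactly-once only when business requirements demand it.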

Preventing Data Loss

Data loss is a critical concern in async systems. Key patterns:

Persistent Queues

The broker must persist messages to durable storage (disk) before acknowledging receipt. If the broker crashes before delivery, messages are not lost.

Acknowledgment Patterns

Consumers must explicitly acknowledge a message only after successful processing. Until acknowledged, the broker retains the message and can redeliver it.

Broker --> Consumer (processes event) --> ACK --> Broker removes message
Broker --> Consumer (crashes mid-process) --> No ACK --> Broker redelivers
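
The two flows above can be simulated with a toy queue that tracks in-flight messages. AckingQueue and its methods are hypothetical and only mimic what a real broker does when a consumer disconnects without acknowledging.

```python
from collections import deque


class AckingQueue:
    """Toy queue: a message leaves the broker only once it is acknowledged."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def receive(self):
        body = self._ready.popleft()
        self._next_tag += 1
        self._in_flight[self._next_tag] = body  # retained until ACK
        return self._next_tag, body

    def ack(self, tag):
        del self._in_flight[tag]

    def requeue_unacked(self):
        # Models the broker noticing a dropped consumer with no ACK.
        for tag in sorted(self._in_flight):
            self._ready.append(self._in_flight.pop(tag))

    def size(self):
        return len(self._ready) + len(self._in_flight)


q = AckingQueue()
q.publish("OrderPlaced:o-1")

tag, body = q.receive()   # consumer crashes mid-process: no ACK is sent
q.requeue_unacked()       # broker makes the message available again

tag, body = q.receive()   # redelivery of the same message
q.ack(tag)                # processed successfully, broker removes it
```

Because the message is delivered twice in this run, the consumer side still needs the idempotency techniques described earlier.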

Outbox Pattern

For producers: instead of publishing directly to the broker, write the event to a local outbox table in the same database transaction as the business operation. A separate relay process reads the outbox and publishes to the broker. This ensures atomicity: the event is either committed with the business data or not at all.
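
A minimal outbox sketch, assuming SQLite as the service database and a callback in place of a real broker client. The table layout, place_order, and relay_once are hypothetical names for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)
db.commit()


def place_order(order_id: str, total_cents: int) -> None:
    with db:  # business row and event row commit or roll back together
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"orderId": order_id})),
        )


def relay_once(publish) -> int:
    """Publish unpublished outbox rows in order; return how many were sent."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
place_order("o-1", 4200)
first_pass = relay_once(lambda topic, event: sent.append((topic, event)))
second_pass = relay_once(lambda topic, event: sent.append((topic, event)))
```

If the relay crashed after publishing but before marking the row, the event would be published again on the next pass, so the outbox gives at-least-once publication and consumers still need to be idempotent.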

Request-Reply Over Async

Sometimes a synchronous request-reply is needed in an otherwise async system (e.g., a REST caller needs a response). This is achieved with:

Correlation IDs

Each request message carries a unique correlation ID. The reply message includes the same ID. The original requestor matches replies to outstanding requests using the ID.

Reply-To Queues

The request message includes a reply-to queue or topic where the response should be sent. Each requestor creates a temporary or exclusive queue for receiving its replies.

[Caller] --> Request (correlationId=123, replyTo=caller-reply-queue) --> Broker
[Service] processes and sends --> Response (correlationId=123) --> caller-reply-queue
[Caller] reads from caller-reply-queue, matches correlationId=123

Timeout Handling

Request-reply over async requires timeout logic on the caller side. If no reply arrives within the timeout window, the caller must handle the failure (return an error, retry, use a fallback). Without timeouts, callers block indefinitely.
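
The three pieces (correlation ID, reply-to queue, timeout) can be combined in a short sketch that uses Python's standard queue and threading modules as a stand-in for a broker. The function names, the status values, and the message fields are assumptions.

```python
import queue
import threading
import time
import uuid

request_queue = queue.Queue()


def service_worker():
    """Handle one request and reply on the queue named in the request."""
    request = request_queue.get()
    reply = {"correlationId": request["correlationId"], "status": "APPROVED"}
    request["replyTo"].put(reply)


def call(payload, timeout_s):
    correlation_id = str(uuid.uuid4())
    reply_queue = queue.Queue()  # exclusive reply-to queue for this caller
    request_queue.put(
        {"correlationId": correlation_id, "replyTo": reply_queue, **payload}
    )
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return {"correlationId": correlation_id, "status": "TIMEOUT"}
        try:
            reply = reply_queue.get(timeout=remaining)
        except queue.Empty:
            return {"correlationId": correlation_id, "status": "TIMEOUT"}
        if reply["correlationId"] == correlation_id:
            return reply
        # A reply with a different ID is stale; keep waiting until the deadline.


worker = threading.Thread(target=service_worker, daemon=True)
worker.start()
answered = call({"orderId": "o-1"}, timeout_s=2.0)
worker.join()

unanswered = call({"orderId": "o-2"}, timeout_s=0.05)  # nobody is listening
```

The second call returns a TIMEOUT result instead of blocking, which is where the caller would return an error, retry, or use a fallback.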

Swarm of Gnats Antipattern

An antipattern unique to EDA where events are designed at too fine a granularity, resulting in an excessive number of tiny, chatty events that overwhelm the system.

Symptoms:

Hundreds of events per business transaction, each carrying trivial data
High broker throughput consumed by low-value chatter
Consumer logic so fine-grained it’s fragmented across many tiny handlers
Debugging and tracing become nearly impossible

Solution: Design events at a meaningful business granularity (e.g., OrderFulfilled rather than ItemPickedFromShelf, ItemWrapped, ItemLabeled). Events should represent significant business facts, not implementation steps.

Data Topologies

How EDA handles data persistence varies:

Topology	Description	Trade-offs
Monolithic	All handlers share a single database	Simple consistency, tight coupling, scalability bottleneck
Domain	Each domain (group of handlers) has its own database	Loose coupling, eventual consistency across domains
Dedicated	Each event handler has its own dedicated database	Maximum decoupling, maximum complexity, data duplication

The dedicated topology aligns with microservices principles and is preferred for large-scale EDA systems, accepting eventual consistency as a design constraint.

Cloud Considerations

Provider	Broker Services	Managed Mediator / Orchestration
AWS	SNS, SQS, EventBridge, Kinesis	Step Functions, EventBridge Pipes
Azure	Service Bus, Event Hubs, Event Grid	Logic Apps, Durable Functions
GCP	Pub/Sub, Eventarc	Workflows, Cloud Tasks

Cloud-native EDA leverages managed brokers that handle durability, partitioning, and scaling, and that offer built-in dead-letter support, though dead-lettering usually has to be configured explicitly.

Common Risks

Complex Error Handling: Unlike in synchronous architectures, failures are silent by default because the producer is never told that a consumer failed. Robust monitoring, alerting, and DLQ management are required.

Workflow State Visibility: In broker topology, there is no single place to see the state of a multi-step workflow. Distributed tracing (Jaeger, Zipkin, AWS X-Ray) is essential.

Message Ordering Guarantees: Most brokers guarantee ordering only within a single partition or queue, not across a whole topic or globally. When order matters, related events must share a partition key so that they land in the same partition.

Data Consistency: Eventual consistency is the default. Systems requiring strong consistency are poor fits for EDA.

Testing Complexity: Asynchronous, event-driven flows are difficult to test end-to-end. Race conditions, timing dependencies, and out-of-order delivery must all be tested explicitly.

Event Schema Evolution: Changing event schemas can break consumers. Requires a schema registry (Confluent Schema Registry, AWS Glue Schema Registry), backward/forward compatibility policies, and versioning strategies.

Duplicate Event Processing: Without idempotency, at-least-once delivery causes incorrect state. Every consumer must be designed for idempotency from the start.
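
For the message-ordering risk listed above, a partitioning strategy usually means hashing a business key so that all events for one entity share a partition. The sketch below assumes four partitions and uses SHA-256 as a stand-in hash; real brokers use their own partitioner, and the function name is hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash: the same key always maps to the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


partitions = defaultdict(list)
events = [
    ("o-1", "OrderPlaced"),
    ("o-2", "OrderPlaced"),
    ("o-1", "OrderPaid"),
    ("o-2", "OrderCancelled"),
    ("o-1", "OrderShipped"),
]
for order_id, event_type in events:
    # Keying by order ID keeps each order's events in one partition, in order.
    partitions[partition_for(order_id)].append((order_id, event_type))

o1_sequence = [
    event_type
    for order_id, event_type in partitions[partition_for("o-1")]
    if order_id == "o-1"
]
```

Events for different orders may interleave or sit in different partitions, but the sequence for any single order is preserved.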

Governance

Maintain a central event catalog documenting all event types, schemas, producers, and consumers.
Enforce schema compatibility via a schema registry — reject incompatible schema changes automatically.
Define and enforce naming conventions for events (past tense, domain-prefixed: order.placed, payment.processed).
Establish ownership: each event type has a clear owning team responsible for schema evolution.
Use correlation IDs and distributed tracing as mandatory standards across all services.
Define DLQ monitoring and alerting SLOs: dead-lettered messages must be investigated within an agreed time window.

Team Topology

EDA works well with stream-aligned teams (a Team Topologies concept) and reflects Conway’s Law: each team owns a bounded domain, produces events from that domain, and consumes events from other domains. Teams are loosely coupled through event contracts, mirroring the loose coupling of the architecture itself.

Platform teams provide shared broker infrastructure, schema registry, and observability tooling.

Architectural Characteristics Ratings

Characteristic	Rating	Notes
Agility	★★★★☆	Easy to add new consumers/producers; schema changes require care
Deployability	★★★☆☆	Independent deployment per service; broker version management adds complexity
Testability	★★☆☆☆	Async flows are hard to test; timing and ordering issues; requires contract testing
Performance	★★★★★	Non-blocking async processing; extremely high throughput possible
Scalability	★★★★★	Near-unlimited horizontal scaling of producers and consumers independently
Development Ease	★★☆☆☆	High complexity; async mental model, error handling, idempotency
Simplicity	★☆☆☆☆	One of the most complex architecture styles to design and operate correctly
Cost	★★★☆☆	Broker infrastructure cost moderate; operational savings from async scale-out

When to Use

Systems requiring very high throughput or scalability (millions of events/sec)
Loosely coupled, independently deployable services across multiple teams
Systems requiring high extensibility — new consumers must be addable without modifying producers
Broadcast scenarios where multiple systems react to the same business event
Event sourcing and CQRS patterns
Real-time analytics pipelines, fraud detection, IoT data ingestion
Systems with highly variable load (async buffering absorbs spikes)

When Not to Use

Workflows requiring immediate, synchronous responses (user-facing APIs with strict SLAs)
Simple CRUD applications with no need for event propagation
Small teams where operational complexity of a broker outweighs benefits
Systems requiring strong, immediate consistency across all components
Applications where end-to-end traceability of a workflow is a hard requirement and distributed tracing cannot be adopted

Examples and Use Cases

E-commerce order fulfillment: OrderPlaced triggers inventory reservation, payment processing, and shipping preparation in parallel.
Financial transaction processing: Each transaction event triggers fraud detection, ledger update, and notification services independently.
IoT sensor networks: Thousands of devices stream events; consumers process telemetry, detect anomalies, update dashboards.
Ride-sharing platform: Driver location events broadcast to dispatch, ETA calculation, and map display services simultaneously.
Social media feed: User action events (PostLiked, CommentAdded) fan out to notification, analytics, and recommendation services.

Key Takeaways

Two Topologies: Broker topology uses a message broker with no coordinator for maximum decoupling; Mediator topology uses a central orchestrator for complex, stateful workflows.
Events Are Facts: Events are past-tense immutable notifications; messages are commands. True EDA uses events to avoid hidden coupling.
Idempotency Is Non-Negotiable: At-least-once delivery means consumers will receive duplicates; every handler must be designed idempotent from day one.
DLQs Are Your Safety Net: Dead-letter queues prevent poison messages from blocking processing and provide a recovery path for messages that exhaust their retries; transient failures are handled by the retries themselves.
Fat vs Thin Events: Fat events reduce round trips but increase payload size; choose the payload strategy based on consumer needs and data sensitivity.
Swarm of Gnats: Events too fine-grained create excessive chatter — design events at meaningful business granularity.
Correlation IDs Enable Request-Reply: When a synchronous response is needed in an async system, correlation IDs and reply-to queues bridge the gap, provided the caller enforces a timeout.
Schema Governance Is Critical: Event schemas are contracts; a schema registry and compatibility enforcement prevent consumer breakage during evolution.
Highest Scalability and Performance: EDA achieves the highest ratings for performance and scalability across all architecture styles due to non-blocking async processing.
Lowest Simplicity: The price of extreme scalability is extreme complexity — error handling, ordering, consistency, and testing are all significantly harder than in synchronous architectures.

Chapter 14: Microservices Architecture (complementary style often combined with EDA)
Chapter 17: Choosing the Right Architecture Style
Richards & Ford, “Fundamentals of Software Architecture” (O’Reilly)
Sam Newman, “Building Microservices” — async messaging chapters
Apache Kafka documentation — partitioning and delivery semantics
Confluent Schema Registry documentation

Last Updated: 2026-05-29

Study Notes by Niladri & AI
