Chapter 12: Transactional Sagas
saht sagas distributed-transactions eventual-consistency orchestration choreography
Status: Notes complete
Overview
Chapter 12 is the payoff chapter of Part II (“Putting Things Back Together”). Having established the problems of distributed data ownership in ch09-data-ownership-distributed-transactions and workflow coordination in ch11-managing-distributed-workflows, the authors now present a unified taxonomy for managing distributed transactions in microservices: the saga pattern and its eight distinct variants.
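In distributed systems, a “saga” is a sequence of local transactions, one per participating service. When a step fails, an atomic saga triggers compensating actions that undo the steps already completed. An eventual saga instead retries the step, reconciles it later, or flags it. In a monolith, the same work would run as a single ACID transaction, or as a two-phase commit (2PC), the blocking protocol used to stretch one transaction across several resources. Sagas replace both with a more loosely coupled, but more complex, alternative that respects service autonomy.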
The chapter’s central intellectual contribution is a three-dimensional classification system that generates exactly eight named saga patterns, each with a unique trade-off profile. Every dimension is a binary choice, so 2³ = 8 combinations arise naturally. The authors give each pattern a colorful genre name (Epic Saga, Horror Story, etc.) to make them memorable and distinguishable.
The chapter closes with the Sysops Squad case study applying the decision framework to a real scenario involving ticket assignment — a workflow that crosses multiple service boundaries and cannot tolerate data loss.
The Three Dimensions of Saga Classification
Every saga pattern is defined by three independent binary axes. Understanding the axes is prerequisite to understanding any individual pattern.
Axis 1: Communication Style — Synchronous (s) vs. Asynchronous (a)
Synchronous (s): The caller blocks and waits for a response before continuing. Uses HTTP/REST or gRPC request-response. The saga cannot proceed to the next step until the current step confirms success or failure.
- Simpler error handling — the caller knows immediately whether the step succeeded
- Higher coupling — the caller is tightly bound to the callee’s availability and latency
- Lower throughput — threads/connections are held open during the call
Asynchronous (a): The caller fires a message and continues, receiving results later via events or callbacks. Uses message queues (Kafka, RabbitMQ, SQS) or event buses.
- Decouples caller and callee in time — callee can be temporarily unavailable
- Higher throughput and scalability
- More complex error handling — failures are discovered out-of-band, often much later
Axis 2: Consistency Model — Atomic (a) vs. Eventual (e)
Atomic (a): The saga treats all its steps as a single logical unit. Either all succeed or a compensating rollback process is triggered to undo completed steps. The goal is ACID-like behavior across services.
- Strongest data safety guarantee across distributed services
- Most difficult to implement — requires careful state tracking and compensation logic
- Higher latency — all steps must complete (or compensate) before the saga is “done”
Eventual (e): The saga accepts that intermediate states will exist and that the system will converge to a consistent state over time. No all-or-nothing guarantee. Failed steps may be retried, skipped, or flagged for human intervention.
- Simpler implementation — no rollback machinery needed
- Higher availability — partial completion is acceptable
- Requires tolerant readers and tolerant writers downstream
Axis 3: Coordination Style — Orchestrated (o) vs. Choreographed (c)
Orchestrated (o): A central orchestrator service (or process) knows the full workflow. It sends commands to participants, receives responses, tracks state, and decides what to do next.
- Clear visibility into saga progress — state is in one place
- Easier to add steps, change order, or add compensation logic
- Introduces a central coordination point — potential bottleneck or single point of failure
- Couples participants to the orchestrator
Choreographed (c): No central coordinator. Services react to events published by other services. Each service knows only its own role and what events to emit on completion.
- Maximum decoupling — services don’t know about each other
- Higher resilience — no central point of failure
- Workflow logic is distributed across services — very hard to observe end-to-end state
- Emergent behavior: workflow is implicit, not explicit
The Naming Scheme
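Each pattern is encoded as a three-letter code: [communication][consistency][coordination]. The letter a appears in two positions, so its meaning depends on where it sits: Asynchronous in the first slot, Atomic in the second.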
| Code | Communication | Consistency | Coordination |
|---|---|---|---|
| s | Synchronous | — | — |
| a | Asynchronous | Atomic | — |
| e | — | Eventual | — |
| o | — | — | Orchestrated |
| c | — | — | Choreographed |
Example: sao = Synchronous + Atomic + Orchestrated = Epic Saga
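Because each letter's meaning depends on its position, the encoding fits naturally in a lookup. The sketch below decodes a code into its three axes and genre name, and the eight entries cover every one of the 2³ combinations. The letters and pattern names come from the chapter. The dictionary and function names are made up for illustration.

```python
# Decode a three-letter saga code into its axes and genre name.
# Letter meanings and pattern names follow the chapter; the dictionary
# and function names are illustrative only.
COMMUNICATION = {"s": "Synchronous", "a": "Asynchronous"}
CONSISTENCY = {"a": "Atomic", "e": "Eventual"}
COORDINATION = {"o": "Orchestrated", "c": "Choreographed"}

PATTERN_NAMES = {
    "sao": "Epic Saga",
    "sac": "Phone Tag Saga",
    "seo": "Fairy Tale Saga",
    "sec": "Time Travel Saga",
    "aao": "Fantasy Fiction Saga",
    "aac": "Horror Story",
    "aeo": "Parallel Saga",
    "aec": "Anthology Saga",
}


def decode(code: str) -> tuple[str, str, str, str]:
    """Return (communication, consistency, coordination, name).

    Position matters: "a" in the first slot means Asynchronous,
    in the second slot it means Atomic.
    """
    if len(code) != 3:
        raise ValueError(f"saga code must have 3 letters, got {code!r}")
    comm, cons, coord = code
    try:
        return (COMMUNICATION[comm], CONSISTENCY[cons],
                COORDINATION[coord], PATTERN_NAMES[code])
    except KeyError:
        raise ValueError(f"unknown saga code {code!r}") from None


print(decode("sao"))  # ('Synchronous', 'Atomic', 'Orchestrated', 'Epic Saga')
print(decode("aac"))  # ('Asynchronous', 'Atomic', 'Choreographed', 'Horror Story')
```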
The 8 Saga Patterns
1. Epic Saga (sao): Synchronous + Atomic + Orchestrated
Code: sao
Character: The classic, tightest-coupled saga. The orchestrator drives every step synchronously and treats the whole workflow as one atomic unit.
How It Works
An orchestrator service calls each participant service synchronously, one at a time (or in defined parallel groups). It waits for each response before proceeding. If any step fails, the orchestrator immediately issues compensating calls to undo all prior completed steps — also synchronously.
Client
  |
  |---[1. Create Order]----------> Orchestrator
                                     |
                                     |---[2. Reserve Inventory]--> Inventory Svc
                                     |<---[200 OK]----------------
                                     |
                                     |---[3. Process Payment]----> Payment Svc
                                     |<---[200 OK]----------------
                                     |
                                     |---[4. Schedule Shipment]--> Shipping Svc
                                     |<---[200 OK]----------------
                                     |
  |<--[Saga Complete]----------------|

On failure at Step 3 (Payment):
Orchestrator
  |---[COMPENSATE: Cancel Reservation]--> Inventory Svc
  |<---[200 OK]-------------------------
  |
  |---[Return failure to Client]--------> Client
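Here is a minimal in-process sketch of the Epic Saga control flow. It assumes each participant call is a plain Python function that raises on failure, standing in for a synchronous HTTP/gRPC call. The `Step` class and the step names are hypothetical. The point is the shape of the flow: call, wait, and on failure compensate the completed steps in reverse order.

```python
# Minimal sketch of an Epic Saga (sao) orchestrator. Each Step pairs an
# action with its compensation; in a real system both would be synchronous
# remote calls. Step names are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # raises an exception on failure
    compensate: Callable[[], None]  # semantically undoes `action`


def run_epic_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps one at a time; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()                # block until the participant answers
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            for done in reversed(completed):
                done.compensate()        # compensation is synchronous too
                log.append(f"compensated {done.name}")
            return False                 # report failure to the client
        completed.append(step)
        log.append(f"{step.name} ok")
    return True


def fail_payment() -> None:
    raise RuntimeError("card declined")


log: list[str] = []
ok = run_epic_saga(
    [
        Step("reserve_inventory", lambda: None, lambda: None),
        Step("process_payment", fail_payment, lambda: None),
        Step("schedule_shipment", lambda: None, lambda: None),
    ],
    log,
)
print(ok)   # False
print(log)  # ['reserve_inventory ok', 'process_payment failed: card declined', 'compensated reserve_inventory']
```

A production orchestrator would also persist its progress and retry failed compensations. Both are omitted here.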
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Very High — orchestrator knows all participants; all services must be available |
| Complexity | Medium — workflow is explicit and visible, but compensation adds code |
| Data Consistency | Strong (atomic) — all-or-nothing semantics |
| Fault Tolerance | Low — a single unavailable service blocks the entire saga |
| Scalability | Low — synchronous blocking limits throughput |
| State Management | Easy — orchestrator holds all state |
| Latency | High — sum of all step latencies |
When to Use
- Financial transactions where partial completion is unacceptable (e.g., transfer funds between accounts)
- Low-volume, high-criticality workflows where latency can be traded for correctness
- Teams that need maximum observability into workflow state
- When all participating services are under the same reliability SLA
When to Avoid
- High-throughput systems
- Workflows spanning services with very different latency profiles
- When participant services are external third-party APIs with unpredictable availability
2. Phone Tag Saga (sac): Synchronous + Atomic + Choreographed
Code: sac
Character: Services call each other in a chain, each one bearing responsibility for calling the next. Atomic consistency is maintained by propagating compensating calls back up the chain on failure.
How It Works
There is no orchestrator. Service A calls Service B synchronously, which calls Service C synchronously, which calls Service D. Each service waits for the response of the next before responding to its own caller. On failure, compensation propagates back up the call chain.
Client --> Service A --> Service B --> Service C --> Service D
       <--           <--           <--           <--
      (resp)        (resp)        (resp)        (resp)
On failure at D:
D fails (its local change never commits) and returns failure to C
C compensates its own completed step, then returns failure to B
B compensates its own step, then returns failure to A
A compensates its own step, then returns failure to Client
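The propagation described above can be sketched with in-process objects standing in for services. `ChainService` is hypothetical. Each instance commits its local change and then calls the next service. If that call fails, it undoes its own change before re-raising to its caller.

```python
# Sketch of a Phone Tag Saga (sac) call chain: each service commits locally,
# then synchronously calls the next one; if the downstream call fails, it
# undoes its own work and re-raises. ChainService is hypothetical.
from typing import Optional


class ChainService:
    def __init__(self, name: str, next_service: Optional["ChainService"] = None,
                 fails: bool = False) -> None:
        self.name = name
        self.next_service = next_service
        self.fails = fails
        self.committed = False

    def handle(self, trace: list[str]) -> None:
        if self.fails:
            trace.append(f"{self.name} failed")
            raise RuntimeError(f"{self.name} failed")
        self.committed = True                  # local transaction commits
        trace.append(f"{self.name} committed")
        if self.next_service is None:
            return
        try:
            self.next_service.handle(trace)    # blocks on the next service
        except RuntimeError:
            self.committed = False             # compensate own step
            trace.append(f"{self.name} compensated")
            raise                              # propagate failure upward


d = ChainService("D", fails=True)
c = ChainService("C", d)
b = ChainService("B", c)
a = ChainService("A", b)

trace: list[str] = []
try:
    a.handle(trace)
except RuntimeError as exc:
    trace.append(f"client sees: {exc}")
print(trace)
```

Every service carries both its business logic and its slice of the compensation logic, which is why the notes rate this pattern's complexity as Very High.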
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Very High — each service is coupled to the next in the chain |
| Complexity | Very High — each service must know its compensation logic AND manage downstream calls |
| Data Consistency | Strong (atomic) — but requires every service to implement compensation correctly |
| Fault Tolerance | Very Low — any service failure breaks the entire chain |
| Scalability | Very Low — linear chain; cannot parallelize |
| State Management | Very Hard — state is distributed across the chain |
| Latency | Very High — sum of all round-trips through the chain |
When to Use
- Almost never preferred — this is primarily a pattern to recognize and avoid
- Occasionally valid for very short chains (2 services) where a dedicated orchestrator would be overkill
Key Risk: Tight coupling propagates through the chain — a change to any intermediate service’s interface ripples to all services that call it. Debugging is extremely difficult because the workflow logic is buried inside each service.
3. Fairy Tale Saga (seo): Synchronous + Eventual + Orchestrated
Code: seo
Character: An orchestrator coordinates the workflow synchronously but accepts eventual consistency — steps complete independently and inconsistent intermediate states are tolerable.
How It Works
The orchestrator calls each participant synchronously but does not attempt to roll back prior steps on failure. Instead, failed steps are retried, flagged for later reconciliation, or simply accepted as a divergent state that will be resolved eventually. The orchestrator still has full visibility into the workflow.
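Here is a sketch of the Fairy Tale control flow under simple assumptions. Steps are Python callables. A failed step is retried up to a hypothetical `MAX_ATTEMPTS`, and a step that still fails is added to a reconciliation list. Nothing is rolled back.

```python
# Sketch of a Fairy Tale Saga (seo) orchestrator: synchronous calls,
# bounded retries, and no compensation. MAX_ATTEMPTS and the
# reconciliation list are illustrative assumptions.
from typing import Callable

MAX_ATTEMPTS = 3


def run_fairy_tale_saga(steps: dict[str, Callable[[], None]],
                        reconcile_later: list[str]) -> list[str]:
    done: list[str] = []
    for name, call in steps.items():
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                call()
                done.append(name)
                break
            except Exception:
                if attempt == MAX_ATTEMPTS:
                    reconcile_later.append(name)   # flag it; earlier steps stay
    return done


attempts = {"email": 0}


def flaky_email() -> None:
    attempts["email"] += 1
    if attempts["email"] < 3:
        raise ConnectionError("SMTP timeout")


def broken_loyalty() -> None:
    raise ConnectionError("loyalty service down")


pending: list[str] = []
done = run_fairy_tale_saga(
    {"create_order": lambda: None, "send_email": flaky_email,
     "award_points": broken_loyalty},
    pending,
)
print(done)     # ['create_order', 'send_email']
print(pending)  # ['award_points']
```

Real retries would normally wait with backoff between attempts. That is left out to keep the example short.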
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | High — orchestrator couples to all participants |
| Complexity | Medium — no compensation logic required |
| Data Consistency | Eventual — intermediate inconsistency is accepted |
| Fault Tolerance | Medium — individual step failures don’t cascade to full rollback |
| Scalability | Low-Medium — still synchronous |
| State Management | Easy — centralized in orchestrator |
| Latency | Medium-High — synchronous calls but no compensation round-trips |
When to Use
- Order processing systems where some steps (e.g., sending a confirmation email) can be retried later
- Workflows where partial completion is acceptable and reconciliation processes exist downstream
- When you need the observability of orchestration but can tolerate some data lag
4. Time Travel Saga (sec): Synchronous + Eventual + Choreographed
Code: sec
Character: Services call each other in a chain synchronously, but no rollback is attempted. The system will eventually reach consistency through retries and reconciliation.
How It Works
Like Phone Tag but without the atomicity requirement. Service A calls Service B, which calls Service C. On failure, the calling service logs the failure and relies on a retry or reconciliation mechanism rather than compensating prior steps.
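Here is a sketch of that chain, with in-process functions standing in for services. The `make_service` factory and the failure log are hypothetical. Compared with the Phone Tag chain, each local commit is kept, and a downstream failure is only recorded so that a retry or reconciliation job can pick it up later.

```python
# Sketch of a Time Travel Saga (sec) chain: each service commits locally and
# calls the next one synchronously; downstream failures are logged for
# reconciliation instead of compensated. All names are hypothetical.
from typing import Callable, Optional

State = dict[str, str]
FailureLog = list[tuple[str, str]]
Handler = Callable[[State, FailureLog], State]


def make_service(name: str, next_call: Optional[Handler] = None,
                 fails: bool = False) -> Handler:
    def handle(state: State, failures: FailureLog) -> State:
        if fails:
            raise RuntimeError(f"{name} unavailable")
        state[name] = "done"                   # local commit is kept
        if next_call is not None:
            try:
                next_call(state, failures)
            except RuntimeError as exc:
                # No rollback: record it for a later retry/reconciliation job.
                failures.append((name, str(exc)))
        return state
    return handle


service_d = make_service("D", fails=True)
service_a = make_service("A", make_service("B", make_service("C", service_d)))

failures: FailureLog = []
state = service_a({}, failures)
print(state)     # {'A': 'done', 'B': 'done', 'C': 'done'}
print(failures)  # [('C', 'D unavailable')]
```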
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | High — chain coupling still exists |
| Complexity | Medium — no compensation, but chain management is still hard |
| Data Consistency | Eventual — inconsistent states will exist temporarily |
| Fault Tolerance | Medium — no cascading compensation, but chain can still stall |
| Scalability | Low — synchronous chain |
| State Management | Hard — state is distributed across services with no orchestrator |
| Latency | High — full chain must complete |
When to Use
- When workflows are naturally sequential and must be synchronous (e.g., integrating legacy systems that don’t support async messaging)
- When eventual consistency is acceptable but an orchestrator is too heavyweight for the scenario
5. Fantasy Fiction Saga (aao): Asynchronous + Atomic + Orchestrated
Code: aao
Character: The orchestrator sends asynchronous messages to participants but still requires all-or-nothing atomicity. This is a challenging combination: async communication makes it hard to know when all steps have completed or failed, yet the system must maintain atomic guarantees.
How It Works
The orchestrator publishes commands to participant services via a message queue. Each participant processes its command and publishes a success/failure event. The orchestrator listens for these events and tracks state in a saga state machine. If any step fails, the orchestrator publishes compensating commands to already-completed steps.
The Key Challenge: Because communication is async, the orchestrator must wait for events that may arrive out of order, be delayed, or not arrive at all (due to network/broker issues). The saga state machine must handle timeouts and idempotency.
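Here is a sketch of the state an aao orchestrator might keep. It assumes result events carry an event ID, a step name, and an outcome string; these field names are assumptions, and publishing to the broker is not shown. The sketch handles the problems named above:

- Duplicate events are ignored by ID.
- Out-of-order arrival is harmless because results are keyed by step.
- A missed deadline starts compensation.
- A step that succeeds after compensation has begun gets its own compensating command.

```python
# Sketch of saga state for an aao orchestrator whose result events may
# arrive late, duplicated, or out of order. Field names, outcome strings,
# and the deadline are assumptions; broker publishing is not shown.
from dataclasses import dataclass, field


@dataclass
class AsyncSaga:
    saga_id: str
    steps: list[str]
    deadline: float                                   # absolute time
    results: dict[str, str] = field(default_factory=dict)
    seen_event_ids: set[str] = field(default_factory=set)
    status: str = "IN_PROGRESS"

    def on_event(self, event_id: str, step: str, outcome: str) -> list[str]:
        """Apply one result event; return compensating commands to publish."""
        if event_id in self.seen_event_ids:
            return []                                 # duplicate delivery
        self.seen_event_ids.add(event_id)
        self.results[step] = outcome                  # arrival order is irrelevant
        if self.status == "COMPENSATING":
            # A step that succeeds after the saga gave up must be undone too.
            return [f"compensate:{step}"] if outcome == "SUCCESS" else []
        if self.status != "IN_PROGRESS":
            return []
        if outcome == "FAILED":
            return self._start_compensation()
        if all(self.results.get(s) == "SUCCESS" for s in self.steps):
            self.status = "COMPLETED"
        return []

    def on_tick(self, now: float) -> list[str]:
        """A missed deadline is treated as a failure."""
        if self.status == "IN_PROGRESS" and now > self.deadline:
            return self._start_compensation()
        return []

    def _start_compensation(self) -> list[str]:
        self.status = "COMPENSATING"
        return [f"compensate:{s}" for s in reversed(self.steps)
                if self.results.get(s) == "SUCCESS"]


saga = AsyncSaga("saga-42", ["inventory", "payment", "shipping"], deadline=100.0)
print(saga.on_event("e1", "payment", "SUCCESS"))   # []
print(saga.on_event("e1", "payment", "SUCCESS"))   # [] (duplicate ignored)
print(saga.on_event("e2", "inventory", "FAILED"))  # ['compensate:payment']
print(saga.on_event("e3", "shipping", "SUCCESS"))  # ['compensate:shipping']
```

The next piece would be tracking compensation acknowledgements and moving the saga to FAILED once they have all arrived. It is omitted here.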
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Medium — temporal decoupling, but orchestrator still knows all participants |
| Complexity | Very High — async + atomic is the hardest combination; state machine is complex |
| Data Consistency | Strong (atomic) — but requires sophisticated state tracking |
| Fault Tolerance | Medium-High — services can be temporarily unavailable (messages queue up) |
| Scalability | High — async messaging supports high throughput |
| State Management | Very Hard — async events can arrive out of order; timeout handling required |
| Latency | Low-Medium — async doesn’t block caller, but total saga time may still be long |
When to Use
- High-throughput financial systems needing both scale and strong consistency
- When participant services have variable latency but atomicity cannot be compromised
- Requires significant investment in saga state machine infrastructure
6. Horror Story (aac): Asynchronous + Atomic + Choreographed
Code: aac
Character: The most problematic pattern. Services communicate asynchronously, require atomic consistency, and there is no orchestrator — each service must independently manage its piece of the compensation logic based on events it observes.
How It Works
Services publish events when they complete steps. Other services listen for these events and proceed. If a service fails, it publishes a failure event. Other services that already completed their steps must listen for this failure event and independently execute their own compensation logic. No single service knows the overall workflow state.
Why It’s Called “Horror Story”
- Compensating transactions must be implemented in every participant service, with each service responsible for knowing what it did and how to undo it
- Since there’s no orchestrator, there’s no single place to observe overall saga state
- Event ordering issues: services may receive events out of order and must handle this
- Idempotency is critical: compensating events may be delivered multiple times
- Debugging is a nightmare — workflow state is scattered across all services and message queues
- Adding a new step to the workflow requires modifying multiple services
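To make those problems concrete, here is a sketch of the bookkeeping that one participant (inventory) needs when there is no orchestrator. The class, method, and event names are hypothetical. The service must remember what it did for each saga and undo it on its own. Its failure handler must also tolerate redelivery and a failure event that arrives before the triggering event.

```python
# Sketch of one Horror Story (aac) participant. With no orchestrator, the
# service tracks its own work per saga and compensates by itself when it
# observes a failure event. Event and method names are hypothetical.
from typing import Optional


class InventoryParticipant:
    def __init__(self) -> None:
        self.reserved: dict[str, int] = {}   # saga_id -> units reserved
        self.compensated: set[str] = set()   # sagas already undone here

    def on_order_placed(self, saga_id: str, units: int) -> Optional[str]:
        if saga_id in self.compensated:
            return None                      # failure arrived first: do nothing
        self.reserved[saga_id] = units
        return "StockReserved"               # event to publish

    def on_saga_failed(self, saga_id: str) -> None:
        if saga_id in self.compensated:
            return                           # redelivered failure event
        self.compensated.add(saga_id)
        self.reserved.pop(saga_id, None)     # release reservation, if any


inv = InventoryParticipant()
print(inv.on_order_placed("s1", 2))  # StockReserved
inv.on_saga_failed("s1")
inv.on_saga_failed("s1")             # duplicate: no effect
inv.on_saga_failed("s2")             # failure seen before OrderPlaced
print(inv.on_order_placed("s2", 5))  # None
print(inv.reserved)                  # {}
```

Every participant in the saga needs its own version of this logic, which is the main reason the notes recommend adding an orchestrator instead.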
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Low (temporal) — but behavioral coupling is very high |
| Complexity | Extremely High — highest of all 8 patterns |
| Data Consistency | Strong in theory — nearly impossible to guarantee in practice |
| Fault Tolerance | Very Low — any service failing to consume a compensating event breaks atomicity |
| Scalability | High — async messaging |
| State Management | Nightmarish — no single source of truth |
| Latency | Low-Medium — non-blocking |
When to Use
- The authors strongly advise avoiding this pattern unless there is no alternative
- If you find yourself here, seriously consider adding an orchestrator (moving to aao)
7. Parallel Saga (aeo): Asynchronous + Eventual + Orchestrated
Code: aeo
Character: The orchestrator fans out async commands to multiple participants simultaneously, collects eventual results, and accepts that consistency will be reached over time. Excellent for parallelizable workflows.
How It Works
The orchestrator publishes commands to multiple services concurrently via async messaging. Each service processes its command independently and publishes a result event. The orchestrator listens for all result events (potentially with a scatter-gather pattern) and aggregates them. Because consistency is eventual, there’s no compensation logic — partial results are acceptable.
                        +--> [Inventory Svc] --> (event) --+
                        |                                  |
Orchestrator --[msgs]-->+--> [Payment Svc]   --> (event) --+--> Orchestrator
                        |                                  |    (aggregates)
                        +--> [Shipping Svc]  --> (event) --+
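Here is a scatter-gather sketch that uses `asyncio.gather` as a stand-in for async messaging. This is an assumption: a real Parallel Saga would publish to a broker and consume result events. Participant names and delays are hypothetical. With `return_exceptions=True`, a failed participant's exception is returned in the results list instead of being raised out of `gather`. That lets the orchestrator aggregate successes and failures together, which matches the eventual, no-compensation semantics.

```python
# Sketch of a Parallel Saga (aeo) scatter-gather, with asyncio standing in
# for a message broker. Service names and delays are hypothetical.
import asyncio


async def participant(name: str, delay: float, fail: bool = False) -> str:
    await asyncio.sleep(delay)                  # simulated processing time
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} done"


async def parallel_saga() -> tuple[list[str], list[str]]:
    results = await asyncio.gather(
        participant("inventory", 0.03),
        participant("payment", 0.01),
        participant("shipping", 0.02, fail=True),
        return_exceptions=True,                 # failures come back as values
    )
    done = [r for r in results if isinstance(r, str)]
    failed = [str(r) for r in results if isinstance(r, Exception)]
    return done, failed


done, failed = asyncio.run(parallel_saga())
print(done)    # ['inventory done', 'payment done']
print(failed)  # ['shipping failed']
```

Because the participants run concurrently, total elapsed time is roughly the slowest participant's delay rather than the sum of all delays.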
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Medium — orchestrator knows participants, but temporal decoupling exists |
| Complexity | Medium — no compensation needed, but scatter-gather requires care |
| Data Consistency | Eventual — intermediate states are normal |
| Fault Tolerance | High — temporary service unavailability is handled by message queuing |
| Scalability | Very High — parallel async processing |
| State Management | Medium — orchestrator aggregates results but doesn’t need to track compensation |
| Latency | Low — parallel execution reduces total elapsed time |
When to Use
- Workflows with independent parallel steps (e.g., sending notifications to multiple channels, processing multiple file types simultaneously)
- High-throughput batch processing
- When eventual consistency is acceptable and speed matters more than strict ordering
- Recommendation engines, analytics pipelines, notification fanouts
8. Anthology Saga (aec): Asynchronous + Eventual + Choreographed
Code: aec
Character: Fully event-driven, maximum decoupling. No orchestrator, async messaging throughout, and no atomicity requirement. Services react to events and emit their own events. The most decoupled pattern.
How It Works
Services subscribe to events published on a shared event bus. When a triggering event arrives, a service processes it and publishes one or more result events. Other services listen for those events and react in turn. No service knows about any other service’s existence directly. The workflow is emergent from the event flow.
[Order Placed Event]
   |
   +--> Inventory Svc (reserves stock) --> [Stock Reserved Event] --+
   |                                                                |
   +--> Fraud Svc (checks fraud) --------> [Fraud OK Event] --------+
                                                                    |
                                                                    v
                                                     Payment Svc (listens for both)
                                                     --> [Payment Processed Event]
                                                                    |
                                                                    v
                                                              Shipping Svc
                                                     --> [Shipment Scheduled Event]
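The emergent flow in the diagram can be sketched with a tiny in-memory event bus. The bus is a synchronous stand-in for a real broker, and every name here is hypothetical. No service calls another. Each one subscribes to events and publishes its own, and Payment waits until it has seen both Stock Reserved and Fraud OK for the same order ID.

```python
# Sketch of an Anthology Saga (aec) on a tiny in-memory, synchronous event
# bus (a stand-in for a real broker). Services only subscribe and publish;
# Payment joins two events by order ID. All names are hypothetical.
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]
subscribers: dict[str, list[Handler]] = defaultdict(list)
published: list[tuple[str, str]] = []


def subscribe(event: str, handler: Handler) -> None:
    subscribers[event].append(handler)


def publish(event: str, order_id: str) -> None:
    published.append((event, order_id))
    for handler in subscribers[event]:
        handler(order_id)


# Payment's local state: prerequisite events seen per order, orders paid.
seen: dict[str, set[str]] = defaultdict(set)
paid: set[str] = set()


def payment_listener(event: str) -> Handler:
    def handle(order_id: str) -> None:
        seen[order_id].add(event)
        both = {"StockReserved", "FraudOK"} <= seen[order_id]
        if both and order_id not in paid:      # guard against redelivery
            paid.add(order_id)
            publish("PaymentProcessed", order_id)
    return handle


subscribe("OrderPlaced", lambda oid: publish("StockReserved", oid))   # Inventory
subscribe("OrderPlaced", lambda oid: publish("FraudOK", oid))         # Fraud
subscribe("StockReserved", payment_listener("StockReserved"))         # Payment
subscribe("FraudOK", payment_listener("FraudOK"))                     # Payment
subscribe("PaymentProcessed", lambda oid: publish("ShipmentScheduled", oid))  # Shipping

publish("OrderPlaced", "order-1")
print([event for event, _ in published])
# ['OrderPlaced', 'StockReserved', 'FraudOK', 'PaymentProcessed', 'ShipmentScheduled']
```

The `paid` set makes Payment idempotent, so a redelivered Stock Reserved event does not process the payment twice.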
Trade-offs
| Dimension | Assessment |
|---|---|
| Coupling | Very Low — services know only about events, not each other |
| Complexity | High — emergent workflows are hard to reason about end-to-end |
| Data Consistency | Eventual — the defining characteristic |
| Fault Tolerance | Very High — no single point of failure; messages persist in broker |
| Scalability | Very High — fully async, no coordination bottleneck |
| State Management | Hard — no single place tracks overall workflow state |
| Latency | Low — non-blocking, can be highly parallel |
When to Use
- Event-driven microservices architectures at scale
- Workflows where services truly should be independent (different teams, different release cycles)
- High-volume event processing (e-commerce order flows, IoT pipelines, analytics)
- When eventual consistency is a first-class design requirement, not a compromise
Key Observability Challenge: Because no service knows the full workflow, end-to-end tracing requires distributed tracing infrastructure (Jaeger, Zipkin, OpenTelemetry) and correlation IDs on all events. Without this, debugging production issues is nearly impossible.
Saga Pattern Comparison Matrix
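This table summarizes all eight patterns across six key dimensions. Coupling, complexity, fault tolerance, and scalability use this rating scale: VL = Very Low, L = Low, M = Medium, H = High, VH = Very High. Data consistency and state-management difficulty are described in words.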
| Pattern | Code | Coupling | Complexity | Data Consistency | Fault Tolerance | Scalability | State Mgmt Difficulty |
|---|---|---|---|---|---|---|---|
| Epic Saga | sao | VH | M | Strong/Atomic | L | L | Easy |
| Phone Tag Saga | sac | VH | VH | Strong/Atomic | VL | VL | Very Hard |
| Fairy Tale Saga | seo | H | M | Eventual | M | L-M | Easy |
| Time Travel Saga | sec | H | M | Eventual | M | L | Hard |
| Fantasy Fiction Saga | aao | M | VH | Strong/Atomic | M-H | H | Very Hard |
| Horror Story | aac | L* | Extreme | Weak in practice | VL | H | Nightmarish |
| Parallel Saga | aeo | M | M | Eventual | H | VH | Medium |
| Anthology Saga | aec | VL | H | Eventual | VH | VH | Hard |
*aac has low temporal coupling but very high behavioral coupling — services must coordinate their compensation logic implicitly.
Practical Recommendations by Priority
Recommended for most teams:
- aeo (Parallel Saga) — best balance of scalability, fault tolerance, and reasonable complexity when eventual consistency is acceptable
- sao (Epic Saga) — best choice when strong consistency is required and throughput is modest
- aec (Anthology Saga) — best for large-scale event-driven systems with mature DevOps
Use with care:
- seo (Fairy Tale Saga) — good middle ground when sync is unavoidable but rollback isn’t needed
- aao (Fantasy Fiction Saga) — justified only when both scale and atomicity are non-negotiable
Avoid if possible:
- sac (Phone Tag Saga) — almost always better to add an orchestrator
- sec (Time Travel Saga) — awkward middle ground with few advantages
- aac (Horror Story) — the authors name it a horror story for good reason
State Management and Compensating Transactions
Why State Management Is Hard in Sagas
In a monolithic ACID transaction, the database manages all state transitions atomically. In a saga, each step executes in a separate service with a separate database. There is no global transaction manager. The saga itself must track which steps have completed, which failed, and what compensation is needed.
Saga State Machines
A saga state machine is a persistent record of a saga’s progress. It typically lives in the orchestrator service’s database (for orchestrated sagas) or as an event-sourced log (for choreographed sagas).
State machine contents:
- Saga ID (unique identifier, used as correlation ID)
- Current step
- Status of each completed step (SUCCESS, FAILED, COMPENSATED)
- Input data for each step (needed for compensation)
- Timestamps (for timeout detection)
- Overall saga status (IN_PROGRESS, COMPLETED, COMPENSATING, FAILED)
Example state transitions:
CREATED
--> STEP_1_PENDING
--> STEP_1_COMPLETE
--> STEP_2_PENDING
--> STEP_2_FAILED
--> COMPENSATING (triggers Step 1 compensation)
--> STEP_1_COMPENSATED
--> FAILED
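The contents and transitions above can be sketched as a record plus an explicit transition table. Some states are assumptions added to make the table complete: the happy-path states (STEP_2_COMPLETE, COMPLETED) and the STEP_1_FAILED branch. A real orchestrator would write each transition to its database before acting on it.

```python
# Sketch of a saga record driven by an explicit transition table. States
# follow the example above; STEP_1_FAILED, STEP_2_COMPLETE, and COMPLETED
# are assumptions. Persistence is not shown.
import time
from dataclasses import dataclass, field
from typing import Any

ALLOWED: dict[str, set[str]] = {
    "CREATED": {"STEP_1_PENDING"},
    "STEP_1_PENDING": {"STEP_1_COMPLETE", "STEP_1_FAILED"},
    "STEP_1_FAILED": {"FAILED"},                 # nothing to compensate yet
    "STEP_1_COMPLETE": {"STEP_2_PENDING"},
    "STEP_2_PENDING": {"STEP_2_COMPLETE", "STEP_2_FAILED"},
    "STEP_2_COMPLETE": {"COMPLETED"},
    "STEP_2_FAILED": {"COMPENSATING"},
    "COMPENSATING": {"STEP_1_COMPENSATED"},
    "STEP_1_COMPENSATED": {"FAILED"},
}


@dataclass
class SagaRecord:
    saga_id: str                                 # also the correlation ID
    state: str = "CREATED"
    step_inputs: dict[str, Any] = field(default_factory=dict)  # for compensation
    history: list[tuple[str, float]] = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.saga_id}: illegal {self.state} -> {new_state}")
        self.state = new_state
        self.history.append((new_state, time.time()))  # timestamps for timeouts


record = SagaRecord("saga-7")
record.step_inputs["STEP_1"] = {"sku": "A-100", "qty": 2}
for s in ["STEP_1_PENDING", "STEP_1_COMPLETE", "STEP_2_PENDING",
          "STEP_2_FAILED", "COMPENSATING", "STEP_1_COMPENSATED", "FAILED"]:
    record.transition(s)
print(record.state)  # FAILED
```

Terminal states have no outgoing entries, so any further transition from FAILED or COMPLETED raises `ValueError`.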
Compensating Transactions
A compensating transaction is a business-level operation that semantically reverses a completed saga step. It is NOT a database rollback — the original transaction has already committed. A compensating transaction creates a new transaction that undoes the business effect.
Examples:
- Payment charged → issue a refund (not a rollback; a new credit transaction)
- Inventory reserved → release the reservation
- Order created → cancel the order (marking it CANCELLED, not deleting it)
- Email sent → cannot be unsent (some operations are non-compensatable — see below)
Non-compensatable operations: Some saga steps have no meaningful compensation (e.g., sending an email, printing a label, triggering an external webhook). For these, the saga must either:
- Move the non-compensatable step to the end of the saga (pivot transaction pattern)
- Accept that it cannot be rolled back and design downstream processes to handle this
- Use a two-phase approach: first “reserve” the action, then “confirm” it (e.g., draft email then send)
The Pivot Transaction
In any saga with non-compensatable steps, the pivot transaction is the last compensatable step. Steps before the pivot can be rolled back; steps after cannot. The saga design should place non-compensatable operations after the pivot and ensure they only execute once the pivot has committed successfully.
[Compensatable] [Compensatable] [PIVOT] [Non-compensatable] [Non-compensatable]
    Step 1          Step 2      Step 3        Step 4              Step 5
<---------- can compensate ----------->|<--------- cannot compensate --------->
                                       |
                           Last safe rollback point
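Given each step's compensatability, the pivot can be found mechanically. This hypothetical helper returns the index of the last compensatable step. It also rejects any ordering in which a compensatable step runs after a non-compensatable one. The example uses the Sysops Squad steps discussed later in these notes.

```python
# Sketch: locate the pivot (the last compensatable step) and reject orderings
# where a compensatable step follows a non-compensatable one. Steps are
# hypothetical (name, compensatable) pairs in execution order.
def find_pivot(steps: list[tuple[str, bool]]) -> int:
    """Return the index of the pivot step, or -1 if nothing is compensatable."""
    pivot = -1
    seen_non_compensatable = False
    for i, (name, compensatable) in enumerate(steps):
        if not compensatable:
            seen_non_compensatable = True
        elif seen_non_compensatable:
            raise ValueError(f"{name!r} is compensatable but runs after a "
                             "non-compensatable step; move it before the pivot")
        else:
            pivot = i
    return pivot


revised = [("create_ticket", True), ("assign_technician", True),
           ("update_billing", True), ("notify_technician", False),
           ("notify_customer", False)]
print(revised[find_pivot(revised)][0])  # update_billing
```

Run on the Sysops Squad's original order, where billing comes after the notifications, `find_pivot` raises `ValueError`. That is the ordering problem the revised sequence fixes.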
Idempotency: Why It Is Mandatory
Because distributed systems can fail in partial ways (message delivered but response lost, service crashed after writing but before confirming), saga steps and compensating transactions must be idempotent: executing them multiple times must produce the same result as executing them once.
Techniques for idempotency:
- Idempotency keys: Each saga step message includes a unique ID. The receiving service records processed IDs and skips re-processing if it sees a duplicate.
- Conditional updates: Update only if the current state matches the expected state (optimistic locking).
- Event deduplication: The message broker or service layer tracks already-processed message IDs.
Without idempotency, retries (which are necessary for reliability) can cause double-charges, duplicate reservations, or double-compensation.
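Here is a sketch of the idempotency-key technique for a payment step. It assumes the caller derives the key from the saga ID and step name, for example `saga-9:payment`; that format is an assumption. The dict stands in for a durable table. In a real service the key and the side effect should commit in one local transaction, otherwise a crash between them reopens the double-charge window.

```python
# Sketch of an idempotency-key guard on a saga step. The in-memory dict
# stands in for a durable table; names and key format are hypothetical.
class PaymentStep:
    def __init__(self) -> None:
        self.results: dict[str, str] = {}    # idempotency key -> first result
        self.charged_cents = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self.results:
            return self.results[idempotency_key]   # replay, no second charge
        self.charged_cents += amount_cents          # the side effect
        result = f"charged {amount_cents}"
        self.results[idempotency_key] = result
        return result


step = PaymentStep()
print(step.charge("saga-9:payment", 500))  # charged 500
print(step.charge("saga-9:payment", 500))  # charged 500 (retry replayed)
print(step.charged_cents)                   # 500
```

Returning the stored result, rather than an error, gives a retried call the same outcome as the original, which is the property defined above.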
Handling Timeouts
In asynchronous sagas, a participant may never respond (network partition, crash). The orchestrator must detect this via timeout and decide whether to:
- Retry the step
- Treat it as a failure and begin compensation
- Escalate to a human operator
Timeouts in sagas are business decisions, not just technical ones. “How long do we wait for payment confirmation before canceling the order?” is a product requirement, not an infrastructure parameter.
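Here is a sketch of the three options above as a policy function. The threshold defaults are hypothetical placeholders for what would be product decisions. The rule that chooses between compensating and escalating based on whether the step is compensatable is also an assumption.

```python
# Sketch of a timeout policy for one outstanding saga step. Thresholds are
# business decisions; the defaults and names here are hypothetical.
from enum import Enum


class TimeoutAction(Enum):
    WAIT = "wait"
    RETRY = "retry"
    COMPENSATE = "compensate"
    ESCALATE = "escalate"


def on_step_timeout(waited_s: float, retries: int, *,
                    step_timeout_s: float = 30.0, max_retries: int = 3,
                    compensatable: bool = True) -> TimeoutAction:
    if waited_s < step_timeout_s:
        return TimeoutAction.WAIT            # not timed out yet
    if retries < max_retries:
        return TimeoutAction.RETRY           # safe only if the step is idempotent
    if compensatable:
        return TimeoutAction.COMPENSATE      # give up and roll back
    return TimeoutAction.ESCALATE            # hand off to a human operator


print(on_step_timeout(45, 3).value)  # compensate
```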
Decision Framework
Step 1: Determine the Consistency Requirement
Ask: “What is the cost of partial completion?”
- Unacceptable (e.g., money movement, legal records, inventory in a high-demand flash sale): Use atomic (a) patterns → sao, sac, aao, aac
- Acceptable (e.g., sending notifications, updating analytics, syncing secondary data): Use eventual (e) patterns → seo, sec, aeo, aec
Step 2: Determine the Communication Requirement
Ask: “Do we need an immediate response, or can processing happen in the background?”
- Need immediate confirmation (user is waiting at a checkout page): Synchronous (s) → sao, sac, seo, sec
- Background processing acceptable (async job submission): Asynchronous (a) → aao, aac, aeo, aec
Step 3: Determine the Coordination Requirement
Ask: “Do we need visibility into the overall workflow? Do we have a team structure that supports a central coordinator?”
- Need visibility / simpler reasoning: Orchestrated (o) → sao, seo, aao, aeo
- Maximum decoupling / independent service teams: Choreographed (c) → sac, sec, aac, aec
Step 4: Apply the Complexity Filter
Having arrived at a candidate pattern, check:
- Does your team have the operational maturity to implement this pattern? (aao and aac require sophisticated infrastructure)
- Is there a simpler pattern that covers your requirements? (Prefer the simplest pattern that meets all constraints)
- Have you considered the Horror Story warning? (aac — if you’re here, add an orchestrator)
Decision Tree Summary
Is strict atomicity required?
├─ YES: Is synchronous communication required?
│ ├─ YES: Need orchestration? → Epic Saga (sao) : Phone Tag Saga (sac)
│ └─ NO: Need orchestration? → Fantasy Fiction (aao) : Horror Story (aac) [avoid]
│
└─ NO: Is synchronous communication required?
├─ YES: Need orchestration? → Fairy Tale Saga (seo) : Time Travel Saga (sec)
└─ NO: Need orchestration? → Parallel Saga (aeo) : Anthology Saga (aec)
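The tree reduces to building the code from three yes/no answers. This hypothetical helper does exactly that. It also flags the patterns on the “Avoid if possible” list, as a reminder to apply the Step 4 complexity filter.

```python
# Sketch of the decision tree as a function that builds the three-letter
# code. AVOID mirrors the "Avoid if possible" list; the name is hypothetical.
AVOID = {"sac", "sec", "aac"}


def choose_saga(atomic: bool, synchronous: bool,
                orchestrated: bool) -> tuple[str, bool]:
    """Return (code, discouraged) following the tree's three questions."""
    code = (("s" if synchronous else "a")
            + ("a" if atomic else "e")
            + ("o" if orchestrated else "c"))
    return code, code in AVOID


print(choose_saga(atomic=True, synchronous=False, orchestrated=True))   # ('aao', False)
print(choose_saga(atomic=True, synchronous=False, orchestrated=False))  # ('aac', True)
```

The first call uses the Sysops Squad's answers (atomic, asynchronous, orchestrated) and lands on aao, matching the case study below.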
Sysops Squad Saga
Context
The Sysops Squad system manages IT support tickets. When a customer submits a ticket, a multi-step workflow must execute:
- Create the ticket record
- Assign a technician (check availability, skills match)
- Notify the technician
- Notify the customer
- Update billing (for contract customers)
The Problem
Steps 1, 2, and 5 touch separate services with separate databases. Steps 3 and 4 are notifications (non-compensatable). Step 5 (billing) is transactional and must be consistent with ticket creation.
If technician assignment fails after the ticket is created, the system is in an inconsistent state. If billing fails after technician assignment, the technician has been notified of a job they may never get paid for.
Pattern Choice Analysis
The authors work through the decision framework:
- Atomicity required? Yes — ticket creation and billing must be consistent (money is involved)
- Synchronous required? No — the user submits a ticket and the system can process it in the background; the customer doesn’t need to wait for the technician to be found synchronously
- Orchestration preferred? Yes — the team wants visibility into the workflow and the ticket assignment process has complex branching logic
Result: The Sysops Squad uses the Fantasy Fiction Saga (aao) pattern for the core ticket assignment flow, with the notification steps (Steps 3 and 4) placed after the pivot transaction.
Compensating Transaction Design
| Step | Action | Compensating Action | Compensatable? |
|---|---|---|---|
| 1 | Create ticket | Cancel ticket (mark it CANCELLED) | Yes |
| 2 | Assign technician | Release technician | Yes |
| 3 | Notify technician | None | No |
| 4 | Notify customer | None | No |
| 5 | Update billing | Reverse billing entry | Yes (becomes the pivot after reordering) |
Because Steps 3 and 4 (the notifications) are non-compensatable, they are moved after Step 5. Update Billing then becomes the last compensatable step, and therefore the pivot. Any failure up to and including billing can still be compensated, and the notifications go out only after billing has committed.
Revised order: Create Ticket → Assign Technician → Update Billing → [PIVOT] → Notify Technician → Notify Customer
Key Takeaways
- The three axes (sync/async, atomic/eventual, orchestrated/choreographed) create exactly eight saga patterns. Every saga in practice maps to one of these eight combinations.
- Atomic consistency is expensive in distributed systems. Requiring all-or-nothing semantics across services forces compensation logic, saga state machines, idempotency machinery, and timeout handling, all of which are complex to build correctly.
- Orchestration trades coupling for observability. An orchestrator is a coupling point, but it gives you a single place to see workflow state, add steps, and add compensation logic. Choreography achieves decoupling at the cost of emergent, hard-to-observe workflows.
- Horror Story (aac) should almost never be chosen. Asynchronous + Atomic + Choreographed is the worst combination: it has the complexity of both atomicity and choreography with none of the simplicity benefits.
- Compensating transactions are business logic, not infrastructure. They must be designed by domain experts who understand the business semantics of “undoing” a step. Not every operation can be compensated.
- Idempotency is non-negotiable in any saga implementation. Failures cause retries; retries cause duplicate messages; duplicate messages cause duplicate side effects unless every step is idempotent.
- The pivot transaction marks the boundary between compensatable and non-compensatable steps. Saga design should explicitly identify this boundary and order steps accordingly.
- Parallel Saga (aeo) is often the best all-around choice for high-throughput systems that can accept eventual consistency. It delivers scalability, fault tolerance, and reasonable complexity without the nightmare of choreographed atomicity.
- State management difficulty correlates with choreography and atomicity. Orchestrated patterns localize state; eventual patterns eliminate compensation state. The hardest state management is in choreographed + atomic patterns (sac and aac).
- Observability infrastructure is a prerequisite for choreographed patterns. Distributed tracing, correlation IDs on all messages, and centralized log aggregation are not optional extras; they are architectural requirements for any choreographed saga in production.
Related Concepts
- ch09-data-ownership-distributed-transactions: Establishes why cross-service transactions are hard; the problem that sagas solve
- ch11-managing-distributed-workflows: Orchestration vs. choreography as a general workflow pattern (the o/c dimension of sagas)
- ch10-distributed-data-access: How services access data owned by other services (relevant to saga step design)
- wiki-two-phase-commit: The blocking, tightly coupled distributed-transaction protocol that sagas replace; why it fits poorly in microservices
- wiki-outbox-pattern: A technique for reliably publishing events as part of a local database transaction; essential infrastructure for async sagas
Last Updated: 2026-05-30