Chapter 12: Transactional Sagas

saht sagas distributed-transactions eventual-consistency orchestration choreography

Status: Notes complete

Overview

Chapter 12 is the payoff chapter of Part II (“Putting Things Back Together”). Having established the problems of distributed data ownership in ch09-data-ownership-distributed-transactions and workflow coordination in ch11-managing-distributed-workflows, the authors now present a unified taxonomy for managing distributed transactions in microservices: the saga pattern and its eight distinct variants.

The word “saga” in distributed systems refers to a sequence of local transactions, one per participating service, where each step either completes successfully or triggers compensating actions to undo prior steps. Sagas replace the single ACID transaction of the monolith world, and the two-phase commit (2PC) used to stretch such a transaction across several databases, with a more loosely coupled but more complex alternative that respects service autonomy.

The chapter’s central intellectual contribution is a three-dimensional classification system that generates exactly eight named saga patterns, each with a unique trade-off profile. Every dimension is a binary choice, so 2³ = 8 combinations arise naturally. The authors give each pattern a colorful genre name (Epic Saga, Horror Story, etc.) to make them memorable and distinguishable.

The chapter closes with the Sysops Squad case study applying the decision framework to a real scenario involving ticket assignment — a workflow that crosses multiple service boundaries and cannot tolerate data loss.

The Three Dimensions of Saga Classification

Every saga pattern is defined by three independent binary axes. Understanding the axes is a prerequisite to understanding any individual pattern.

Axis 1: Communication Style — Synchronous (s) vs. Asynchronous (a)

Synchronous (s): The caller blocks and waits for a response before continuing. Uses HTTP/REST or gRPC request-response. The saga cannot proceed to the next step until the current step confirms success or failure.

Simpler error handling — the caller knows immediately whether the step succeeded
Higher coupling — the caller is tightly bound to the callee’s availability and latency
Lower throughput — threads/connections are held open during the call

Asynchronous (a): The caller fires a message and continues, receiving results later via events or callbacks. Uses message queues (Kafka, RabbitMQ, SQS) or event buses.

Decouples caller and callee in time — callee can be temporarily unavailable
Higher throughput and scalability
More complex error handling — failures are discovered out-of-band, often much later

Axis 2: Consistency Model — Atomic (a) vs. Eventual (e)

Atomic (a): The saga treats all its steps as a single logical unit. Either all succeed or a compensating rollback process is triggered to undo completed steps. The goal is ACID-like behavior across services.

Strongest data safety guarantee across distributed services
Most difficult to implement — requires careful state tracking and compensation logic
Higher latency — all steps must complete (or compensate) before the saga is “done”

Eventual (e): The saga accepts that intermediate states will exist and that the system will converge to a consistent state over time. No all-or-nothing guarantee. Failed steps may be retried, skipped, or flagged for human intervention.

Simpler implementation — no rollback machinery needed
Higher availability — partial completion is acceptable
Requires tolerant readers and tolerant writers downstream

Axis 3: Coordination Style — Orchestrated (o) vs. Choreographed (c)

Orchestrated (o): A central orchestrator service (or process) knows the full workflow. It sends commands to participants, receives responses, tracks state, and decides what to do next.

Clear visibility into saga progress — state is in one place
Easier to add steps, change order, or add compensation logic
Introduces a central coordination point — potential bottleneck or single point of failure
Couples participants to the orchestrator

Choreographed (c): No central coordinator. Services react to events published by other services. Each service knows only its own role and what events to emit on completion.

Maximum decoupling — services don’t know about each other
Higher resilience — no central point of failure
Workflow logic is distributed across services — very hard to observe end-to-end state
Emergent behavior: workflow is implicit, not explicit

The Naming Scheme

Each pattern is encoded as a three-letter code: [communication][consistency][coordination]

Code	Communication	Consistency	Coordination
s	Synchronous	—	—
a	Asynchronous	Atomic	—
e	—	Eventual	—
o	—	—	Orchestrated
c	—	—	Choreographed

Example: sao = Synchronous + Atomic + Orchestrated = Epic Saga
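
The naming scheme can be sketched as a small lookup, assuming only the letters and genre names listed in these notes. The function name `decode` and the dictionary names are illustrative, not from the book. Note that the letter "a" means Asynchronous in the first position and Atomic in the second, so position matters.

```python
from itertools import product

# Letters allowed at each position of the three-letter code.
COMMUNICATION = {"s": "Synchronous", "a": "Asynchronous"}
CONSISTENCY = {"a": "Atomic", "e": "Eventual"}
COORDINATION = {"o": "Orchestrated", "c": "Choreographed"}

# Genre names as listed in these notes.
PATTERN_NAMES = {
    "sao": "Epic Saga",
    "sac": "Phone Tag Saga",
    "seo": "Fairy Tale Saga",
    "sec": "Time Travel Saga",
    "aao": "Fantasy Fiction Saga",
    "aac": "Horror Story",
    "aeo": "Parallel Saga",
    "aec": "Anthology Saga",
}


def decode(code: str) -> str:
    """Expand a three-letter saga code into its dimensions and name."""
    if len(code) != 3:
        raise ValueError(f"expected three letters, got {code!r}")
    comm, cons, coord = code
    try:
        parts = (COMMUNICATION[comm], CONSISTENCY[cons], COORDINATION[coord])
    except KeyError as exc:
        raise ValueError(f"invalid letter in {code!r}") from exc
    return f"{' + '.join(parts)} = {PATTERN_NAMES[code]}"


# Every combination of the three binary axes yields exactly one pattern.
ALL_CODES = ["".join(p) for p in product(COMMUNICATION, CONSISTENCY, COORDINATION)]

print(decode("sao"))
```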

The 8 Saga Patterns

1. Epic Saga (sao): Synchronous + Atomic + Orchestrated

Code: sao
Character: The classic, tightest-coupled saga. The orchestrator drives every step synchronously and treats the whole workflow as one atomic unit.

How It Works

An orchestrator service calls each participant service synchronously, one at a time (or in defined parallel groups). It waits for each response before proceeding. If any step fails, the orchestrator immediately issues compensating calls to undo all prior completed steps — also synchronously.

Client
  |
  |---[1. Create Order]----------> Orchestrator
                                       |
                                       |---[2. Reserve Inventory]--> Inventory Svc
                                       |<---[200 OK]----------------
                                       |
                                       |---[3. Process Payment]----> Payment Svc
                                       |<---[200 OK]---------------
                                       |
                                       |---[4. Schedule Shipment]--> Shipping Svc
                                       |<---[200 OK]---------------
                                       |
  |<--[Saga Complete]---------------

On failure at Step 3 (Payment):

Orchestrator
  |---[COMPENSATE: Cancel Reservation]--> Inventory Svc
  |<---[200 OK]-------------------------
  |
  |---[Return failure to Client]---------
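
The flow in the two diagrams can be approximated in a few lines of Python. This is a minimal in-process sketch, not the book's implementation: plain function calls stand in for HTTP or gRPC requests, and `run_epic_saga`, `StepFailed`, and the step functions are hypothetical names. It also assumes compensations themselves never fail, which a real orchestrator cannot assume.

```python
from typing import Callable, List, Tuple

# Each step pairs a forward action with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


class StepFailed(Exception):
    """Raised by a participant call that did not succeed."""


def run_epic_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # blocking call; stands in for a synchronous request
        except StepFailed:
            log.append(f"{name}: failed")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return False
        log.append(f"{name}: ok")
        completed.append(step)
    return True


inventory = {"reserved": 0}


def reserve() -> None:
    inventory["reserved"] += 1


def cancel_reservation() -> None:
    inventory["reserved"] -= 1


def charge() -> None:
    raise StepFailed("card declined")  # simulate the failure at Step 3


def noop() -> None:
    pass


log: List[str] = []
ok = run_epic_saga(
    [
        ("reserve_inventory", reserve, cancel_reservation),
        ("process_payment", charge, noop),
        ("schedule_shipment", noop, noop),
    ],
    log,
)
```

After the run, `ok` is false, the reservation count is back at zero, and the shipment step was never attempted.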

Trade-offs

Dimension	Assessment
Coupling	Very High — orchestrator knows all participants; all services must be available
Complexity	Medium — workflow is explicit and visible, but compensation adds code
Data Consistency	Strong (atomic) — all-or-nothing semantics
Fault Tolerance	Low — a single unavailable service blocks the entire saga
Scalability	Low — synchronous blocking limits throughput
State Management	Easy — orchestrator holds all state
Latency	High — sum of all step latencies

When to Use

Financial transactions where partial completion is unacceptable (e.g., transfer funds between accounts)
Low-volume, high-criticality workflows where latency can be traded for correctness
Teams that need maximum observability into workflow state
When all participating services are under the same reliability SLA

When to Avoid

High-throughput systems
Workflows spanning services with very different latency profiles
When participant services are external third-party APIs with unpredictable availability

2. Phone Tag Saga (sac): Synchronous + Atomic + Choreographed

Code: sac
Character: Services call each other in a chain, each one bearing responsibility for calling the next. Atomic consistency is maintained by propagating compensating calls back up the chain on failure.

How It Works

There is no orchestrator. Service A calls Service B synchronously, which calls Service C synchronously, which calls Service D. Each service waits for the response of the next before responding to its own caller. On failure, compensation propagates back up the call chain.

Client --> Service A --> Service B --> Service C --> Service D
                   <-- (resp)   <-- (resp)   <-- (resp)

On failure at D:
Service D fails and returns an error to Service C (D has nothing to undo)
Service C undoes its own local work, then returns failure to B
Service B undoes its own local work, then returns failure to A
Service A undoes its own local work, then returns failure to Client
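
A rough sketch of the chain, assuming each service undoes its own local work when the call to its successor fails and then passes the failure upward. The class `ChainService` and its `fail` switch are hypothetical, and direct method calls stand in for synchronous remote calls.

```python
from typing import List, Optional


class DownstreamFailure(Exception):
    """Signals that this service or one further down the chain failed."""


class ChainService:
    def __init__(self, name: str, next_service: Optional["ChainService"] = None,
                 fail: bool = False) -> None:
        self.name = name
        self.next_service = next_service
        self.fail = fail  # hypothetical switch to simulate a failing step

    def handle(self, log: List[str]) -> None:
        if self.fail:
            log.append(f"{self.name}: failed")
            raise DownstreamFailure(self.name)
        log.append(f"{self.name}: local work done")
        if self.next_service is None:
            return
        try:
            self.next_service.handle(log)  # blocking call to the next link
        except DownstreamFailure:
            log.append(f"{self.name}: local work undone")
            raise  # propagate the failure to our own caller


d = ChainService("D", fail=True)
c = ChainService("C", d)
b = ChainService("B", c)
a = ChainService("A", b)

log: List[str] = []
try:
    a.handle(log)
except DownstreamFailure:
    log.append("client: saga failed")
```

Even in this toy version the workflow order exists only as the wiring between objects, which hints at why the pattern is hard to debug.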

Trade-offs

Dimension	Assessment
Coupling	Very High — each service is coupled to the next in the chain
Complexity	Very High — each service must know its compensation logic AND manage downstream calls
Data Consistency	Strong (atomic) — but requires every service to implement compensation correctly
Fault Tolerance	Very Low — any service failure breaks the entire chain
Scalability	Very Low — linear chain; cannot parallelize
State Management	Very Hard — state is distributed across the chain
Latency	Very High — sum of all round-trips through the chain

When to Use

Almost never preferred — this is primarily a pattern to recognize and avoid
Occasionally valid for very short chains (2 services) where a dedicated orchestrator would be overkill

Key Risk: Tight coupling propagates through the chain — a change to any intermediate service’s interface ripples to all services that call it. Debugging is extremely difficult because the workflow logic is buried inside each service.

3. Fairy Tale Saga (seo): Synchronous + Eventual + Orchestrated

Code: seo
Character: An orchestrator coordinates the workflow synchronously but accepts eventual consistency — steps complete independently and inconsistent intermediate states are tolerable.

How It Works

The orchestrator calls each participant synchronously but does not attempt to roll back prior steps on failure. Instead, failed steps are retried, flagged for later reconciliation, or simply accepted as a divergent state that will be resolved eventually. The orchestrator still has full visibility into the workflow.
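
One way to picture this behavior is an orchestrator loop that retries each step a few times and, instead of rolling anything back, records the steps that still need reconciliation. The function name, the retry limit, and the step functions below are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def run_fairy_tale_saga(steps: Dict[str, Callable[[], bool]],
                        max_attempts: int = 3) -> List[str]:
    """Call every step in order; never roll back.

    Returns the names of steps that still failed after max_attempts and
    therefore need later reconciliation.
    """
    needs_reconciliation: List[str] = []
    for name, call in steps.items():
        for _ in range(max_attempts):
            if call():  # synchronous call; True means the step succeeded
                break
        else:
            # Loop ended without a success: flag it and keep going.
            needs_reconciliation.append(name)
    return needs_reconciliation


attempts = {"payment": 0}


def create_order() -> bool:
    return True


def take_payment() -> bool:
    attempts["payment"] += 1
    return attempts["payment"] >= 2  # fails once, then succeeds


def send_email() -> bool:
    return False  # simulate a mail service that stays down


pending = run_fairy_tale_saga({
    "create_order": create_order,
    "take_payment": take_payment,
    "send_email": send_email,
})
```

Here the order and payment stand even though the email never went out; `pending` lists what a reconciliation job would pick up later.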

Trade-offs

Dimension	Assessment
Coupling	High — orchestrator couples to all participants
Complexity	Medium — no compensation logic required
Data Consistency	Eventual — intermediate inconsistency is accepted
Fault Tolerance	Medium — individual step failures don’t cascade to full rollback
Scalability	Low-Medium — still synchronous
State Management	Easy — centralized in orchestrator
Latency	Medium-High — synchronous calls but no compensation round-trips

When to Use

Order processing systems where some steps (e.g., sending a confirmation email) can be retried later
Workflows where partial completion is acceptable and reconciliation processes exist downstream
When you need the observability of orchestration but can tolerate some data lag

4. Time Travel Saga (sec): Synchronous + Eventual + Choreographed

Code: sec
Character: Services call each other in a chain synchronously, but no rollback is attempted. The system will eventually reach consistency through retries and reconciliation.

How It Works

Like Phone Tag but without the atomicity requirement. Service A calls Service B, which calls Service C. On failure, the calling service logs the failure and relies on a retry or reconciliation mechanism rather than compensating prior steps.

Trade-offs

Dimension	Assessment
Coupling	High — chain coupling still exists
Complexity	Medium — no compensation, but chain management is still hard
Data Consistency	Eventual — inconsistent states will exist temporarily
Fault Tolerance	Medium — no cascading compensation, but chain can still stall
Scalability	Low — synchronous chain
State Management	Hard — state is distributed across services with no orchestrator
Latency	High — full chain must complete

When to Use

When workflows are naturally sequential and must be synchronous (e.g., integrating legacy systems that don’t support async messaging)
When eventual consistency is acceptable but an orchestrator is too heavyweight for the scenario

5. Fantasy Fiction Saga (aao): Asynchronous + Atomic + Orchestrated

Code: aao
Character: The orchestrator sends asynchronous messages to participants but still requires all-or-nothing atomicity. This is a challenging combination: async communication makes it hard to know when all steps have completed or failed, yet the system must maintain atomic guarantees.

How It Works

The orchestrator publishes commands to participant services via a message queue. Each participant processes its command and publishes a success/failure event. The orchestrator listens for these events and tracks state in a saga state machine. If any step fails, the orchestrator publishes compensating commands to already-completed steps.

The Key Challenge: Because communication is async, the orchestrator must wait for events that may arrive out of order, be delayed, or not arrive at all (due to network/broker issues). The saga state machine must handle timeouts and idempotency.
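
The bookkeeping this implies can be sketched as an event handler that tolerates duplicate deliveries and successes that arrive after compensation has started. This is a simplified in-memory model under stated assumptions: event IDs are unique per logical event, the `outbox` list stands in for publishing compensating commands to a broker, and timeouts are left out. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AsyncAtomicSaga:
    steps: List[str]
    results: Dict[str, str] = field(default_factory=dict)   # step -> outcome
    seen_event_ids: Set[str] = field(default_factory=set)   # for deduplication
    outbox: List[str] = field(default_factory=list)         # commands to publish
    status: str = "IN_PROGRESS"

    def on_event(self, event_id: str, step: str, outcome: str) -> None:
        """Handle one participant event; outcome is 'ok' or 'failed'."""
        if event_id in self.seen_event_ids:
            return  # duplicate delivery: ignore
        self.seen_event_ids.add(event_id)
        self.results[step] = outcome
        if outcome == "failed":
            if self.status == "IN_PROGRESS":
                self.status = "COMPENSATING"
                # Undo every step already known to have succeeded.
                for done, result in self.results.items():
                    if result == "ok":
                        self.outbox.append(f"compensate:{done}")
        elif self.status == "COMPENSATING":
            # A success that arrives late must be undone as well.
            self.outbox.append(f"compensate:{step}")
        elif all(self.results.get(s) == "ok" for s in self.steps):
            self.status = "COMPLETED"


saga = AsyncAtomicSaga(steps=["inventory", "payment", "shipping"])
saga.on_event("e1", "inventory", "ok")
saga.on_event("e2", "payment", "failed")
saga.on_event("e2", "payment", "failed")  # redelivered by the broker
saga.on_event("e3", "shipping", "ok")     # arrives after the failure
```

The late shipping success is the interesting case: without the second branch it would be left committed while the rest of the saga was undone.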

Trade-offs

Dimension	Assessment
Coupling	Medium — temporal decoupling, but orchestrator still knows all participants
Complexity	Very High — async + atomic is the hardest combination; state machine is complex
Data Consistency	Strong (atomic) — but requires sophisticated state tracking
Fault Tolerance	Medium-High — services can be temporarily unavailable (messages queue up)
Scalability	High — async messaging supports high throughput
State Management	Very Hard — async events can arrive out of order; timeout handling required
Latency	Low-Medium — async doesn’t block caller, but total saga time may still be long

When to Use

High-throughput financial systems needing both scale and strong consistency
When participant services have variable latency but atomicity cannot be compromised
Requires significant investment in saga state machine infrastructure

6. Horror Story (aac): Asynchronous + Atomic + Choreographed

Code: aac
Character: The most problematic pattern. Services communicate asynchronously, require atomic consistency, and there is no orchestrator — each service must independently manage its piece of the compensation logic based on events it observes.

How It Works

Services publish events when they complete steps. Other services listen for these events and proceed. If a service fails, it publishes a failure event. Other services that already completed their steps must listen for this failure event and independently execute their own compensation logic. No single service knows the overall workflow state.

Why It’s Called “Horror Story”

Compensating transactions must be implemented in every participant service, with each service responsible for knowing what it did and how to undo it
Since there’s no orchestrator, there’s no single place to observe overall saga state
Event ordering issues: services may receive events out of order and must handle this
Idempotency is critical: compensating events may be delivered multiple times
Debugging is a nightmare — workflow state is scattered across all services and message queues
Adding a new step to the workflow requires modifying multiple services

Trade-offs

Dimension	Assessment
Coupling	Low (temporal) — but behavioral coupling is very high
Complexity	Extremely High — highest of all 8 patterns
Data Consistency	Strong in theory — nearly impossible to guarantee in practice
Fault Tolerance	Very Low — any service failing to consume a compensating event breaks atomicity
Scalability	High — async messaging
State Management	Nightmarish — no single source of truth
Latency	Low-Medium — non-blocking

When to Use

The authors strongly advise avoiding this pattern unless there is no alternative
If you find yourself here, seriously consider adding an orchestrator (moving to aao)

7. Parallel Saga (aeo): Asynchronous + Eventual + Orchestrated

Code: aeo
Character: The orchestrator fans out async commands to multiple participants simultaneously, collects eventual results, and accepts that consistency will be reached over time. Excellent for parallelizable workflows.

How It Works

The orchestrator publishes commands to multiple services concurrently via async messaging. Each service processes its command independently and publishes a result event. The orchestrator listens for all result events (potentially with a scatter-gather pattern) and aggregates them. Because consistency is eventual, there’s no compensation logic — partial results are acceptable.

                         +--> [Inventory Svc] --> (event)
                         |                            \
Orchestrator --[msgs]--> +--> [Payment Svc]  --> (event)--> Orchestrator
                         |                            /      (aggregates)
                         +--> [Shipping Svc] --> (event)
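
The fan-out and aggregation can be imitated in-process with `asyncio.gather`. This is only an analogy: coroutines stand in for messages sent through a broker, and the service names, delays, and the "pending-retry" marker are made up for the sketch.

```python
import asyncio
from typing import Dict


async def call_service(name: str, delay: float, fail: bool = False) -> str:
    """Stand-in for sending a command and awaiting the result event."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return f"{name}: done"


async def run_parallel_saga() -> Dict[str, str]:
    names = ["inventory", "payment", "shipping"]
    # Fan out all commands at once; collect results or exceptions in order.
    results = await asyncio.gather(
        call_service("inventory", 0.01),
        call_service("payment", 0.02, fail=True),
        call_service("shipping", 0.01),
        return_exceptions=True,
    )
    # Eventual consistency: a failed step is marked for retry, not rolled back.
    return {
        name: ("pending-retry" if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }


outcome = asyncio.run(run_parallel_saga())
```

Passing `return_exceptions=True` keeps one failing participant from cancelling the aggregation, which matches the pattern's acceptance of partial results.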

Trade-offs

Dimension	Assessment
Coupling	Medium — orchestrator knows participants, but temporal decoupling exists
Complexity	Medium — no compensation needed, but scatter-gather requires care
Data Consistency	Eventual — intermediate states are normal
Fault Tolerance	High — temporary service unavailability is handled by message queuing
Scalability	Very High — parallel async processing
State Management	Medium — orchestrator aggregates results but doesn’t need to track compensation
Latency	Low — parallel execution reduces total elapsed time

When to Use

Workflows with independent parallel steps (e.g., sending notifications to multiple channels, processing multiple file types simultaneously)
High-throughput batch processing
When eventual consistency is acceptable and speed matters more than strict ordering
Recommendation engines, analytics pipelines, notification fanouts

8. Anthology Saga (aec): Asynchronous + Eventual + Choreographed

Code: aec
Character: Fully event-driven, maximum decoupling. No orchestrator, async messaging throughout, and no atomicity requirement. Services react to events and emit their own events. The most decoupled pattern.

How It Works

Services subscribe to events published on a shared event bus. When a triggering event arrives, a service processes it and publishes one or more result events. Other services listen for those events and react in turn. No service knows about any other service’s existence directly. The workflow is emergent from the event flow.

[Order Placed Event]
       |
       +--> Inventory Svc (reserves stock) --> [Stock Reserved Event]
       |                                               |
       +--> Fraud Svc (checks fraud)     --> [Fraud OK Event]
                                                       |
                                             Payment Svc (listens for both)
                                             --> [Payment Processed Event]
                                                       |
                                             Shipping Svc
                                             --> [Shipment Scheduled Event]
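
The event flow in the diagram can be traced with a toy event bus. This sketch assumes a synchronous in-memory bus in place of a real broker, and the class `EventBus` and the event type strings are illustrative. The payment handler waits until it has seen both prerequisite events for an order before publishing its own.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set


class EventBus:
    """Minimal in-memory publish/subscribe; delivery is immediate."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Callable[[str], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, order_id: str) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(order_id)


bus = EventBus()
seen_by_payment: DefaultDict[str, Set[str]] = defaultdict(set)


def payment_on(event_type: str) -> Callable[[str], None]:
    """Build a payment handler that fires once both prerequisites arrived."""
    def handler(order_id: str) -> None:
        seen_by_payment[order_id].add(event_type)
        if seen_by_payment[order_id] == {"StockReserved", "FraudOK"}:
            bus.publish("PaymentProcessed", order_id)
    return handler


# Each service subscribes only to event types, never to another service.
bus.subscribe("OrderPlaced", lambda oid: bus.publish("StockReserved", oid))
bus.subscribe("OrderPlaced", lambda oid: bus.publish("FraudOK", oid))
bus.subscribe("StockReserved", payment_on("StockReserved"))
bus.subscribe("FraudOK", payment_on("FraudOK"))
bus.subscribe("PaymentProcessed", lambda oid: bus.publish("ShipmentScheduled", oid))

bus.publish("OrderPlaced", "order-1")
```

Nothing in the code states the workflow as a whole; it only emerges in `bus.history` after the fact.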

Trade-offs

Dimension	Assessment
Coupling	Very Low — services know only about events, not each other
Complexity	High — emergent workflows are hard to reason about end-to-end
Data Consistency	Eventual — the defining characteristic
Fault Tolerance	Very High — no single point of failure; messages persist in broker
Scalability	Very High — fully async, no coordination bottleneck
State Management	Hard — no single place tracks overall workflow state
Latency	Low — non-blocking, can be highly parallel

When to Use

Event-driven microservices architectures at scale
Workflows where services truly should be independent (different teams, different release cycles)
High-volume event processing (e-commerce order flows, IoT pipelines, analytics)
When eventual consistency is a first-class design requirement, not a compromise

Key Observability Challenge: Because no service knows the full workflow, end-to-end tracing requires distributed tracing infrastructure (Jaeger, Zipkin, OpenTelemetry) and correlation IDs on all events. Without this, debugging production issues is nearly impossible.

Saga Pattern Comparison Matrix

This table summarizes all eight patterns across the six key dimensions. Rating scale: VL = Very Low, L = Low, M = Medium, H = High, VH = Very High.

Pattern	Code	Coupling	Complexity	Data Consistency	Fault Tolerance	Scalability	State Mgmt Difficulty
Epic Saga	sao	VH	M	Strong/Atomic	L	L	Easy
Phone Tag Saga	sac	VH	VH	Strong/Atomic	VL	VL	Very Hard
Fairy Tale Saga	seo	H	M	Eventual	M	L-M	Easy
Time Travel Saga	sec	H	M	Eventual	M	L	Hard
Fantasy Fiction Saga	aao	M	VH	Strong/Atomic	M-H	H	Very Hard
Horror Story	aac	L*	Extreme	Atomic in theory, weak in practice	VL	H	Nightmarish
Parallel Saga	aeo	M	M	Eventual	H	VH	Medium
Anthology Saga	aec	VL	H	Eventual	VH	VH	Hard

*aac has low temporal coupling but very high behavioral coupling — services must coordinate their compensation logic implicitly.

Practical Recommendations by Priority

Recommended for most teams:

aeo (Parallel Saga) — best balance of scalability, fault tolerance, and reasonable complexity when eventual consistency is acceptable
sao (Epic Saga) — best choice when strong consistency is required and throughput is modest
aec (Anthology Saga) — best for large-scale event-driven systems with mature DevOps

Use with care:

seo (Fairy Tale Saga) — good middle ground when sync is unavoidable but rollback isn’t needed
aao (Fantasy Fiction Saga) — justified only when both scale and atomicity are non-negotiable

Avoid if possible:

sac (Phone Tag Saga) — almost always better to add an orchestrator
sec (Time Travel Saga) — awkward middle ground with few advantages
aac (Horror Story) — the authors name it a horror story for good reason

State Management and Compensating Transactions

Why State Management Is Hard in Sagas

In a monolithic ACID transaction, the database manages all state transitions atomically. In a saga, each step executes in a separate service with a separate database. There is no global transaction manager. The saga itself must track which steps have completed, which failed, and what compensation is needed.

Saga State Machines

A saga state machine is a persistent record of a saga’s progress. It typically lives in the orchestrator service’s database (for orchestrated sagas) or as an event-sourced log (for choreographed sagas).

State machine contents:

Saga ID (unique identifier, used as correlation ID)
Current step
Status of each completed step (SUCCESS, FAILED, COMPENSATED)
Input data for each step (needed for compensation)
Timestamps (for timeout detection)
Overall saga status (IN_PROGRESS, COMPLETED, COMPENSATING, FAILED)

Example state transitions:

CREATED
  --> STEP_1_PENDING
  --> STEP_1_COMPLETE
  --> STEP_2_PENDING
  --> STEP_2_FAILED
  --> COMPENSATING (triggers Step 1 compensation)
  --> STEP_1_COMPENSATED
  --> FAILED
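
The example path can be enforced with a transition table. The sketch below is in-memory only, where a real saga record would be persisted. The states STEP_2_COMPLETE and COMPLETED and the direct STEP_1_PENDING to FAILED edge are assumptions added to make the table complete; they are not listed in the example above.

```python
from typing import Dict, List, Set

# Allowed next states for each state.
ALLOWED: Dict[str, Set[str]] = {
    "CREATED": {"STEP_1_PENDING"},
    "STEP_1_PENDING": {"STEP_1_COMPLETE", "FAILED"},  # nothing to undo yet
    "STEP_1_COMPLETE": {"STEP_2_PENDING"},
    "STEP_2_PENDING": {"STEP_2_COMPLETE", "STEP_2_FAILED"},
    "STEP_2_COMPLETE": {"COMPLETED"},
    "STEP_2_FAILED": {"COMPENSATING"},
    "COMPENSATING": {"STEP_1_COMPENSATED"},
    "STEP_1_COMPENSATED": {"FAILED"},
}


class SagaRecord:
    def __init__(self, saga_id: str) -> None:
        self.saga_id = saga_id  # doubles as the correlation ID
        self.state = "CREATED"
        self.history: List[str] = ["CREATED"]

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting anything the table does not allow."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


saga = SagaRecord("saga-42")
for state in [
    "STEP_1_PENDING",
    "STEP_1_COMPLETE",
    "STEP_2_PENDING",
    "STEP_2_FAILED",
    "COMPENSATING",
    "STEP_1_COMPENSATED",
    "FAILED",
]:
    saga.transition(state)
```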

Compensating Transactions

A compensating transaction is a business-level operation that semantically reverses a completed saga step. It is NOT a database rollback — the original transaction has already committed. A compensating transaction creates a new transaction that undoes the business effect.

Examples:

Payment charged → issue a refund (not a rollback; a new credit transaction)
Inventory reserved → release the reservation
Order created → cancel the order (marking it CANCELLED, not deleting it)
Email sent → cannot be unsent (some operations are non-compensatable — see below)
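
The refund case illustrates the difference from a rollback: the charge stays in the record and a new entry offsets it. Below is a minimal sketch with an append-only in-memory ledger; `Ledger`, `LedgerEntry`, and the amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    payment_id: str
    kind: str          # "charge" or "refund"
    amount_cents: int  # positive for charges, negative for refunds


class Ledger:
    """Append-only ledger: entries are added, never edited or deleted."""

    def __init__(self) -> None:
        self.entries: List[LedgerEntry] = []

    def balance(self, payment_id: str) -> int:
        return sum(e.amount_cents for e in self.entries
                   if e.payment_id == payment_id)

    def charge(self, payment_id: str, amount_cents: int) -> None:
        self.entries.append(LedgerEntry(payment_id, "charge", amount_cents))

    def refund(self, payment_id: str) -> None:
        """Compensate by appending an offsetting entry for the net amount."""
        outstanding = self.balance(payment_id)
        if outstanding > 0:  # refunding twice adds nothing the second time
            self.entries.append(LedgerEntry(payment_id, "refund", -outstanding))


ledger = Ledger()
ledger.charge("pay-1", 2500)
ledger.refund("pay-1")
ledger.refund("pay-1")  # repeated compensation is harmless
```

The original charge remains visible for audit, and the net balance is zero.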

Non-compensatable operations: Some saga steps have no meaningful compensation (e.g., sending an email, printing a label, triggering an external webhook). For these, the saga must either:

Move the non-compensatable step to the end of the saga (pivot transaction pattern)
Accept that it cannot be rolled back and design downstream processes to handle this
Use a two-phase approach: first “reserve” the action, then “confirm” it (e.g., draft email then send)

The Pivot Transaction

In any saga with non-compensatable steps, the pivot transaction is the last compensatable step. Steps before the pivot can be rolled back; steps after cannot. The saga design should place non-compensatable operations after the pivot and ensure they only execute once the pivot has committed successfully.

[Compensatable] [Compensatable] [PIVOT] [Non-compensatable] [Non-compensatable]
    Step 1           Step 2      Step 3       Step 4              Step 5
    <-- can compensate  -->         |    <-- cannot compensate -->
                                   |
                          Last safe rollback point
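
Under the definition used in these notes (the pivot is the last compensatable step), a saga's step order can be checked mechanically. The helper below is an illustrative sketch: it returns the pivot's index and rejects any order in which a compensatable step follows a non-compensatable one.

```python
from typing import List, Tuple


def pivot_index(steps: List[Tuple[str, bool]]) -> int:
    """Return the index of the last compensatable step.

    Each step is (name, compensatable). Raises ValueError if a compensatable
    step appears after a non-compensatable one. Returns -1 if no step is
    compensatable.
    """
    seen_non_compensatable = False
    pivot = -1
    for i, (name, compensatable) in enumerate(steps):
        if compensatable:
            if seen_non_compensatable:
                raise ValueError(
                    f"{name!r} is compensatable but follows a non-compensatable step"
                )
            pivot = i
        else:
            seen_non_compensatable = True
    return pivot


saga_steps = [
    ("Step 1", True),
    ("Step 2", True),
    ("Step 3", True),   # pivot
    ("Step 4", False),
    ("Step 5", False),
]
pivot = pivot_index(saga_steps)
```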

Idempotency: Why It Is Mandatory

Because distributed systems can fail in partial ways (message delivered but response lost, service crashed after writing but before confirming), saga steps and compensating transactions must be idempotent: executing them multiple times must produce the same result as executing them once.

Techniques for idempotency:

Idempotency keys: Each saga step message includes a unique ID. The receiving service records processed IDs and skips re-processing if it sees a duplicate.
Conditional updates: Update only if the current state matches the expected state (optimistic locking).
Event deduplication: The message broker or service layer tracks already-processed message IDs.

Without idempotency, retries (which are necessary for reliability) can cause double-charges, duplicate reservations, or double-compensation.
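
The idempotency-key technique can be shown with a small in-memory service. The class and key format are hypothetical. In a real service the key and the business effect would need to be written in the same local transaction, which this sketch does not model.

```python
from typing import Dict


class PaymentService:
    def __init__(self) -> None:
        self.processed: Dict[str, str] = {}  # idempotency key -> receipt
        self.total_charged_cents = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        """Charge once per key; a repeated key returns the original receipt."""
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]  # duplicate: no new charge
        self.total_charged_cents += amount_cents
        receipt = f"receipt-{len(self.processed) + 1}"
        self.processed[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("saga-42:step-3", 1000)
retry = service.charge("saga-42:step-3", 1000)  # retried after a lost response
```

Both calls return the same receipt and the customer is charged once.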

Handling Timeouts

In asynchronous sagas, a participant may never respond (network partition, crash). The orchestrator must detect this via timeout and decide whether to:

Retry the step
Treat it as a failure and begin compensation
Escalate to a human operator

Timeouts in sagas are business decisions, not just technical ones. “How long do we wait for payment confirmation before canceling the order?” is a product requirement, not an infrastructure parameter.
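
The three options can be written as a small policy function. The thresholds, the `TimeoutPolicy` fields, and the rule that compensation is preferred over escalation are all assumptions for illustration; as noted above, the actual values are a product decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutPolicy:
    step_timeout_s: float  # how long to wait for a reply per attempt
    max_attempts: int      # how many times a step may be sent in total


def next_action(elapsed_s: float, attempts: int, policy: TimeoutPolicy,
                can_compensate: bool) -> str:
    """Decide what the orchestrator does about a step with no reply yet."""
    if elapsed_s < policy.step_timeout_s:
        return "WAIT"
    if attempts < policy.max_attempts:
        return "RETRY"
    # Out of retries: undo earlier steps if possible, else involve a human.
    return "COMPENSATE" if can_compensate else "ESCALATE"


policy = TimeoutPolicy(step_timeout_s=30.0, max_attempts=3)
decisions = [
    next_action(10.0, 1, policy, can_compensate=True),
    next_action(31.0, 1, policy, can_compensate=True),
    next_action(31.0, 3, policy, can_compensate=True),
    next_action(31.0, 3, policy, can_compensate=False),
]
```

Retrying safely depends on the step being idempotent, since the first attempt may have succeeded with only the reply lost.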

Decision Framework

Step 1: Determine the Consistency Requirement

Ask: “What is the cost of partial completion?”

Unacceptable (e.g., money movement, legal records, inventory in a high-demand flash sale): Use atomic (a) patterns → sao, sac, aao, aac
Acceptable (e.g., sending notifications, updating analytics, syncing secondary data): Use eventual (e) patterns → seo, sec, aeo, aec

Step 2: Determine the Communication Requirement

Ask: “Do we need an immediate response, or can processing happen in the background?”

Need immediate confirmation (user is waiting at a checkout page): Synchronous (s) → sao, sac, seo, sec
Background processing acceptable (async job submission): Asynchronous (a) → aao, aac, aeo, aec

Step 3: Determine the Coordination Requirement

Ask: “Do we need visibility into the overall workflow? Do we have a team structure that supports a central coordinator?”

Need visibility / simpler reasoning: Orchestrated (o) → sao, seo, aao, aeo
Maximum decoupling / independent service teams: Choreographed (c) → sac, sec, aac, aec

Step 4: Apply the Complexity Filter

Having arrived at a candidate pattern, check:

Does your team have the operational maturity to implement this pattern? (aao and aac require sophisticated infrastructure)
Is there a simpler pattern that covers your requirements? (Prefer the simplest pattern that meets all constraints)
Have you considered the Horror Story warning? (aac — if you’re here, add an orchestrator)

Decision Tree Summary

Is strict atomicity required?
├─ YES: Is synchronous communication required?
│   ├─ YES: Need orchestration? → yes: Epic Saga (sao); no: Phone Tag Saga (sac)
│   └─ NO: Need orchestration?  → yes: Fantasy Fiction Saga (aao); no: Horror Story (aac) [avoid]
│
└─ NO: Is synchronous communication required?
    ├─ YES: Need orchestration? → yes: Fairy Tale Saga (seo); no: Time Travel Saga (sec)
    └─ NO: Need orchestration?  → yes: Parallel Saga (aeo); no: Anthology Saga (aec)
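
The tree reduces to three yes/no answers, so it can be expressed as a function that builds the code letter by letter. The function name `choose_pattern` is illustrative, and the sketch covers only Steps 1 to 3 of the framework, not the complexity filter in Step 4.

```python
from typing import Dict, Tuple

NAMES: Dict[str, str] = {
    "sao": "Epic Saga",
    "sac": "Phone Tag Saga",
    "seo": "Fairy Tale Saga",
    "sec": "Time Travel Saga",
    "aao": "Fantasy Fiction Saga",
    "aac": "Horror Story",
    "aeo": "Parallel Saga",
    "aec": "Anthology Saga",
}


def choose_pattern(atomic: bool, synchronous: bool,
                   orchestrated: bool) -> Tuple[str, str]:
    """Map the three decision-tree answers to (code, pattern name)."""
    code = (
        ("s" if synchronous else "a")      # communication
        + ("a" if atomic else "e")         # consistency
        + ("o" if orchestrated else "c")   # coordination
    )
    return code, NAMES[code]


# Atomic, background processing, orchestrated.
candidate = choose_pattern(atomic=True, synchronous=False, orchestrated=True)
```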

Sysops Squad Saga

Context

The Sysops Squad system manages IT support tickets. When a customer submits a ticket, a multi-step workflow must execute:

Create the ticket record
Assign a technician (check availability, skills match)
Notify the technician
Notify the customer
Update billing (for contract customers)

The Problem

Steps 1, 2, and 5 touch separate services with separate databases. Steps 3 and 4 are notifications (non-compensatable). Step 5 (billing) is transactional and must be consistent with ticket creation.

If technician assignment fails after the ticket is created, the system is in an inconsistent state. If billing fails after technician assignment, the technician has already been notified of a job that has no billing record behind it.

Pattern Choice Analysis

The authors work through the decision framework:

Atomicity required? Yes — ticket creation and billing must be consistent (money is involved)
Synchronous required? No — the user submits a ticket and the system can process it in the background; the customer doesn’t need to wait for the technician to be found synchronously
Orchestration preferred? Yes — the team wants visibility into the workflow and the ticket assignment process has complex branching logic

Result: The Sysops Squad uses the Fantasy Fiction Saga (aao) pattern for the core ticket assignment flow, with the notification steps (Steps 3 and 4) placed after the pivot transaction.

Compensating Transaction Design

Step	Action	Compensating Action	Compensatable?
1	Create ticket	Cancel/delete ticket	Yes
2	Assign technician	Release technician	Yes
3	Notify technician	—	No
4	Notify customer	—	No
5	Update billing	Reverse billing entry	Yes

Because Steps 3 and 4 are non-compensatable, they are moved after Step 5, so billing runs before any notification goes out and remains compensatable. In the original order the last compensatable step before the notifications is Step 2; in the revised order that role passes to Update Billing, which becomes the pivot.

Revised order: Create Ticket → Assign Technician → Update Billing → [PIVOT] → Notify Technician → Notify Customer
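
The reordering amounts to a stable partition: compensatable steps first, everything else after, with relative order preserved inside each group. The sketch below assumes that no data dependency forces billing to run after the notifications; the `Step` type and `reorder` function are hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    compensatable: bool


ORIGINAL: List[Step] = [
    Step("Create Ticket", True),
    Step("Assign Technician", True),
    Step("Notify Technician", False),
    Step("Notify Customer", False),
    Step("Update Billing", True),
]


def reorder(steps: List[Step]) -> List[Step]:
    """Move compensatable steps ahead of non-compensatable ones.

    sorted() is stable, so steps keep their relative order within each group.
    """
    return sorted(steps, key=lambda s: not s.compensatable)


revised = [s.name for s in reorder(ORIGINAL)]
```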

Key Takeaways

The three axes (sync/async, atomic/eventual, orchestrated/choreographed) create exactly eight saga patterns. Within this taxonomy, any saga implementation can be placed in one of these eight combinations.
Atomic consistency is expensive in distributed systems. Requiring all-or-nothing semantics across services forces compensation logic, saga state machines, idempotency machinery, and timeout handling — all of which are complex to build correctly.
Orchestration trades coupling for observability. An orchestrator is a coupling point, but it gives you a single place to see workflow state, add steps, and add compensation logic. Choreography achieves decoupling at the cost of emergent, hard-to-observe workflows.
Horror Story (aac) should almost never be chosen. Asynchronous + Atomic + Choreographed is the worst combination: it has the complexity of both atomicity and choreography with none of the simplicity benefits.
Compensating transactions are business logic, not infrastructure. They must be designed by domain experts who understand the business semantics of “undoing” a step. Not every operation can be compensated.
Idempotency is non-negotiable in any saga implementation. Failures cause retries; retries cause duplicate messages; duplicate messages cause duplicate side effects unless every step is idempotent.
The pivot transaction marks the boundary between compensatable and non-compensatable steps. Saga design should explicitly identify this boundary and order steps accordingly.
Parallel Saga (aeo) is often the best all-around choice for high-throughput systems that can accept eventual consistency. It delivers scalability, fault tolerance, and reasonable complexity without the nightmare of choreographed atomicity.
State management difficulty correlates with choreography and atomicity. Orchestrated patterns localize state; eventual patterns eliminate compensation state. The hardest state management is in choreographed + atomic patterns (sac and aac).
Observability infrastructure is a prerequisite for choreographed patterns. Distributed tracing, correlation IDs on all messages, and centralized log aggregation are not optional extras — they are architectural requirements for any choreographed saga in production.

Related Concepts

ch09-data-ownership-distributed-transactions — Establishes why cross-service transactions are hard; the problem that sagas solve
ch11-managing-distributed-workflows — Orchestration vs. choreography as a general workflow pattern (the o/c dimension of sagas)
ch10-distributed-data-access — How services access data owned by other services (relevant to saga step design)
wiki-two-phase-commit — The monolithic alternative to sagas; why it doesn’t work in microservices
wiki-outbox-pattern — A technique for reliably publishing events as part of a local database transaction; essential infrastructure for async sagas

Last Updated: 2026-05-30

Study Notes by Niladri & AI
