Chapter 9: Data Ownership and Distributed Transactions

saht data-ownership distributed-transactions eventual-consistency joint-ownership 2pc

Status: Notes complete


Overview

In a monolith, transactions are straightforward: a single ACID database handles everything. Once you decompose into microservices — each owning its own database — two hard problems emerge:

  1. Who owns what data? When a table’s data is written by multiple services, you have a conflict of ownership.
  2. How do you keep data consistent across service boundaries? Without a shared database, full ACID guarantees disappear.

Chapter 9 works through both problems in turn. It catalogs three data ownership scenarios in order of difficulty, four techniques for resolving the hardest case (joint ownership), and three eventual consistency patterns for managing distributed transactions. The chapter underpins ch11-managing-distributed-workflows and ch12-transactional-sagas, which build richer patterns on top of it.

The Sysops Squad saga runs through the whole chapter: as the monolith is decomposed, the ticket processing subsystem reveals all three ownership scenarios and forces the team to choose among the resolution techniques.


Data Ownership Scenarios

The book establishes a clear vocabulary: data ownership means the service responsible for writing to (and therefore controlling) a particular table or data entity. Read access is a separate concern — a service that reads data it doesn’t own is fine; a service that writes data it doesn’t own is a problem.

Scenario 1 — Single Ownership

Definition: Exactly one service writes to a table. Other services may read it, but they do not write to it.

Why it’s easy: No ownership conflict exists. The owning service is the single source of truth. Other services query it via API or read from a replicated copy.

Example (Sysops Squad): The Survey service is the only service that writes to the survey table. The Reporting service reads survey results but only through a read API — it never writes. Ownership is unambiguous.

┌─────────────────────┐
│   Survey Service    │──writes──▶ [survey table]
└─────────────────────┘
         │
         │ read API
         ▼
┌─────────────────────┐
│  Reporting Service  │ (reads only, no ownership)
└─────────────────────┘

Resolution: None required. Assign the table to the service that writes it.


Scenario 2 — Common Ownership

Definition: Many services need to read from a shared, relatively static reference table (e.g., zip codes, country codes, product categories, lookup lists). No service writes to it frequently; when updates happen, they are administrative or infrequent.

Why it’s manageable: The data is not domain-specific — it belongs to no single business domain. It acts as infrastructure or reference data. Multiple services reading the same reference data is not a conflict; it’s sharing.

Resolution options:

  1. Shared reference data service — Create a dedicated service (e.g., ReferenceDataService) that owns the table and exposes it via API. All other services query it. Simple, single source of truth, but adds network latency and a dependency.

  2. Replicate to each service — Each service gets its own copy of the reference data, synchronized periodically. Eliminates cross-service runtime calls, but introduces eventual consistency for the reference data itself.

  3. Shared schema — Allow multiple services to read from the same database schema for this table (a pragmatic exception to per-service databases, acceptable for truly static data).

Example (Sysops Squad): Zip code lookup data is needed by the Customer service, the Ticket Routing service, and the Billing service. None of them “owns” it in a business sense — it’s reference data. The team creates a small shared LocationReferenceService.

[zip_code table]  ◀── owns ──  LocationReferenceService
       │
  read API
  ┌────┴─────────────────────────────┐
  ▼                  ▼               ▼
Customer Svc    TicketRouting Svc   Billing Svc

Scenario 3 — Joint Ownership

Definition: Two or more services both need to write to the same table (or overlapping columns of the same table). This is the hardest case.

Why it’s hard: You cannot simply assign ownership to one service without the other losing write access. You cannot share the table without coupling the services at the database level, which defeats the purpose of decomposition. Any solution involves a trade-off.

Example (Sysops Squad): The Ticket service and the Assignment service both write to the ticket table. The Ticket service writes ticket creation data (customer, description, priority). The Assignment service writes assignment data (assigned expert, scheduling). They cannot both “own” the table, and the table holds interleaved data from both domains.

The book offers four resolution techniques, described in the next section.


Joint Ownership Resolution Techniques

Technique 1 — Table Split

Concept: Split the shared table into two separate tables, each owned by one service. Each service writes only to its own table. If the data needs to be joined, it is joined at the API layer or via an event-driven approach.

Mechanism:

  • Analyze which columns each service writes to.
  • Split the table along those column boundaries.
  • Each service gets a dedicated table with a shared primary key (e.g., ticket_id).

BEFORE (shared table):
┌──────────────────────────────────────────────────────┐
│  ticket table                                        │
│  ticket_id | cust_id | desc | priority | expert_id   │
│            ← Ticket Svc ──▶ | ← Assignment Svc ──▶  │
└──────────────────────────────────────────────────────┘

AFTER (table split):
┌──────────────────────────────┐   ┌──────────────────────────────┐
│  ticket table (Ticket Svc)   │   │ assignment table (Assign Svc) │
│  ticket_id | cust_id | desc  │   │ ticket_id | expert_id | sched │
│  | priority                  │   │                               │
└──────────────────────────────┘   └──────────────────────────────┘
       shared primary key (ticket_id)

Trade-offs:

Aspect           | Assessment
-----------------+------------------------------------------------------------
Data coupling    | Low: clean separation
Service coupling | Low: services are independent
Data integrity   | Harder: referential integrity across tables requires application-level enforcement
Query complexity | Higher: joins must happen at the API/aggregation layer
When to use      | When the columns map cleanly to service boundaries

Fits well when: Each service writes to a distinct, non-overlapping set of columns. If services write to the same column (e.g., a shared status field), a table split alone is insufficient.
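
A minimal sketch of the API-layer join after a table split, assuming two in-memory SQLite databases stand in for the two services' private stores. The function name get_ticket_view and the sample rows are hypothetical and only illustrate the idea.

```python
import sqlite3

# Two separate databases: one per owning service (illustrative stand-ins).
ticket_db = sqlite3.connect(":memory:")
assignment_db = sqlite3.connect(":memory:")

ticket_db.execute(
    "CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, cust_id INTEGER, "
    "description TEXT, priority TEXT)"
)
assignment_db.execute(
    "CREATE TABLE ticket_assignment (ticket_id INTEGER PRIMARY KEY, "
    "expert_id INTEGER, scheduled_date TEXT)"
)

# Made-up sample rows that share the primary key ticket_id = 42.
ticket_db.execute("INSERT INTO ticket VALUES (42, 7, 'Laptop will not boot', 'high')")
assignment_db.execute("INSERT INTO ticket_assignment VALUES (42, 301, '2024-06-01')")


def get_ticket_view(ticket_id):
    """Combine both halves in application code; a SQL JOIN is no longer possible."""
    t = ticket_db.execute(
        "SELECT ticket_id, cust_id, description, priority FROM ticket WHERE ticket_id = ?",
        (ticket_id,),
    ).fetchone()
    if t is None:
        return None
    a = assignment_db.execute(
        "SELECT expert_id, scheduled_date FROM ticket_assignment WHERE ticket_id = ?",
        (ticket_id,),
    ).fetchone()
    view = {"ticket_id": t[0], "cust_id": t[1], "description": t[2], "priority": t[3]}
    # A ticket without an assignment row is a valid state, so tolerate its absence.
    view["expert_id"], view["scheduled_date"] = a if a else (None, None)
    return view
```

Nothing in either database enforces that an assignment row refers to an existing ticket; that check now belongs to application code.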


Technique 2 — Data Domain

Concept: Rather than splitting the table, create a shared data domain — a dedicated schema or database explicitly shared by the services that need joint write access. This is a controlled exception to the “one database per service” rule.

Mechanism:

  • Define a data domain (e.g., TicketDomain) that contains the contested table(s).
  • Both Ticket service and Assignment service have read/write access to this domain.
  • The domain is explicitly documented as shared; no other services access it.

┌─────────────────────────────────────────────────────┐
│             Ticket Data Domain (shared)              │
│  ┌──────────────────────────────────────────────┐   │
│  │              ticket table                    │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
          ▲                        ▲
    writes/reads              writes/reads
          │                        │
  ┌───────────────┐       ┌───────────────────┐
  │  Ticket Svc   │       │  Assignment Svc   │
  └───────────────┘       └───────────────────┘

Trade-offs:

Aspect               | Assessment
---------------------+------------------------------------------------------------
Data coupling        | High: both services share the same schema
Service coupling     | Medium: shared database creates deployment coupling
Data integrity       | Good: full ACID within the shared domain
Query complexity     | Low: standard SQL joins work
Operational overhead | Low: no cross-service API calls for writes
When to use          | When data integrity is paramount and the sharing scope is strictly bounded

Risk: The shared data domain can grow — other services start “borrowing” access, gradually recreating the monolith’s shared database. Governance is required to keep the domain boundary tight.

Fits well when: The two services are tightly related (perhaps candidates for consolidation), data integrity constraints are strict, and you want to avoid eventual consistency complexity.
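
A minimal sketch of why integrity is easy inside a data domain, assuming one in-memory SQLite database stands in for the shared schema. The function create_and_assign and the simplified ticket columns are hypothetical.

```python
import sqlite3

# One shared database stands in for the bounded Ticket data domain.
domain_db = sqlite3.connect(":memory:")
domain_db.execute(
    "CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, "
    "description TEXT NOT NULL, expert_id INTEGER)"
)


def create_and_assign(ticket_id, description, expert_id):
    """Both writes share one local ACID transaction: all or nothing."""
    try:
        with domain_db:  # commits on success, rolls back if an exception escapes
            domain_db.execute(
                "INSERT INTO ticket (ticket_id, description) VALUES (?, ?)",
                (ticket_id, description),
            )
            if expert_id is None:
                # Simulated assignment failure after the ticket row was written.
                raise ValueError("no expert available")
            domain_db.execute(
                "UPDATE ticket SET expert_id = ? WHERE ticket_id = ?",
                (expert_id, ticket_id),
            )
        return True
    except ValueError:
        return False
```

When the assignment step fails, the ticket insert is rolled back with it, so no compensation logic is needed. The price is that both services depend on the same schema.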


Technique 3 — Delegate

Concept: Designate one service as the owner of the table. The other service that previously wrote to it must now delegate all writes through the owning service’s API. The non-owning service sends write requests; it never touches the database directly.

Mechanism:

  • Assign ownership to the service whose core domain the table best represents.
  • The non-owning service calls the owning service’s write API whenever it needs to modify the data.
  • The owning service enforces business rules, validation, and consistency.

BEFORE: both services write to table directly

AFTER (Ticket Svc owns the table):

Assignment Svc ──── write request (API call) ────▶ Ticket Svc
                                                         │
                                                     validates
                                                         │
                                                         ▼
                                                  [ticket table]

Trade-offs:

Aspect           | Assessment
-----------------+------------------------------------------------------------
Data coupling    | Low: only the owner touches the DB
Service coupling | High: non-owner is runtime-dependent on the owner
Data integrity   | Good: owner enforces all rules
Availability     | Reduced: if the owner is down, the delegating service cannot write
Performance      | Reduced: extra network hop for every write
When to use      | When one service is clearly the “true” owner semantically

Fits well when: Ownership is logically clear but historically both services wrote to the table for convenience. Also useful as a migration step — the delegate pattern can be introduced quickly while a longer-term solution (table split or consolidation) is designed.

Failure mode: If the owning service becomes a bottleneck or single point of failure, all dependent services are affected. Circuit breakers and retry logic become essential.
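
A minimal sketch of the delegate technique, assuming plain in-process classes stand in for the two services and a dictionary stands in for the ticket table. The class and method names are hypothetical.

```python
class TicketService:
    """Owner of the ticket table; the only code that touches the store."""

    def __init__(self):
        self._tickets = {}  # stands in for the ticket table

    def create_ticket(self, ticket_id, description):
        self._tickets[ticket_id] = {"description": description, "expert_id": None}

    def assign_expert(self, ticket_id, expert_id):
        # The owner validates every write, including delegated ones.
        if ticket_id not in self._tickets:
            raise KeyError(f"unknown ticket {ticket_id}")
        self._tickets[ticket_id]["expert_id"] = expert_id

    def get_ticket(self, ticket_id):
        return dict(self._tickets[ticket_id])  # return a copy, not the stored row


class AssignmentService:
    """Non-owner: holds a client for the owner's API, never a database handle."""

    def __init__(self, ticket_api):
        self._ticket_api = ticket_api

    def assign(self, ticket_id, expert_id):
        # In a real deployment this is a network call that can fail or time out.
        self._ticket_api.assign_expert(ticket_id, expert_id)
```

Because AssignmentService only sees the owner's API, every rule the owner enforces applies to delegated writes as well. The direct method call here hides the network hop and availability dependency that the trade-off table describes.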


Technique 4 — Service Consolidation

Concept: If two services cannot cleanly separate their data ownership, merge them into a single service. The “joint ownership” problem disappears because there is only one service.

Mechanism:

  • Evaluate whether the two services have high semantic cohesion despite being split.
  • If yes, merge them into one service with one database.
  • The merged service owns all the data and handles all the writes internally.

BEFORE:
┌───────────────┐     ┌───────────────────┐
│  Ticket Svc   │     │  Assignment Svc   │
│  (writes to   │     │  (writes to       │
│   ticket tbl) │     │   ticket tbl)     │
└───────────────┘     └───────────────────┘

AFTER (consolidated):
┌────────────────────────────────────────┐
│        Ticket Management Service       │
│  (handles creation + assignment        │
│   internally; owns ticket table)       │
└────────────────────────────────────────┘
                    │
                    ▼
             [ticket table]

Trade-offs:

Aspect           | Assessment
-----------------+------------------------------------------------------------
Data coupling    | None: single owner
Service coupling | None: single service
Granularity      | Increases service size (integrator force)
Deployability    | Merged service is larger, harder to deploy independently
Scalability      | Must scale the merged service as a whole
When to use      | When the two services are more cohesive than they appear; when other techniques are too complex

Fits well when: The services are so tightly coupled operationally and semantically that keeping them separate creates more problems than it solves. This technique is a recognition that the original decomposition was too granular.

Key insight: Service consolidation is not a failure — it is a valid granularity integrator. See ch07-service-granularity for the disintegrator/integrator framework.


Choosing Among the Four Techniques

Is there a clean column boundary?
         YES ──▶ Table Split
         NO ──▶
              Is data integrity across writes critical?
                   YES ──▶ Data Domain (controlled shared DB)
                   NO ──▶
                        Is one service the semantic owner?
                             YES ──▶ Delegate
                             NO ──▶ Service Consolidation

The book also frames the choice in terms of coupling tolerance. If you can tolerate higher service coupling, Delegate works. If integrity matters most and you can accept data coupling through a shared schema, use Data Domain. If neither constraint dominates, choose Table Split or Consolidation depending on column alignment and cohesion.
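
The decision tree above can be written as a small function. This is only a sketch of these notes' flowchart; the function and parameter names are hypothetical, and real decisions weigh more factors than three booleans.

```python
def choose_joint_ownership_technique(clean_column_boundary,
                                     integrity_critical,
                                     has_semantic_owner):
    """Walk the questions in the same order as the flowchart."""
    if clean_column_boundary:
        return "Table Split"
    if integrity_critical:
        return "Data Domain"
    if has_semantic_owner:
        return "Delegate"
    return "Service Consolidation"


# Example: columns overlap, integrity is negotiable, Ticket Svc is the natural owner.
print(choose_joint_ownership_technique(False, False, True))
```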


Distributed Transactions: The Problem

Why ACID Breaks Across Services

In a monolith with a single relational database, ACID guarantees are free:

  • Atomicity: All operations in a transaction commit or all roll back.
  • Consistency: The database enforces integrity constraints across all tables.
  • Isolation: Concurrent transactions don’t see each other’s partial state.
  • Durability: Committed data survives failures.

In a distributed architecture, each service has its own database. A business operation spanning two services (e.g., creating a ticket AND updating the customer’s open-ticket count) must touch two databases. There is no native mechanism to make this atomic.

Concrete failure scenario:

Step 1: Ticket Service creates ticket record in tickets DB  ✓
Step 2: Customer Service increments open_tickets in customers DB  ✗ (crash)

Result: Ticket exists, but customer count is wrong.
        System is inconsistent. No rollback of Step 1.
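
The failure scenario can be reproduced in a few lines. This sketch assumes two dictionaries stand in for the two services' databases and uses a flag to simulate the crash; all function names are hypothetical.

```python
tickets_db = {}                                # owned by Ticket Service
customers_db = {"cust-7": {"open_tickets": 0}}  # owned by Customer Service


def create_ticket(ticket_id, cust_id):
    tickets_db[ticket_id] = {"cust_id": cust_id, "status": "open"}


def increment_open_tickets(cust_id, fail=False):
    if fail:
        raise ConnectionError("Customer Service unavailable")  # simulated crash
    customers_db[cust_id]["open_tickets"] += 1


def create_ticket_workflow(ticket_id, cust_id, customer_service_down):
    create_ticket(ticket_id, cust_id)  # Step 1 commits locally
    try:
        increment_open_tickets(cust_id, fail=customer_service_down)  # Step 2
    except ConnectionError:
        # Step 1 is already durable in another database; nothing undoes it here.
        return "inconsistent"
    return "consistent"
```

Calling create_ticket_workflow(42, "cust-7", customer_service_down=True) leaves ticket 42 in tickets_db while open_tickets stays at 0, which is exactly the state described above.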

Why Two-Phase Commit (2PC) Doesn’t Scale

2PC is the classical distributed transaction protocol. It works in two phases:

Phase 1 — Prepare:

  • A coordinator sends PREPARE to all participant databases.
  • Each participant writes the pending changes to a durable log and responds READY or ABORT.

Phase 2 — Commit:

  • If all participants sent READY, coordinator sends COMMIT to all.
  • If any sent ABORT, coordinator sends ROLLBACK to all.

Coordinator
    │
    ├──PREPARE──▶ DB1 ──READY──▶ │
    ├──PREPARE──▶ DB2 ──READY──▶ │
    │                             │
    │◀─── all READY ──────────────┘
    │
    ├──COMMIT──▶ DB1
    └──COMMIT──▶ DB2
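
The two phases can be sketched in memory as follows. This is a toy model under stated assumptions: the Participant class is hypothetical, there is no durable log, no locking, and no coordinator failure handling, so it shows only the message flow.

```python
class Participant:
    """Stand-in for one participant database."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real participant would write pending changes to a durable log
        # and take locks here before voting.
        self.state = "prepared" if self.can_commit else "aborted"
        return "READY" if self.can_commit else "ABORT"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if the vote was unanimous.
    if all(vote == "READY" for vote in votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.rollback()
    return "ROLLED_BACK"
```

Between the end of prepare() and the call to commit() or rollback(), a prepared participant can do nothing but wait for the coordinator.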

Problems with 2PC in microservices:

  1. Blocking protocol: If the coordinator crashes after participants have voted READY but before it sends COMMIT or ROLLBACK, those participants are stuck. They hold their locks, cannot safely commit or roll back on their own, and must wait for the coordinator to recover. This is the blocking problem.

  2. Synchronous coupling: 2PC requires all participants to be available simultaneously. In a microservices environment with many services, the probability that all are healthy at the same instant decreases with the number of participants.

  3. Performance: All participants hold locks through both phases. Under high concurrency, this creates severe contention.

  4. Doesn’t cross HTTP boundaries cleanly: 2PC was designed for database-level coordination (XA protocol). Applying it across HTTP-communicating microservices requires XA-aware resource managers in each service — extremely rare in practice.

  5. Tight coupling: The coordinator and all participants are tightly coupled — a failure in any participant can stall the entire transaction.
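
As a rough illustration of point 2, assume each participant is independently available 99.9% of the time. That figure and the independence assumption are hypothetical, chosen only to show the trend; the function name is also made up.

```python
def all_available(per_participant_availability, participants):
    """Probability that every participant is up at the same instant,
    assuming independent failures."""
    return per_participant_availability ** participants


# The chance that a 2PC transaction can even start shrinks as participants are added.
for n in (2, 10, 50):
    print(n, round(all_available(0.999, n), 4))
```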

Conclusion: 2PC is not a viable general solution for distributed microservices transactions. The book’s answer is eventual consistency managed through one of three patterns.


Eventual Consistency Patterns

The key conceptual shift: instead of trying to make distributed operations atomic (all succeed or all fail instantly), accept that consistency will be achieved eventually — and design the system to detect, handle, and recover from transient inconsistencies.

The three patterns differ in who drives the synchronization, how failures are handled, and what the coupling trade-offs are.


Pattern 1 — Background Synchronization

Core idea: Each service performs its local operation independently. A separate background process (batch job, scheduler, or reconciliation service) periodically checks for inconsistencies and corrects them.

How it works:

Time 0:  Ticket Service creates ticket            [tickets DB: ticket_id=42, open]
Time 0:  Customer Service receives no update      [customers DB: open_count still old]
...
Time T:  Background Sync Process runs
         - Queries tickets DB: find tickets created since last run
         - For each ticket, checks customers DB
         - Detects open_count discrepancy
         - Issues corrective write to customers DB
Time T:  customers DB is now consistent           [customers DB: open_count correct]

Concrete example (Sysops Squad): When a ticket is created in the Ticket service, the Customer service’s open ticket count should increment. With background synchronization, a nightly (or hourly) reconciliation job scans for tickets created without a corresponding customer count update and applies corrections.

Sequence diagram:

Client ──▶ Ticket Svc ──▶ [tickets DB]   (write succeeds)
                                          
... time passes ...

BackgroundSync ──▶ [tickets DB]           (reads new tickets)
BackgroundSync ──▶ [customers DB]         (checks counts)
BackgroundSync ──▶ [customers DB]         (corrects discrepancy)

Failure modes:

  • Stale reads: Between the service write and the next sync run, the data is inconsistent. Queries during this window return stale data.
  • Sync job failure: If the background job itself fails, inconsistency persists until the job recovers. Requires monitoring and alerting.
  • Race conditions: If a customer is updated both by a service and the sync job simultaneously, write conflicts can occur. Requires idempotent corrections.
  • Detection lag: The longer the sync interval, the longer inconsistency persists.

Trade-offs:

Aspect                   | Assessment
-------------------------+------------------------------------------------------------
Architectural complexity | Low: simple pattern, no changes to services
Service coupling         | Very low: services know nothing of each other (the sync job, however, is coupled to each database it touches)
Data consistency         | Eventual, with configurable lag (minutes to hours)
Fault tolerance          | Poor: sync job is a SPOF; inconsistency persists while it is down
Scalability              | Good: services are independent
Responsiveness           | High: no synchronous coordination
Best for                 | Low-volume, low-criticality data with high tolerance for staleness

When to use: Reporting databases, analytics aggregations, non-critical summaries (e.g., nightly reconciliation of financial summaries). Not suitable when users need to see consistent data immediately after an operation.
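
A minimal sketch of a reconciliation pass, assuming dictionaries stand in for the tickets and customers databases. As a simplification it recomputes every count from scratch rather than scanning only tickets created since the last run; the function name is hypothetical.

```python
def reconcile_open_counts(tickets, customers):
    """Recompute each customer's open-ticket count from the tickets store
    and overwrite any count that disagrees."""
    actual = {}
    for ticket in tickets.values():
        if ticket["status"] == "open":
            cust_id = ticket["cust_id"]
            actual[cust_id] = actual.get(cust_id, 0) + 1

    corrections = {}
    for cust_id, record in customers.items():
        expected = actual.get(cust_id, 0)
        if record["open_count"] != expected:
            corrections[cust_id] = (record["open_count"], expected)
            # Set the value instead of incrementing it, so a re-run is harmless.
            record["open_count"] = expected
    return corrections


tickets = {42: {"cust_id": "cust-7", "status": "open"}}
customers = {"cust-7": {"open_count": 0}}
print(reconcile_open_counts(tickets, customers))  # first run finds a discrepancy
print(reconcile_open_counts(tickets, customers))  # second run finds nothing to fix
```

Writing the corrected value rather than applying a delta is what makes the correction idempotent. Note that the job reads and writes both databases directly, which is the hidden coupling this pattern carries.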


Pattern 2 — Orchestrated Request-Based Pattern

Core idea: A central orchestrator service calls participating services in sequence over HTTP/gRPC. If a step fails, the orchestrator issues compensating transactions to undo the steps that already succeeded.

How it works:

Orchestrator
    │
    ├──1. Create ticket──▶ Ticket Service ──▶ [tickets DB]   ✓
    │◀────────────────────── ticket_id=42 ───────────────────
    │
    ├──2. Increment count──▶ Customer Service ──▶ [customers DB]  ✗ (fails)
    │◀─────────────────────── FAILURE ─────────────────────────
    │
    ├──COMPENSATE: Delete ticket──▶ Ticket Service ──▶ [tickets DB]  ✓
    │◀────────────────────────────── OK ────────────────────────────
    │
    └──Return error to client

Concrete example (Sysops Squad): A ticket creation workflow orchestrator:

  1. Calls Ticket Service — creates the ticket record.
  2. Calls Customer Service — increments open ticket count.
  3. Calls Assignment Service — routes to appropriate expert.

If step 2 fails, the orchestrator calls Ticket Service with a delete/cancel compensating transaction to undo step 1. If step 3 fails, it compensates steps 1 and 2.

Compensating transaction design is critical: compensations must be idempotent (can be called multiple times safely) and must handle the case where the original operation partially succeeded.
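
A minimal sketch of an orchestrator that compensates completed steps in reverse order. It assumes in-process functions stand in for the service calls and keeps no durable workflow state, so it would not survive an orchestrator crash; all names are hypothetical.

```python
class StepFailed(Exception):
    """Raised by a step that could not complete."""


def run_workflow(steps):
    """Run (name, action, compensation) steps in order; undo on failure."""
    completed = []  # compensations for the steps that have succeeded so far
    for name, action, compensate in steps:
        try:
            action()
        except StepFailed:
            # Undo in reverse order so the most recent step is compensated first.
            for _, undo in reversed(completed):
                undo()
            return ("failed", name)
        completed.append((name, compensate))
    return ("ok", None)


tickets = {}
open_counts = {"cust-7": 0}


def create_ticket():
    tickets[42] = {"cust_id": "cust-7"}


def delete_ticket():
    tickets.pop(42, None)  # idempotent: deleting twice is harmless


def increment_count():
    raise StepFailed("Customer Service unavailable")  # simulated failure


def decrement_count():
    open_counts["cust-7"] -= 1


result = run_workflow([
    ("create_ticket", create_ticket, delete_ticket),
    ("increment_count", increment_count, decrement_count),
])
```

Here step 2 fails, so only step 1 is compensated: the ticket is removed and the count is never touched. The compensation for the failed step itself is not run, because that step never completed.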

Failure modes:

  • Orchestrator failure: If the orchestrator crashes mid-workflow, in-progress transactions are stranded. Requires persistent workflow state (durable orchestration log) to resume. Addressed more fully in ch11-managing-distributed-workflows and ch12-transactional-sagas.
  • Compensation failure: If a compensating transaction also fails, the system is in an inconsistent state that requires manual intervention or a retry mechanism.
  • Partial compensation: Complex workflows with many steps require many compensating transactions, each of which must be carefully designed.
  • Retry storms: Under failure, aggressive retry logic from the orchestrator can overwhelm downstream services.

Trade-offs:

Aspect                   | Assessment
-------------------------+------------------------------------------------------------
Architectural complexity | Medium: orchestrator must manage state and compensation logic
Service coupling         | Medium: orchestrator is coupled to all participating services
Data consistency         | Eventual: the inconsistency window is bounded by the duration of the workflow
Fault tolerance          | Medium: orchestrator is a SPOF; compensations can fail
Scalability              | Medium: orchestrator can become a bottleneck
Responsiveness           | Medium: synchronous calls add latency
Best for                 | Multi-step workflows requiring a defined failure recovery path

When to use: Business workflows where the sequence of operations is known in advance, failure recovery logic is definable, and some latency is acceptable. More fully developed as the Saga pattern in ch12-transactional-sagas.


Pattern 3 — Event-Based Pattern

Core idea: Services publish domain events to a message broker. Other services subscribe and react to those events asynchronously. No synchronous coordination. Each service maintains its own consistency by consuming events.

How it works:

Ticket Service ──▶ publishes "TicketCreated" event ──▶ [Message Broker]
                                                              │
                                         ┌────────────────────┤
                                         ▼                    ▼
                                 Customer Service      Assignment Service
                                  (subscribes to       (subscribes to
                                   TicketCreated)       TicketCreated)
                                         │                    │
                                         ▼                    ▼
                                  increments count      routes to expert
                                  [customers DB]        [assignments DB]

Concrete example (Sysops Squad): When a ticket is created:

  1. Ticket Service writes the ticket locally and publishes a TicketCreated event.
  2. Customer Service consumes the event and increments the customer’s open ticket count.
  3. Assignment Service consumes the event and triggers routing logic.

Both downstream services process the event independently. If Customer Service is temporarily down, the event remains in the broker queue until the service recovers.
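
A minimal sketch of the buffering behavior, assuming a toy in-memory broker with one queue per subscriber. The InMemoryBroker class is hypothetical and is not durable; a real broker such as Kafka or RabbitMQ persists events and manages delivery itself.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker: one queue per (topic, subscriber)."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber name: deque}

    def subscribe(self, topic, subscriber):
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, event):
        # Fan-out: each subscriber gets its own copy; the publisher does not wait.
        for queue in self._queues[topic].values():
            queue.append(event)

    def pending(self, topic, subscriber):
        return len(self._queues[topic][subscriber])

    def drain(self, topic, subscriber, handler):
        """Deliver buffered events; remove each one only after the handler returns."""
        queue = self._queues[topic][subscriber]
        delivered = 0
        while queue:
            handler(queue[0])
            queue.popleft()
            delivered += 1
        return delivered
```

If the customer subscriber never calls drain while it is down, its events simply accumulate in its queue, and the assignment subscriber is unaffected.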

Failure modes:

  • Event loss: If the broker loses an event (no persistence), downstream services never get notified. Requires a durable message broker (Kafka, RabbitMQ with persistence).
  • Out-of-order events: Events may arrive out of sequence. Consumers must detect and tolerate reordering, for example with per-entity sequence numbers or version checks.
  • Consumer failure after processing: The consumer processes the event but crashes before acknowledging. The broker redelivers — the consumer must handle duplicate events idempotently.
  • No rollback semantics: If Customer Service processes a TicketCreated event and later the ticket is discovered to be invalid, a compensating TicketCancelled event must be published. Compensations are event-driven, not procedural.
  • Debugging complexity: Tracing the cause of an inconsistency across asynchronous event streams is significantly harder than debugging synchronous flows.
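
A minimal sketch of an idempotent consumer that tolerates redelivery, assuming every event carries a unique event_id field. That field and the class name are hypothetical; in a real service the seen-ID record and the count update would need to be committed in the same local transaction.

```python
class CustomerCountConsumer:
    """Handles TicketCreated events and ignores duplicates."""

    def __init__(self):
        self.open_counts = {}
        self._seen_event_ids = set()

    def handle(self, event):
        # A redelivered event has an ID that was already processed: skip it.
        if event["event_id"] in self._seen_event_ids:
            return False
        cust_id = event["cust_id"]
        self.open_counts[cust_id] = self.open_counts.get(cust_id, 0) + 1
        self._seen_event_ids.add(event["event_id"])
        return True


consumer = CustomerCountConsumer()
event = {"event_id": "evt-1", "cust_id": "cust-7"}
consumer.handle(event)  # first delivery is applied
consumer.handle(event)  # redelivery is ignored, so the count stays at 1
```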

Trade-offs:

Aspect                   | Assessment
-------------------------+------------------------------------------------------------
Architectural complexity | High: event schema design, broker management, idempotency, ordering
Service coupling         | Very low: services only share event contracts, not direct dependencies
Data consistency         | Eventual: lag depends on broker throughput and consumer availability
Fault tolerance          | High: broker buffers events; consumers recover independently
Scalability              | High: services scale independently; broker scales horizontally
Responsiveness           | High for publisher: fire and forget; no waiting for downstream
Best for                 | High-throughput, loosely coupled systems where eventual consistency is acceptable

When to use: High-volume operations, fan-out notifications (one event triggers many consumers), systems where decoupling is paramount, and where the engineering team is equipped to handle asynchronous debugging complexity. See ch11-managing-distributed-workflows for choreography patterns built on events.


Pattern Comparison

Dimension                | Background Sync | Orchestrated Request-Based | Event-Based
-------------------------+-----------------+----------------------------+------------
Who drives sync          | Batch/reconciliation job | Central orchestrator | Message broker + consumers
Communication style      | Polling (pull) | Synchronous request/response | Asynchronous publish/subscribe
Consistency lag          | High (minutes–hours) | Low (seconds) | Medium (typically milliseconds–seconds; longer while a consumer is down)
Service coupling         | Very low | Medium | Very low
Architectural complexity | Low | Medium | High
Fault tolerance          | Low (sync job SPOF) | Medium (orchestrator SPOF; compensations) | High (broker buffers; consumers recover)
Failure recovery         | Next sync run corrects | Compensating transactions | Compensating events
Scalability              | Good | Medium (orchestrator bottleneck) | High
Debugging                | Easy | Medium | Hard (distributed traces needed)
Idempotency required     | Yes (sync corrections) | Yes (compensations) | Yes (duplicate event delivery)
Best use case            | Reporting, analytics, low-criticality reconciliation | Multi-step business workflows | High-throughput, fan-out, loosely coupled domains
Example scenario         | Nightly billing reconciliation | Order creation with payment and inventory steps | Ticket creation notifying multiple downstream services

No universal winner: The right choice depends on consistency requirements, volume, team capability, and tolerance for complexity. Many real systems combine patterns — e.g., event-based for high-volume paths, orchestrated for critical financial transactions.


Decision Framework

Step 1 — Identify the ownership scenario

Does only one service write to this data?
   YES ──▶ Single Ownership. No further action needed.
   NO ──▶ Is it shared reference data that belongs to no single business domain?
      YES ──▶ Common Ownership. Use shared reference service or replication.
      NO ──▶ Multiple services WRITE ──▶ Joint Ownership. Go to Step 2.

Step 2 — Resolve joint ownership

Can you split columns cleanly by service boundary?
   YES ──▶ Table Split technique.
   NO ──▶ Is data integrity across writes non-negotiable?
      YES ──▶ Data Domain technique (shared schema, controlled scope).
      NO ──▶ Is one service the semantic owner?
         YES ──▶ Delegate technique.
         NO ──▶ Are these services more cohesive than they appear?
            YES ──▶ Service Consolidation.
            NO ──▶ Re-examine domain boundaries (decomposition may be wrong).

Step 3 — Choose an eventual consistency pattern

How long can data be inconsistent?
   Hours acceptable ──▶ Background Synchronization.
   Minutes/seconds acceptable ──▶ Go to next question.

Is the workflow a defined sequence of steps with known failure recovery?
   YES ──▶ Orchestrated Request-Based (Saga).
   NO ──▶ Go to next question.

Is high throughput, low coupling, or fan-out notification required?
   YES ──▶ Event-Based pattern.
   Uncertain ──▶ Evaluate team capability for async complexity first.
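
The three questions in Step 3 can be encoded as a small function. This is only a sketch of these notes' flowchart; the function and parameter names are hypothetical.

```python
def choose_consistency_pattern(hours_of_lag_ok,
                               defined_sequence_with_recovery,
                               needs_throughput_or_fanout):
    """Ask the questions in the same order as the flowchart."""
    if hours_of_lag_ok:
        return "Background Synchronization"
    if defined_sequence_with_recovery:
        return "Orchestrated Request-Based"
    if needs_throughput_or_fanout:
        return "Event-Based"
    return "Evaluate team capability for async complexity first"


# Example: lag must be short and the workflow has a known recovery path.
print(choose_consistency_pattern(False, True, False))
```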

Additional factors

Factor                                 | Favors
---------------------------------------+--------------------------------
Team inexperienced with async          | Orchestrated or Background Sync
High write throughput                  | Event-based
Strict audit trail needed              | Orchestrated (explicit state)
Many downstream consumers per event    | Event-based
Hard deadline for consistency          | Orchestrated
Low operational overhead               | Background Sync
Broker infrastructure already in place | Event-based

Sysops Squad Saga

The Sysops Squad case study in Chapter 9 follows the ticket processing workflow as the team works out who owns what.

The Problem

The monolith’s ticket-processing tables are written by several components now separated into distinct services:

  • Ticket service — creates and closes tickets
  • Assignment service — assigns experts and schedules appointments
  • Survey service — creates follow-up surveys after ticket resolution

After decomposition, ownership of each table has to be settled, and some of these services still want to write to the same data.

Resolution Applied

Survey service — the team identifies that only Survey service writes to the survey table. This is single ownership. No change needed.

Zip code lookup: needed by the Customer, Ticket Routing, and Billing services. Nobody “owns” it. This is common ownership. Resolved by creating a small LocationReferenceService that all three query.

Ticket + Assignment services and the ticket table — this is the hard case: joint ownership. The team analyzes the columns:

  • Ticket service writes: ticket_id, cust_id, description, priority, created_at, status
  • Assignment service writes: assigned_expert, scheduled_date, completion_notes

Because the column boundaries are relatively clean, the team applies the Table Split technique:

  • ticket table stays with Ticket service (creation and lifecycle).
  • New ticket_assignment table is created, owned by Assignment service, linked by ticket_id.

The Distributed Transaction

After the split, creating a ticket and assigning it becomes a distributed transaction. The team evaluates:

  • Background sync is too slow — customers need to see their ticket assignment immediately.
  • Event-based is attractive but the team doesn’t yet have a mature broker infrastructure.
  • Orchestrated request-based is chosen for the initial implementation: a lightweight orchestrator calls Ticket service then Assignment service, with compensating transactions (delete the ticket if assignment fails).

This is a deliberate pragmatic choice — the team notes they may migrate to event-based as their event infrastructure matures.


Key Takeaways

  1. Data ownership must be explicit: Every table in a distributed system needs a clearly identified owner, ideally exactly one service responsible for writing to it; a bounded data domain is the deliberate, documented exception. Ambiguity causes corruption and coupling. The first step in distributed data design is to assign ownership clearly.

  2. The three scenarios form a difficulty ladder: Single ownership is trivial; common ownership is manageable; joint ownership is the real challenge and requires deliberate resolution.

  3. Joint ownership has four resolution techniques: Table Split (column partition), Data Domain (controlled shared schema), Delegate (one service owns, others call its API), and Service Consolidation (merge the services). Each is a trade-off between coupling, integrity, and complexity.

  4. 2PC does not scale in microservices: The blocking protocol, synchronous coupling, and lock contention make 2PC impractical for HTTP-based distributed services. Eventual consistency is the pragmatic alternative.

  5. Eventual consistency is a spectrum: Background synchronization has the highest lag but the least complexity. Orchestrated patterns offer tighter consistency windows at the cost of a SPOF orchestrator. Event-based patterns offer the best decoupling and scalability but the highest implementation complexity.

  6. Compensating transactions are not rollbacks: They are forward-moving operations that logically undo a previous step. They must be idempotent and explicitly designed — they are not automatic.

  7. The event-based pattern requires idempotent consumers: Because message brokers may redeliver messages (at-least-once delivery), every consumer must safely handle duplicate events without double-counting or corrupting data.

  8. Chapter 9 patterns are foundational for Chapters 11 and 12: The orchestrated request-based pattern evolves into full Saga patterns in ch12-transactional-sagas. The event-based pattern grounds the choreography workflow style in ch11-managing-distributed-workflows.

  9. The Sysops Squad saga shows pragmatic evolution: The team doesn’t implement the theoretically ideal solution; they choose what fits their current infrastructure and team capability, with a documented migration path toward event-based when ready.

  10. No one-size-fits-all: The same distributed system may use all three eventual consistency patterns in different subsystems, selected based on criticality, volume, and consistency requirements of each flow.



Last Updated: 2026-05-30