Chapter 10: Distributed Data Access

saht distributed-data data-access microservices inter-service-communication

Status: Notes complete

Overview

When a system is decomposed into separate services, each service ideally owns its own data — its own database, its own schema, its own tables. This is the principle of database-per-service, and it provides the team autonomy, independent deployability, and bounded failure isolation that microservices promise. But almost immediately, a new problem emerges: what happens when Service A needs data that Service B owns?

Chapter 10 addresses this fundamental challenge of distributed architectures: how do services access data they do not own, without recreating the shared-database coupling that decomposition was meant to eliminate?

The chapter presents four distinct patterns — each a different resolution to the tension between data encapsulation and operational necessity. None is universally superior. The authors analyze each through the lens of runtime coupling, data consistency, read performance, write complexity, and operational overhead, then provide guidance on when each is the right choice.

The Sysops Squad Saga in this chapter focuses on the ticket assignment feature, which requires the Assignment service to access Expert data (skills, geography, availability) that is owned by the Expert Profile service.

Core Concepts

Data ownership: The principle that each service is the single authoritative source for the data it manages. The owning service defines the schema, controls writes, and is responsible for consistency of that data. No other service should write directly to the owner’s database.

Shared database anti-pattern: Multiple services accessing the same database schema directly. This creates structural coupling — a schema change by one service can break another. It was a common pattern in SOA and a known problem the book is explicitly moving away from.

Runtime coupling: Dependency between services that exists at runtime. If Service A calls Service B’s API to get data, and Service B is down, Service A’s functionality is impaired or unavailable. Runtime coupling reduces overall system availability.

Data consistency: The guarantee that all services see the same data values. In a distributed system, there is tension between strong consistency (everyone always sees the latest value) and eventual consistency (values propagate with some delay but eventually converge).

Read performance: How quickly a service can retrieve the data it needs. Network calls to another service introduce latency; in-memory or local database reads are faster by orders of magnitude.

Write complexity: How difficult it is to keep data in sync when it is replicated or cached across multiple locations. Any pattern that involves data duplication introduces write complexity.

The Four Distributed Data Access Patterns

Pattern 1: Interservice Communication Pattern

What it is: Service A retrieves data from Service B by calling Service B’s API directly — a synchronous REST, gRPC, or GraphQL call at runtime when Service A needs the data.

                    +-------------------+
                    |    Service A      |
                    |  (e.g., Ticket    |
                    |   Assignment)     |
                    +--------+----------+
                             |
                   GET /experts/{id}
                   (synchronous call)
                             |
                    +--------v----------+
                    |    Service B      |
                    |  (e.g., Expert    |
                    |   Profile)        |
                    +--------+----------+
                             |
                    +--------v----------+
                    |  Expert Profile   |
                    |    Database       |
                    +-------------------+

How it works:

Service A has a runtime dependency on Service B’s availability
Service A receives exactly the data Service B exposes via its API — no more, no less
Service B remains the single source of truth; no data duplication occurs
Calls can be synchronous (request/response) or asynchronous (request/async callback)

Trade-offs:

Dimension	Assessment
Runtime coupling	HIGH — Service A’s behavior depends on Service B being available
Data consistency	HIGH — Service A always gets the current value from the source of truth
Read performance	LOWER — network latency per call; adds to Service A’s response time
Write complexity	LOW — no synchronization needed; single source of truth
Operational complexity	LOW to MEDIUM — standard API contract, but fault tolerance required

When to use:

Data changes frequently (high write velocity) — replication would create stale reads
Strict consistency is required — the consumer must see the latest value immediately
The data volume requested is small per call — a few fields, not megabytes
Service B is highly available and fast (not a bottleneck)
The dependency relationship is acceptable in the architecture’s coupling model

When to avoid:

Service B is a known bottleneck or has reliability concerns
Service A is on a hot read path where network latency is unacceptable
The data changes infrequently — network call overhead is wasted on static data

Fault tolerance requirements: Service A must handle Service B being unavailable — circuit breakers, fallback behavior, timeout policies, and retry logic are all required. This adds implementation complexity that is often underestimated.

Pattern 2: Column Schema Replication Pattern

What it is: Service A replicates the specific columns it needs from Service B’s data into its own local database. A synchronization mechanism keeps the replica up to date.

  Service B (Expert Profile)          Service A (Ticket Assignment)
  +-----------------------+           +--------------------------+
  | expert_profile table  |           | expert_local_copy table  |
  | - expert_id (PK)      |  sync  -->| - expert_id              |
  | - name                |---------->| - skill_category         |
  | - email               |           | - geography_zone         |
  | - skill_category      |           | - availability_flag      |
  | - geography_zone      |           +--------------------------+
  | - availability_flag   |           | Assignment DB            |
  | - bio                 |
  | - profile_photo       |
  +-----------------------+
  | Expert Profile DB     |

How it works:

Only the columns Service A actually uses are replicated — not the entire table
Synchronization can be event-driven (Service B publishes change events that Service A consumes) or scheduled (periodic polling/batch sync)
Service A reads from its local copy — no runtime dependency on Service B for read operations
Service B remains the write authority; Service A’s replica is read-only

Trade-offs:

Dimension	Assessment
Runtime coupling	LOW for reads — Service A reads locally; coupling only for sync mechanism
Data consistency	EVENTUAL — replica lags behind the source by the sync interval or event propagation delay
Read performance	HIGH — local database reads, no network hop to another service
Write complexity	MEDIUM to HIGH — sync mechanism must be reliable, handle failures, and manage schema changes
Operational complexity	MEDIUM — sync infrastructure must be maintained; schema changes in Service B propagate carefully

When to use:

Data access is frequent and read-heavy — local reads justify the sync complexity
The data is relatively stable (low write velocity) — sync lag is less problematic when data rarely changes
Eventual consistency is acceptable — consumer can tolerate seeing slightly stale values
The subset of needed columns is small and well-defined — keeping the replica narrow reduces sync burden

When to avoid:

Data changes very frequently — the replica will often be stale, and sync events can overwhelm the consumer
Strict consistency is required — any lag is unacceptable
The schema of Service B’s data changes often — each change requires coordination with all replicas

Synchronization mechanisms:

Event-driven: Service B publishes domain events (e.g., ExpertAvailabilityChanged) to a message broker; Service A subscribes and updates its replica. Low latency but requires reliable messaging infrastructure.
Scheduled batch sync: A periodic job queries Service B’s API or database and updates Service A’s replica. Simple but introduces predictable lag equal to the batch interval.
Change Data Capture (CDC): A tool (Debezium, Maxwell’s Daemon) reads Service B’s database transaction log and publishes changes as events. Near-real-time and decoupled from Service B’s application code.

Pattern 3: Replicated Caching Pattern

What it is: A shared in-memory distributed cache (Redis, Hazelcast, Apache Ignite) holds a copy of the data. All services that need the data read from the cache rather than calling each other or hitting their own local databases.

                    +--------------------+
                    |   Service B        |
                    |   (Expert Profile) |
                    +--------+-----------+
                             |
                      writes + updates
                             |
                    +--------v-----------+
                    | Distributed Cache  |
                    | (Redis / Hazelcast)|
                    | expert_availability|
                    | expert_skills      |
                    +--------+-----------+
                             |
               +-------------+-------------+
               |                           |
     +---------v--------+       +----------v--------+
     |   Service A      |       |   Service C       |
     | (Ticket Assign.) |       | (Notifications)   |
     +------------------+       +-------------------+

How it works:

Service B (the data owner) is responsible for populating and invalidating the cache
Services A, C, and any other consumers read from the cache without calling Service B
Cache entries typically have a TTL (time-to-live) as a backstop for stale data
If the cache misses (entry expired or evicted), the consumer may call Service B directly as a fallback or return a degraded response

Trade-offs:

Dimension	Assessment
Runtime coupling	LOW for consumers — reads hit the cache, not Service B
Data consistency	EVENTUAL to STRONG depending on cache invalidation strategy and TTL
Read performance	VERY HIGH — in-memory reads are sub-millisecond; can exceed local database reads
Write complexity	HIGH — cache invalidation is notoriously difficult; must invalidate consistently across all nodes
Operational complexity	HIGH — distributed cache cluster must be deployed, scaled, monitored, and tuned

When to use:

Data is read extremely frequently (hot read path) — cache hit rates are high
Low latency is paramount — in-memory reads are the fastest option
Data changes infrequently relative to reads — high read-to-write ratio
The dataset fits in memory (or at least the hot subset does)
Multiple services need the same data simultaneously

When to avoid:

Data changes frequently — cache invalidation becomes the primary challenge
The dataset is too large for memory — cache eviction causes frequent misses
Cache consistency requirements are strict — distributed caches can serve stale data
Operational budget does not support running and maintaining a cache cluster

Cache invalidation strategies:

TTL expiry: Simplest; accept that data is stale for up to TTL seconds. Works for non-critical data.
Write-through: Service B writes to cache and database simultaneously. Cache is always current but adds latency to writes.
Event-driven invalidation: Service B publishes a change event; a cache updater subscribes and updates/invalidates the cache entry. Near-real-time but requires messaging infrastructure.
Cache-aside with version check: Consumer fetches from cache; on write, Service B bumps a version counter; consumers check version before using cached value.

Phil Karlton’s famous observation: “There are only two hard things in Computer Science: cache invalidation and naming things.” This chapter demonstrates why cache invalidation is genuinely hard at architectural scale, not just at implementation scale.

Pattern 4: Data Domain Pattern

What it is: Instead of each service having a completely isolated database, a subset of services that frequently share data are grouped into a shared data domain — a common schema or database that multiple services within the same domain can read from directly.

  ┌─────────────────────────────────────────────┐
  │              Expert Data Domain              │
  │                                              │
  │  +-----------------+  +------------------+  │
  │  | Expert Profile  |  | Ticket Assign.   |  │
  │  |    Service      |  |    Service       |  │
  │  +--------+--------+  +--------+---------+  │
  │           |                    |             │
  │           +--------+  +--------+             │
  │                    |  |                      │
  │           +--------v--v--------+             │
  │           |   Shared Expert    |             │
  │           |   Domain Database  |             │
  │           | (schema-level      |             │
  │           |  isolation, not    |             │
  │           |  full DB isolation)|             │
  │           +--------------------+             │
  └─────────────────────────────────────────────┘
         ↑
         No sharing across domain boundaries

How it works:

Services within the domain share a database or schema but have agreed-upon boundaries on which tables they own and which they only read
Write authority remains with the owning service — the Expert Profile service owns writes to expert_profile; the Assignment service reads it directly but never writes it
Schema changes still require coordination between all services in the domain — this is the key coupling trade-off
The domain boundary prevents services from other domains from accessing the shared schema directly

Trade-offs:

Dimension	Assessment
Runtime coupling	LOW — direct database reads, no service-to-service call required
Data consistency	HIGH — reads always see committed data; no sync lag
Read performance	HIGH — direct local DB read within same infrastructure
Write complexity	LOW — single source of truth within the domain; no synchronization
Operational complexity	MEDIUM — shared schema introduces schema change coordination burden

The critical coupling trade-off:
The Data Domain Pattern trades service independence for read simplicity. Once services share a schema, they are coupled at the static structural level — not at runtime (no network call), but at the schema contract level. A schema change to a shared table requires coordinated deployment of all services that read it. This is precisely the kind of coupling that decomposition was meant to eliminate.

When to use:

Services are in the same bounded context (domain) and managed by the same team — schema coordination cost is low
Strong consistency is required and eventual consistency is not acceptable
The services are at a similar maturity level and deploy together — the schema coupling is tolerable
You are doing a phased migration from a monolith: the shared domain schema is a stepping stone toward full isolation

When to avoid:

Services are owned by different teams — schema coordination becomes organizational friction
You need true deployment independence — schema coupling makes that impossible
Services are evolving at very different rates — one service’s schema evolution blocks others

Comparative Trade-off Analysis

Pattern	Runtime Coupling	Data Consistency	Read Performance	Write Complexity	Operational Complexity
Interservice Communication	HIGH	STRONG	LOWER (network latency)	LOW	MEDIUM
Column Schema Replication	LOW	EVENTUAL	HIGH (local DB)	MEDIUM-HIGH	MEDIUM
Replicated Caching	LOW	EVENTUAL-STRONG	VERY HIGH (in-memory)	HIGH	HIGH
Data Domain	NONE (for reads)	STRONG	HIGH (local DB)	LOW	MEDIUM

Availability Impact

When a service calls another service’s API, the caller’s availability is bounded by the callee’s availability. For example, if Service B is 99.9% available, Service A (which calls B) cannot exceed 99.9% availability for operations that require B’s data. This is the availability multiplication problem:

System availability = Product of all component availabilities
If Service A = 99.9% and Service B = 99.9%:
  Availability of A's operations dependent on B = 99.9% × 99.9% = 99.8%

Add Service C at 99.9%:
  = 99.9% × 99.9% × 99.9% = 99.7%

Patterns that avoid runtime coupling (Column Schema Replication, Replicated Caching, Data Domain) escape this multiplication problem.

Decision Framework

Choose Interservice Communication when:

Data changes frequently (real-time accuracy critical)
Strong consistency is a business requirement (e.g., payment amounts, inventory counts)
Service B is highly reliable and fast
The call happens in a non-critical path (background processing, batch operations)
You want simplicity and are willing to manage fault tolerance explicitly

Choose Column Schema Replication when:

Data is relatively static (changes a few times per day, not per second)
Read performance matters but full in-memory caching is not needed
You can tolerate eventual consistency
The set of columns needed is small and stable
Event-driven or CDC infrastructure is already available

Choose Replicated Caching when:

The data is on an extremely hot read path (called thousands of times per second)
Sub-millisecond read latency is required
The data set is small enough to fit in memory
Data changes infrequently relative to reads
You have operational capacity to run and tune a distributed cache cluster

Choose Data Domain when:

Services are in the same bounded context, owned by the same team
Strong consistency and direct reads are essential
You are doing a phased migration and full isolation is a future goal
Schema coordination cost is acceptable (team is co-located, processes are aligned)

Sysops Squad Saga: Data Access for Ticket Assignment

The problem: The Ticket Assignment service must assign incoming tickets to the right expert. The assignment algorithm needs:

Expert skill categories (which types of problems each expert can solve)
Expert geographic zone (where they can be dispatched)
Expert availability status (are they currently free?)

This data is owned by the Expert Profile service. How should the Assignment service access it?

What the book demonstrates:

The team evaluates all four patterns against the specific constraints of the Sysops Squad system:

Interservice Communication: Simple to implement, but expert data is read on every ticket submission — a high-frequency path. If the Expert Profile service is slow or down, ticket assignment stalls. Runtime coupling is problematic here.
Column Schema Replication: Expert availability changes frequently (every time an expert picks up or completes a job). Keeping a replica in sync in near-real-time is complex. Eventual consistency could cause wrong assignments.
Replicated Caching: High read performance solves the hot path problem. But expert availability is volatile — cache invalidation on every status change would be very frequent, reducing cache effectiveness.
Data Domain Pattern: The Expert Profile and Ticket Assignment services are in the same Sysops Squad operational domain. Using a shared schema gives the Assignment service direct, strongly consistent access to expert data without network overhead. The trade-off — schema coupling — is acceptable because these services are managed by the same team and deploy together.

Decision: The team selects the Data Domain Pattern for the expert skill and geography data (changes infrequently, strong consistency preferred) and Interservice Communication for real-time availability (must be current at the moment of assignment, and the Expert Profile service is highly available).

This hybrid approach is characteristic of the book’s pragmatic style: no single pattern is forced to cover all scenarios.

Key Takeaways

When services own their own data, accessing data across service boundaries is an architectural challenge — not a solved problem. Every pattern involves trade-offs.
The Interservice Communication pattern is the simplest but creates the tightest runtime coupling: Service A’s availability becomes dependent on Service B’s availability.
Column Schema Replication gives Service A read independence at the cost of eventual consistency and synchronization complexity — it is a good fit for slowly changing, frequently read data.
Replicated Caching offers the highest read performance (in-memory) but is the hardest to operate correctly: cache invalidation in a distributed system is a genuinely hard problem.
The Data Domain pattern achieves strong consistency and high read performance but re-introduces schema coupling — best used within a single bounded context managed by one team.
No single pattern is universally optimal. Real-world architectures often combine patterns: one for volatile data (interservice call), another for stable reference data (caching or replication).
Availability multiplication is a key reason to avoid chained synchronous calls: each dependency in a chain multiplies downward the overall availability of the dependent service.
Cache invalidation is architecturally significant — it is not merely an implementation detail. The invalidation strategy must be designed at the architecture level, not the code level.
The Data Domain pattern is a useful stepping stone during phased monolith decomposition, providing a path toward full service isolation without requiring all isolation work to happen at once.
The choice of data access pattern should be driven by the write velocity of the data, the consistency requirements of the consumer, and the operational maturity of the team — not by the pattern that is easiest to implement.

ch09-data-ownership-distributed-transactions — Establishes the data ownership principles that make distributed data access necessary; Chapter 10 builds directly on Chapter 9
ch11-managing-distributed-workflows — Workflow patterns that determine how services coordinate; data access patterns interact with workflow communication styles
ch12-transactional-sagas — When data access involves writes, sagas handle distributed transaction correctness
ch06-pulling-apart-operational-data — How data was separated into per-service ownership in the first place
ch02-coupling — The coupling taxonomy that frames runtime coupling, schema coupling, and behavioral coupling as distinct dimensions

Last Updated: 2026-05-30

Study Notes by Niladri & AI

Explorer

ch10-distributed-data-access

Chapter 10: Distributed Data Access

Overview

Core Concepts

The Four Distributed Data Access Patterns

Pattern 1: Interservice Communication Pattern

Pattern 2: Column Schema Replication Pattern

Pattern 3: Replicated Caching Pattern

Pattern 4: Data Domain Pattern

Comparative Trade-off Analysis

Availability Impact

Decision Framework

Sysops Squad Saga: Data Access for Ticket Assignment

Key Takeaways

Graph View

Table of Contents

Backlinks

Study Notes by Niladri & AI

Explorer

ch10-distributed-data-access

Chapter 10: Distributed Data Access

Overview

Core Concepts

The Four Distributed Data Access Patterns

Pattern 1: Interservice Communication Pattern

Pattern 2: Column Schema Replication Pattern

Pattern 3: Replicated Caching Pattern

Pattern 4: Data Domain Pattern

Comparative Trade-off Analysis

Availability Impact

Decision Framework

Sysops Squad Saga: Data Access for Ticket Assignment

Key Takeaways

Related Resources

Graph View

Table of Contents

Backlinks