Chapter 10: Distributed Data Access

saht distributed-data data-access microservices inter-service-communication

Status: Notes complete


Overview

When a system is decomposed into separate services, each service ideally owns its own data — its own database, its own schema, its own tables. This is the principle of database-per-service, and it provides the team autonomy, independent deployability, and bounded failure isolation that microservices promise. But almost immediately, a new problem emerges: what happens when Service A needs data that Service B owns?

Chapter 10 addresses this fundamental challenge of distributed architectures: how do services access data they do not own, without recreating the shared-database coupling that decomposition was meant to eliminate?

The chapter presents four distinct patterns — each a different resolution to the tension between data encapsulation and operational necessity. None is universally superior. The authors analyze each through the lens of runtime coupling, data consistency, read performance, write complexity, and operational overhead, then provide guidance on when each is the right choice.

The Sysops Squad Saga in this chapter focuses on the ticket assignment feature, which requires the Assignment service to access Expert data (skills, geography, availability) that is owned by the Expert Profile service.


Core Concepts

Data ownership: The principle that each service is the single authoritative source for the data it manages. The owning service defines the schema, controls writes, and is responsible for consistency of that data. No other service should write directly to the owner’s database.

Shared database anti-pattern: Multiple services accessing the same database schema directly. This creates structural coupling — a schema change by one service can break another. It was a common pattern in SOA and a known problem the book is explicitly moving away from.

Runtime coupling: Dependency between services that exists at runtime. If Service A calls Service B’s API to get data, and Service B is down, Service A’s functionality is impaired or unavailable. Runtime coupling reduces overall system availability.

Data consistency: The guarantee that all services see the same data values. In a distributed system, there is tension between strong consistency (everyone always sees the latest value) and eventual consistency (values propagate with some delay but eventually converge).

Read performance: How quickly a service can retrieve the data it needs. Network calls to another service introduce latency; in-memory or local database reads are faster by orders of magnitude.

Write complexity: How difficult it is to keep data in sync when it is replicated or cached across multiple locations. Any pattern that involves data duplication introduces write complexity.


The Four Distributed Data Access Patterns

Pattern 1: Interservice Communication Pattern

What it is: Service A retrieves data from Service B by calling Service B’s API directly — a synchronous REST, gRPC, or GraphQL call at runtime when Service A needs the data.

                    +-------------------+
                    |    Service A      |
                    |  (e.g., Ticket    |
                    |   Assignment)     |
                    +--------+----------+
                             |
                   GET /experts/{id}
                   (synchronous call)
                             |
                    +--------v----------+
                    |    Service B      |
                    |  (e.g., Expert    |
                    |   Profile)        |
                    +--------+----------+
                             |
                    +--------v----------+
                    |  Expert Profile   |
                    |    Database       |
                    +-------------------+

How it works:

  • Service A has a runtime dependency on Service B’s availability
  • Service A receives exactly the data Service B exposes via its API — no more, no less
  • Service B remains the single source of truth; no data duplication occurs
  • Calls can be synchronous (request/response) or asynchronous (request/async callback)

Trade-offs:

DimensionAssessment
Runtime couplingHIGH — Service A’s behavior depends on Service B being available
Data consistencyHIGH — Service A always gets the current value from the source of truth
Read performanceLOWER — network latency per call; adds to Service A’s response time
Write complexityLOW — no synchronization needed; single source of truth
Operational complexityLOW to MEDIUM — standard API contract, but fault tolerance required

When to use:

  • Data changes frequently (high write velocity) — replication would create stale reads
  • Strict consistency is required — the consumer must see the latest value immediately
  • The data volume requested is small per call — a few fields, not megabytes
  • Service B is highly available and fast (not a bottleneck)
  • The dependency relationship is acceptable in the architecture’s coupling model

When to avoid:

  • Service B is a known bottleneck or has reliability concerns
  • Service A is on a hot read path where network latency is unacceptable
  • The data changes infrequently — network call overhead is wasted on static data

Fault tolerance requirements: Service A must handle Service B being unavailable — circuit breakers, fallback behavior, timeout policies, and retry logic are all required. This adds implementation complexity that is often underestimated.


Pattern 2: Column Schema Replication Pattern

What it is: Service A replicates the specific columns it needs from Service B’s data into its own local database. A synchronization mechanism keeps the replica up to date.

  Service B (Expert Profile)          Service A (Ticket Assignment)
  +-----------------------+           +--------------------------+
  | expert_profile table  |           | expert_local_copy table  |
  | - expert_id (PK)      |  sync  -->| - expert_id              |
  | - name                |---------->| - skill_category         |
  | - email               |           | - geography_zone         |
  | - skill_category      |           | - availability_flag      |
  | - geography_zone      |           +--------------------------+
  | - availability_flag   |           | Assignment DB            |
  | - bio                 |
  | - profile_photo       |
  +-----------------------+
  | Expert Profile DB     |

How it works:

  • Only the columns Service A actually uses are replicated — not the entire table
  • Synchronization can be event-driven (Service B publishes change events that Service A consumes) or scheduled (periodic polling/batch sync)
  • Service A reads from its local copy — no runtime dependency on Service B for read operations
  • Service B remains the write authority; Service A’s replica is read-only

Trade-offs:

DimensionAssessment
Runtime couplingLOW for reads — Service A reads locally; coupling only for sync mechanism
Data consistencyEVENTUAL — replica lags behind the source by the sync interval or event propagation delay
Read performanceHIGH — local database reads, no network hop to another service
Write complexityMEDIUM to HIGH — sync mechanism must be reliable, handle failures, and manage schema changes
Operational complexityMEDIUM — sync infrastructure must be maintained; schema changes in Service B propagate carefully

When to use:

  • Data access is frequent and read-heavy — local reads justify the sync complexity
  • The data is relatively stable (low write velocity) — sync lag is less problematic when data rarely changes
  • Eventual consistency is acceptable — consumer can tolerate seeing slightly stale values
  • The subset of needed columns is small and well-defined — keeping the replica narrow reduces sync burden

When to avoid:

  • Data changes very frequently — the replica will often be stale, and sync events can overwhelm the consumer
  • Strict consistency is required — any lag is unacceptable
  • The schema of Service B’s data changes often — each change requires coordination with all replicas

Synchronization mechanisms:

  • Event-driven: Service B publishes domain events (e.g., ExpertAvailabilityChanged) to a message broker; Service A subscribes and updates its replica. Low latency but requires reliable messaging infrastructure.
  • Scheduled batch sync: A periodic job queries Service B’s API or database and updates Service A’s replica. Simple but introduces predictable lag equal to the batch interval.
  • Change Data Capture (CDC): A tool (Debezium, Maxwell’s Daemon) reads Service B’s database transaction log and publishes changes as events. Near-real-time and decoupled from Service B’s application code.

Pattern 3: Replicated Caching Pattern

What it is: A shared in-memory distributed cache (Redis, Hazelcast, Apache Ignite) holds a copy of the data. All services that need the data read from the cache rather than calling each other or hitting their own local databases.

                    +--------------------+
                    |   Service B        |
                    |   (Expert Profile) |
                    +--------+-----------+
                             |
                      writes + updates
                             |
                    +--------v-----------+
                    | Distributed Cache  |
                    | (Redis / Hazelcast)|
                    | expert_availability|
                    | expert_skills      |
                    +--------+-----------+
                             |
               +-------------+-------------+
               |                           |
     +---------v--------+       +----------v--------+
     |   Service A      |       |   Service C       |
     | (Ticket Assign.) |       | (Notifications)   |
     +------------------+       +-------------------+

How it works:

  • Service B (the data owner) is responsible for populating and invalidating the cache
  • Services A, C, and any other consumers read from the cache without calling Service B
  • Cache entries typically have a TTL (time-to-live) as a backstop for stale data
  • If the cache misses (entry expired or evicted), the consumer may call Service B directly as a fallback or return a degraded response

Trade-offs:

DimensionAssessment
Runtime couplingLOW for consumers — reads hit the cache, not Service B
Data consistencyEVENTUAL to STRONG depending on cache invalidation strategy and TTL
Read performanceVERY HIGH — in-memory reads are sub-millisecond; can exceed local database reads
Write complexityHIGH — cache invalidation is notoriously difficult; must invalidate consistently across all nodes
Operational complexityHIGH — distributed cache cluster must be deployed, scaled, monitored, and tuned

When to use:

  • Data is read extremely frequently (hot read path) — cache hit rates are high
  • Low latency is paramount — in-memory reads are the fastest option
  • Data changes infrequently relative to reads — high read-to-write ratio
  • The dataset fits in memory (or at least the hot subset does)
  • Multiple services need the same data simultaneously

When to avoid:

  • Data changes frequently — cache invalidation becomes the primary challenge
  • The dataset is too large for memory — cache eviction causes frequent misses
  • Cache consistency requirements are strict — distributed caches can serve stale data
  • Operational budget does not support running and maintaining a cache cluster

Cache invalidation strategies:

  • TTL expiry: Simplest; accept that data is stale for up to TTL seconds. Works for non-critical data.
  • Write-through: Service B writes to cache and database simultaneously. Cache is always current but adds latency to writes.
  • Event-driven invalidation: Service B publishes a change event; a cache updater subscribes and updates/invalidates the cache entry. Near-real-time but requires messaging infrastructure.
  • Cache-aside with version check: Consumer fetches from cache; on write, Service B bumps a version counter; consumers check version before using cached value.

Phil Karlton’s famous observation: “There are only two hard things in Computer Science: cache invalidation and naming things.” This chapter demonstrates why cache invalidation is genuinely hard at architectural scale, not just at implementation scale.


Pattern 4: Data Domain Pattern

What it is: Instead of each service having a completely isolated database, a subset of services that frequently share data are grouped into a shared data domain — a common schema or database that multiple services within the same domain can read from directly.

  ┌─────────────────────────────────────────────┐
  │              Expert Data Domain              │
  │                                              │
  │  +-----------------+  +------------------+  │
  │  | Expert Profile  |  | Ticket Assign.   |  │
  │  |    Service      |  |    Service       |  │
  │  +--------+--------+  +--------+---------+  │
  │           |                    |             │
  │           +--------+  +--------+             │
  │                    |  |                      │
  │           +--------v--v--------+             │
  │           |   Shared Expert    |             │
  │           |   Domain Database  |             │
  │           | (schema-level      |             │
  │           |  isolation, not    |             │
  │           |  full DB isolation)|             │
  │           +--------------------+             │
  └─────────────────────────────────────────────┘
         ↑
         No sharing across domain boundaries

How it works:

  • Services within the domain share a database or schema but have agreed-upon boundaries on which tables they own and which they only read
  • Write authority remains with the owning service — the Expert Profile service owns writes to expert_profile; the Assignment service reads it directly but never writes it
  • Schema changes still require coordination between all services in the domain — this is the key coupling trade-off
  • The domain boundary prevents services from other domains from accessing the shared schema directly

Trade-offs:

DimensionAssessment
Runtime couplingLOW — direct database reads, no service-to-service call required
Data consistencyHIGH — reads always see committed data; no sync lag
Read performanceHIGH — direct local DB read within same infrastructure
Write complexityLOW — single source of truth within the domain; no synchronization
Operational complexityMEDIUM — shared schema introduces schema change coordination burden

The critical coupling trade-off:
The Data Domain Pattern trades service independence for read simplicity. Once services share a schema, they are coupled at the static structural level — not at runtime (no network call), but at the schema contract level. A schema change to a shared table requires coordinated deployment of all services that read it. This is precisely the kind of coupling that decomposition was meant to eliminate.

When to use:

  • Services are in the same bounded context (domain) and managed by the same team — schema coordination cost is low
  • Strong consistency is required and eventual consistency is not acceptable
  • The services are at a similar maturity level and deploy together — the schema coupling is tolerable
  • You are doing a phased migration from a monolith: the shared domain schema is a stepping stone toward full isolation

When to avoid:

  • Services are owned by different teams — schema coordination becomes organizational friction
  • You need true deployment independence — schema coupling makes that impossible
  • Services are evolving at very different rates — one service’s schema evolution blocks others

Comparative Trade-off Analysis

PatternRuntime CouplingData ConsistencyRead PerformanceWrite ComplexityOperational Complexity
Interservice CommunicationHIGHSTRONGLOWER (network latency)LOWMEDIUM
Column Schema ReplicationLOWEVENTUALHIGH (local DB)MEDIUM-HIGHMEDIUM
Replicated CachingLOWEVENTUAL-STRONGVERY HIGH (in-memory)HIGHHIGH
Data DomainNONE (for reads)STRONGHIGH (local DB)LOWMEDIUM

Availability Impact

When a service calls another service’s API, the caller’s availability is bounded by the callee’s availability. For example, if Service B is 99.9% available, Service A (which calls B) cannot exceed 99.9% availability for operations that require B’s data. This is the availability multiplication problem:

System availability = Product of all component availabilities
If Service A = 99.9% and Service B = 99.9%:
  Availability of A's operations dependent on B = 99.9% × 99.9% = 99.8%

Add Service C at 99.9%:
  = 99.9% × 99.9% × 99.9% = 99.7%

Patterns that avoid runtime coupling (Column Schema Replication, Replicated Caching, Data Domain) escape this multiplication problem.


Decision Framework

Choose Interservice Communication when:

  • Data changes frequently (real-time accuracy critical)
  • Strong consistency is a business requirement (e.g., payment amounts, inventory counts)
  • Service B is highly reliable and fast
  • The call happens in a non-critical path (background processing, batch operations)
  • You want simplicity and are willing to manage fault tolerance explicitly

Choose Column Schema Replication when:

  • Data is relatively static (changes a few times per day, not per second)
  • Read performance matters but full in-memory caching is not needed
  • You can tolerate eventual consistency
  • The set of columns needed is small and stable
  • Event-driven or CDC infrastructure is already available

Choose Replicated Caching when:

  • The data is on an extremely hot read path (called thousands of times per second)
  • Sub-millisecond read latency is required
  • The data set is small enough to fit in memory
  • Data changes infrequently relative to reads
  • You have operational capacity to run and tune a distributed cache cluster

Choose Data Domain when:

  • Services are in the same bounded context, owned by the same team
  • Strong consistency and direct reads are essential
  • You are doing a phased migration and full isolation is a future goal
  • Schema coordination cost is acceptable (team is co-located, processes are aligned)

Sysops Squad Saga: Data Access for Ticket Assignment

The problem: The Ticket Assignment service must assign incoming tickets to the right expert. The assignment algorithm needs:

  • Expert skill categories (which types of problems each expert can solve)
  • Expert geographic zone (where they can be dispatched)
  • Expert availability status (are they currently free?)

This data is owned by the Expert Profile service. How should the Assignment service access it?

What the book demonstrates:

The team evaluates all four patterns against the specific constraints of the Sysops Squad system:

  1. Interservice Communication: Simple to implement, but expert data is read on every ticket submission — a high-frequency path. If the Expert Profile service is slow or down, ticket assignment stalls. Runtime coupling is problematic here.

  2. Column Schema Replication: Expert availability changes frequently (every time an expert picks up or completes a job). Keeping a replica in sync in near-real-time is complex. Eventual consistency could cause wrong assignments.

  3. Replicated Caching: High read performance solves the hot path problem. But expert availability is volatile — cache invalidation on every status change would be very frequent, reducing cache effectiveness.

  4. Data Domain Pattern: The Expert Profile and Ticket Assignment services are in the same Sysops Squad operational domain. Using a shared schema gives the Assignment service direct, strongly consistent access to expert data without network overhead. The trade-off — schema coupling — is acceptable because these services are managed by the same team and deploy together.

Decision: The team selects the Data Domain Pattern for the expert skill and geography data (changes infrequently, strong consistency preferred) and Interservice Communication for real-time availability (must be current at the moment of assignment, and the Expert Profile service is highly available).

This hybrid approach is characteristic of the book’s pragmatic style: no single pattern is forced to cover all scenarios.


Key Takeaways

  1. When services own their own data, accessing data across service boundaries is an architectural challenge — not a solved problem. Every pattern involves trade-offs.
  2. The Interservice Communication pattern is the simplest but creates the tightest runtime coupling: Service A’s availability becomes dependent on Service B’s availability.
  3. Column Schema Replication gives Service A read independence at the cost of eventual consistency and synchronization complexity — it is a good fit for slowly changing, frequently read data.
  4. Replicated Caching offers the highest read performance (in-memory) but is the hardest to operate correctly: cache invalidation in a distributed system is a genuinely hard problem.
  5. The Data Domain pattern achieves strong consistency and high read performance but re-introduces schema coupling — best used within a single bounded context managed by one team.
  6. No single pattern is universally optimal. Real-world architectures often combine patterns: one for volatile data (interservice call), another for stable reference data (caching or replication).
  7. Availability multiplication is a key reason to avoid chained synchronous calls: each dependency in a chain multiplies downward the overall availability of the dependent service.
  8. Cache invalidation is architecturally significant — it is not merely an implementation detail. The invalidation strategy must be designed at the architecture level, not the code level.
  9. The Data Domain pattern is a useful stepping stone during phased monolith decomposition, providing a path toward full service isolation without requiring all isolation work to happen at once.
  10. The choice of data access pattern should be driven by the write velocity of the data, the consistency requirements of the consumer, and the operational maturity of the team — not by the pattern that is easiest to implement.

Last Updated: 2026-05-30