Chapter 10 Flashcards — Distributed Data Access

flashcards saht distributed-data data-access


What core problem does Chapter 10 address in distributed architectures?
?
When services each own their own data (database-per-service pattern), how does Service A access data that Service B owns — without recreating shared-database coupling? Chapter 10 presents four patterns that resolve this tension with different trade-off profiles.


What are the four distributed data access patterns presented in Chapter 10?
?

  1. Interservice Communication Pattern — Service A calls Service B’s API at runtime
  2. Column Schema Replication Pattern — Service A replicates needed columns into its own database
  3. Replicated Caching Pattern — A shared in-memory distributed cache (Redis/Hazelcast) that all services read
  4. Data Domain Pattern — Multiple services in the same bounded context share a common schema directly

What is the Interservice Communication Pattern and what is its defining trade-off?
?
Service A retrieves data from Service B by calling Service B’s API (REST, gRPC, GraphQL) at runtime when it needs the data. The defining trade-off is runtime coupling: Service A’s availability and performance are directly bounded by Service B’s availability and latency. If Service B is slow or down, Service A is impaired for all operations requiring that data.


What is availability multiplication and why does it matter for the Interservice Communication Pattern?
?
When Service A depends synchronously on Service B, the availability of any operation that needs B cannot exceed B’s availability. If both are 99.9% available and fail independently, operations requiring B have 99.9% × 99.9% ≈ 99.8% availability. Each additional synchronous dependency lowers the ceiling further. This is why patterns that avoid runtime coupling (caching, replication, data domains) can achieve higher availability.
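The arithmetic can be checked with a short sketch. It assumes services fail independently, which real systems only approximate, and `chain_availability` is a hypothetical helper name rather than anything from the chapter.

```python
from math import prod


def chain_availability(*availabilities: float) -> float:
    """Availability of an operation that needs every listed service to be up.

    Assumes independent failures, so the result is the product of the inputs.
    """
    return prod(availabilities)


two_services = chain_availability(0.999, 0.999)
three_services = chain_availability(0.999, 0.999, 0.999)

print(f"{two_services:.4%}")    # 99.8001%
print(f"{three_services:.4%}")  # 99.7003%
```

Each extra synchronous hop multiplies in another factor below 1, so the ceiling can only drop.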


What is the Column Schema Replication Pattern?
?
Service A replicates the specific columns it needs from Service B’s data into its own local database. A synchronization mechanism (event-driven, CDC, or scheduled batch) keeps the replica up to date. Service A reads locally with no runtime dependency on Service B for reads. Service B remains the write authority — Service A’s replica is read-only.


What consistency model does Column Schema Replication provide, and what write complexity does it introduce?
?
Column Schema Replication provides eventual consistency: Service A’s replica lags behind the source by however long the sync mechanism takes (typically milliseconds to seconds for event-driven messaging or CDC, up to the full batch interval for scheduled sync). Write complexity is medium to high: the sync mechanism must be reliable, handle failures gracefully, and propagate schema changes from Service B without breaking Service A’s replica.


What three synchronization mechanisms can be used for Column Schema Replication?
?

  1. Event-driven messaging: Service B publishes domain events (e.g., ExpertAvailabilityChanged) to a broker; Service A subscribes and updates its replica. Low latency, requires reliable messaging.
  2. Scheduled batch sync: A periodic job queries Service B and updates Service A’s copy. Simple but introduces lag equal to the batch interval.
  3. Change Data Capture (CDC): A tool (Debezium, Maxwell) reads Service B’s transaction log and publishes changes as events. Near-real-time, decoupled from Service B’s application code.
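As a rough illustration of the event-driven option, the sketch below applies `ExpertAvailabilityChanged` events to a local replica table in Service A. The event fields, the `version` counter, and the table layout are assumptions made for the example. A plain Python list stands in for the message broker.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertAvailabilityChanged:
    expert_id: int
    available: bool
    version: int  # hypothetical counter incremented by Service B on every change


def create_replica() -> sqlite3.Connection:
    """Service A's local copy: only the columns it actually needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expert_availability ("
        "expert_id INTEGER PRIMARY KEY, "
        "available INTEGER NOT NULL, "
        "version INTEGER NOT NULL)"
    )
    return conn


def apply_event(replica: sqlite3.Connection, event: ExpertAvailabilityChanged) -> None:
    """Upsert the row, ignoring events older than what the replica already holds."""
    replica.execute(
        """
        INSERT INTO expert_availability (expert_id, available, version)
        VALUES (?, ?, ?)
        ON CONFLICT(expert_id) DO UPDATE SET
            available = excluded.available,
            version = excluded.version
        WHERE excluded.version > expert_availability.version
        """,
        (event.expert_id, int(event.available), event.version),
    )
    replica.commit()


replica = create_replica()
incoming = [
    ExpertAvailabilityChanged(7, True, version=1),
    ExpertAvailabilityChanged(7, False, version=3),
    ExpertAvailabilityChanged(7, True, version=2),  # arrives late and is ignored
]
for event in incoming:
    apply_event(replica, event)

row = replica.execute(
    "SELECT available, version FROM expert_availability WHERE expert_id = 7"
).fetchone()
print(row)  # (0, 3)
```

The version guard is one way to tolerate out-of-order or redelivered events. A batch or CDC feed would need an equivalent idempotency rule.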

What is the Replicated Caching Pattern?
?
A shared in-memory distributed cache (Redis, Hazelcast, Apache Ignite) holds a copy of the data. Service B (the owner) is responsible for populating and invalidating the cache. All consuming services (A, C, etc.) read from the cache without calling Service B. Cache entries typically have a TTL as a backstop. On cache miss, consumers may fall back to calling Service B directly.
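The read path described above might look roughly like the following sketch. `TtlCache` is an in-process stand-in for the shared cache and `fetch_from_owner` is a hypothetical function representing a direct call to Service B. A real deployment would use the client library of whichever cache product is chosen.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """In-process stand-in for the shared cache, with a TTL backstop per entry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value


def read_expert(cache: TtlCache, expert_id: int,
                fetch_from_owner: Callable[[int], Any]) -> Any:
    """Consumer read: cache first, fall back to the owning service on a miss."""
    cached = cache.get(f"expert:{expert_id}")
    if cached is not None:
        return cached
    # Population stays with the owner in this sketch, so the consumer does not write back.
    return fetch_from_owner(expert_id)


now = [0.0]  # fake clock so the example does not need to sleep
cache = TtlCache(ttl_seconds=30, clock=lambda: now[0])
owner_calls = []


def fetch_from_owner(expert_id: int) -> dict:
    owner_calls.append(expert_id)
    return {"expert_id": expert_id, "available": True}


cache.put("expert:7", {"expert_id": 7, "available": True})  # Service B populates
first = read_expert(cache, 7, fetch_from_owner)   # cache hit
now[0] = 31.0                                     # TTL has elapsed
second = read_expert(cache, 7, fetch_from_owner)  # miss, falls back to Service B
print(len(owner_calls))  # 1
```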


What is the read performance advantage of Replicated Caching, and what is its key operational challenge?
?
Read performance is the highest of the four patterns: in-memory reads typically complete in under a millisecond, faster than an API call to another service or a read from a disk-based database. The key operational challenge is cache invalidation: keeping the cache consistent with the source of truth in a distributed environment is notoriously difficult. Distributed cache clusters also require significant operational investment to deploy, scale, monitor, and tune.


What are the four main cache invalidation strategies for the Replicated Caching Pattern?
?

  1. TTL expiry: Accept stale data up to TTL duration. Simple but data can be stale.
  2. Write-through: Service B writes to cache and database simultaneously. Cache always current, but writes are slower.
  3. Event-driven invalidation: Service B publishes a change event; a cache updater invalidates/updates the entry. Near-real-time, requires messaging infrastructure.
  4. Cache-aside with version check: Consumers check a version counter; on mismatch, refresh from source. Prevents stale reads but adds read complexity.
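Strategy 4 can be sketched as follows. The class name and the two callables, `current_version` and `load`, are assumptions for illustration. They stand for a cheap version lookup and a full read from the source of truth.

```python
from typing import Any, Callable, Dict, Tuple


class VersionCheckedCache:
    """Cache-aside reads that refresh whenever the source's version counter moves."""

    def __init__(self, current_version: Callable[[str], int],
                 load: Callable[[str], Any]):
        self._current_version = current_version  # cheap lookup of the latest version
        self._load = load                        # full read from the source of truth
        self._entries: Dict[str, Tuple[int, Any]] = {}
        self.refreshes = 0

    def get(self, key: str) -> Any:
        latest = self._current_version(key)
        entry = self._entries.get(key)
        if entry is not None and entry[0] == latest:
            return entry[1]
        # Missing or stale entry: reload. If the source changes between the two
        # calls, the next read sees another mismatch and simply refreshes again.
        value = self._load(key)
        self._entries[key] = (latest, value)
        self.refreshes += 1
        return value


source = {"expert:7": (1, "available")}  # key -> (version, value), owned by Service B
cache = VersionCheckedCache(
    current_version=lambda key: source[key][0],
    load=lambda key: source[key][1],
)

a = cache.get("expert:7")          # first read loads from the source
b = cache.get("expert:7")          # versions match, served from cache
source["expert:7"] = (2, "busy")   # Service B writes a new version
c = cache.get("expert:7")          # mismatch triggers a refresh
print(a, b, c, cache.refreshes)    # available available busy 2
```

Every read pays for the version lookup, which is the added read complexity the card mentions.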

What is the Data Domain Pattern?
?
Multiple services within the same bounded context share a common database schema that all of them can read directly. Write authority remains with the owning service — other services within the domain can read but not write data they do not own. The domain boundary prevents services from other domains from accessing the schema. No synchronization or caching is needed — reads are always strongly consistent.
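A minimal sketch of the read-but-not-write rule, using SQLite only because it ships with Python: the owning service holds a read-write connection and the consuming service opens the same database read-only. The table and file names are hypothetical. A production database would more likely enforce this with per-service accounts and grants.

```python
import sqlite3
import tempfile
from pathlib import Path

db_path = Path(tempfile.mkdtemp()) / "expert_domain.db"

# Owning service: read-write connection to the shared schema.
owner = sqlite3.connect(db_path)
owner.execute("CREATE TABLE expert_skill (expert_id INTEGER, skill TEXT)")
owner.execute("INSERT INTO expert_skill VALUES (7, 'networking')")
owner.commit()

# Consuming service in the same domain: read-only connection to the same file.
reader = sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)
query = "SELECT skill FROM expert_skill WHERE expert_id = 7 ORDER BY skill"
skills_before = reader.execute(query).fetchall()

# A committed write by the owner is visible on the next read, with no sync step.
owner.execute("INSERT INTO expert_skill VALUES (7, 'storage')")
owner.commit()
skills_after = reader.execute(query).fetchall()

# The consumer cannot write data it does not own.
try:
    reader.execute("INSERT INTO expert_skill VALUES (7, 'security')")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True

print(skills_before)   # [('networking',)]
print(skills_after)    # [('networking',), ('storage',)]
print(write_rejected)  # True

reader.close()
owner.close()
```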


What is the key coupling introduced by the Data Domain Pattern and why is it a trade-off?
?
The Data Domain Pattern introduces schema coupling (static structural coupling): all services sharing the schema must coordinate schema changes. A schema change in a shared table requires coordinated redeployment of all services that read it. This is precisely the structural coupling that full database-per-service decomposition was meant to eliminate. The benefit is strong consistency and fast local reads without synchronization overhead.


Which data access pattern provides the STRONGEST data consistency?
?
Both the Interservice Communication Pattern (reads from the live source of truth via API) and the Data Domain Pattern (reads directly from the owning service’s database) provide strong consistency — consumers always see the latest committed data. Column Schema Replication and Replicated Caching provide eventual consistency.


Which data access pattern has the LOWEST runtime coupling for read operations?
?
The Data Domain Pattern — consumers read directly from the shared database without any network call to another service. Column Schema Replication and Replicated Caching also have low runtime coupling for reads (local database or cache), but they still depend on the sync/invalidation mechanism being operational. Interservice Communication has the highest runtime coupling.


When should you choose the Interservice Communication Pattern over the others?
?
Choose it when: (1) data changes very frequently and staleness is unacceptable, (2) strict strong consistency is a business requirement, (3) Service B is highly available and fast, (4) the call is in a non-critical-path context (background jobs, batch processing), or (5) simplicity is valued and you are prepared to implement fault tolerance (circuit breakers, retries, timeouts) explicitly.


When should you choose Column Schema Replication?
?
Choose it when: (1) data is relatively stable (changes infrequently), (2) reads are frequent and network latency to Service B would be a bottleneck, (3) eventual consistency is acceptable for the consumer’s use case, (4) the subset of needed columns is small and well-defined, and (5) event-driven or CDC infrastructure is already in place or acceptable to add.


When should you choose Replicated Caching?
?
Choose it when: (1) the data is on an extremely hot read path (thousands of reads per second), (2) sub-millisecond read latency is required, (3) the hot dataset fits in memory, (4) reads far outnumber writes (high read-to-write ratio), and (5) the team has operational capacity to run, tune, and maintain a distributed cache cluster.


When should you choose the Data Domain Pattern?
?
Choose it when: (1) the services are in the same bounded context and managed by the same team (schema coordination is low cost), (2) strong consistency is required for reads, (3) you are doing a phased migration from a monolith and full isolation is a future goal, or (4) deployment independence between the services is not a strict requirement.


Compare all four patterns on runtime coupling and data consistency in a single summary.
?

| Pattern | Runtime Coupling | Data Consistency |
|---|---|---|
| Interservice Communication | HIGH | STRONG |
| Column Schema Replication | LOW | EVENTUAL |
| Replicated Caching | LOW | EVENTUAL–STRONG (depends on invalidation) |
| Data Domain | NONE (for reads) | STRONG |

What fault tolerance measures must be implemented for the Interservice Communication Pattern?
?
Because Service A has a runtime dependency on Service B, Service A must implement: (1) circuit breakers to stop cascading failures when B is down, (2) timeouts to avoid waiting indefinitely, (3) retry logic with exponential backoff for transient failures, and (4) fallback behavior — what Service A does when it cannot reach B (return cached result, degrade gracefully, or fail fast). This implementation burden is often underestimated.
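The four measures can be combined roughly as in the sketch below. All names and thresholds are hypothetical, and the timeout is assumed to be enforced inside the `operation` callable, for example through the HTTP client's timeout setting. A production system would more likely use an established resilience library and retry only errors known to be transient.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call through (half-open state).
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None
        return result


def call_with_resilience(operation: Callable[[], T], breaker: CircuitBreaker,
                         fallback: Callable[[], T], max_attempts: int = 3,
                         base_delay_seconds: float = 0.1,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry with exponential backoff behind a breaker, then fall back."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(operation)
        except CircuitOpenError:
            break  # dependency is considered down; do not keep retrying
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay_seconds * 2 ** attempt)  # 0.1s, 0.2s, ...
    return fallback()


def failing_call() -> str:
    raise TimeoutError("Service B did not answer in time")


delays = []  # record backoff delays instead of actually sleeping
breaker = CircuitBreaker(failure_threshold=3, reset_after_seconds=30.0)

first = call_with_resilience(failing_call, breaker,
                             fallback=lambda: "cached-availability", sleep=delays.append)
second = call_with_resilience(failing_call, breaker,
                              fallback=lambda: "cached-availability", sleep=delays.append)
print(first, second, delays)  # cached-availability cached-availability [0.1, 0.2]
```

The second call returns the fallback immediately because the breaker is already open, which is what stops a slow Service B from tying up Service A.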


How did the Sysops Squad team apply data access patterns to the Ticket Assignment problem?
?
The Assignment service needs expert skills, geography, and availability from the Expert Profile service. The team chose a hybrid approach: (1) Data Domain Pattern for expert skill and geography data (stable, co-located team, strong consistency required), and (2) Interservice Communication for real-time availability status (must be current at the exact moment of assignment, Expert Profile service is highly reliable). No single pattern was forced to cover all scenarios.


What is “availability multiplication” and which data access patterns avoid it?
?
Availability multiplication: when services call each other synchronously, the availability of the overall operation is the product of all component availabilities. Two 99.9% services in a chain yield about 99.8% effective availability. Patterns that avoid a synchronous runtime dependency on another service (Column Schema Replication, Replicated Caching, and Data Domain) escape this multiplication problem. Only the Interservice Communication Pattern is fully subject to it.


Why is the Data Domain Pattern described as a useful stepping stone during monolith decomposition?
?
When migrating from a monolith, extracting services while simultaneously isolating every database table is often impractical. The Data Domain Pattern allows teams to extract services into independent deployable units while letting services within the same domain continue to share a database schema. Over time, as domain boundaries stabilize and teams mature, the shared schema can be further decomposed into fully isolated per-service databases. It is a pragmatic intermediate state.


What does “write complexity” mean in the context of distributed data access patterns, and which pattern has the highest?
?
Write complexity refers to how difficult it is to keep data correct and synchronized when it exists in multiple locations. Patterns that involve replication or caching require synchronization infrastructure to propagate writes from the source of truth to copies. The Replicated Caching Pattern has the highest write complexity — not because writes to the source are complex, but because ensuring cache invalidation happens correctly and promptly across all cache nodes in a distributed environment is notoriously difficult.


Total Cards: 24
Priority: HIGH
Last Updated: 2026-05-30