Chapter 18: Microservices Architecture
fsa architecture-styles microservices-architecture
Status: Notes complete
Overview
Microservices architecture is a distributed style where a system is composed of many small, independently deployable services, each owning a bounded business domain end-to-end (its logic, its data, and its interface). Emerging in the early 2010s at companies like Netflix, Amazon, and SoundCloud as a direct reaction to SOA’s failures, microservices rejected centralized orchestration and shared data in favor of radical decentralization: each service is autonomous, each team is autonomous, and coupling between services is minimal and explicit. Chapter 18 in the 2nd edition is significantly expanded, covering the full breadth of microservices design including granularity, communication patterns, choreography vs. orchestration, the SAGA pattern, sidecar/service mesh, micro-frontends, and the BFF pattern.
Topology
+------------------+   +------------------+   +------------------+
|  Mobile Client   |   |   Web Browser    |   | Partner/3rd Party|
+--------+---------+   +--------+---------+   +--------+---------+
         |                      |                      |
+--------v----------------------v----------------------v---------+
|                          API Gateway                           |
|          (auth, rate limiting, routing, aggregation)           |
+----+---------------+----------------+-----------------+--------+
     |               |                |                 |
+----v----+     +----v----+     +-----v----+     +------v-------+
|  Order  |     | Catalog |     | Payment  |     | Notification |
| Service |     | Service |     | Service  |     |   Service    |
|         |     |         |     |          |     |              |
|  [DB]   |     |  [DB]   |     |   [DB]   |     |     [DB]     |
+----+----+     +---------+     +-----+----+     +--------------+
     |                                |
     |        +---------------+       |
     +------->|   event bus   |<------+
              |    (async)    |
              +-------+-------+
                      |
             +--------v---------+
             | Shipping Service |
             |                  |
             |       [DB]       |
             +------------------+
A large production microservices system typically comprises anywhere from 20 to more than 100 independently deployable services. Each service:
- Owns exactly one bounded domain context
- Has its own dedicated database (different technologies possible)
- Communicates with other services exclusively over the network (no in-process calls, no shared databases)
- Is deployed, scaled, and monitored independently
- Is owned by one small team
Style Specifics
Bounded Context
The bounded context is the foundational design concept from Domain-Driven Design (DDD) that microservices are built around. A bounded context is a self-contained semantic domain within which a particular model of the business is consistent and complete.
In microservices terms:
- Each service maps to exactly one bounded context
- The service owns its domain model, its business logic, and its data
- A term like “Customer” may mean different things in the Order service (a buyer), the Billing service (a legal entity), and the Support service (a contact) — and each service defines its own local model without needing consensus with the others
- The boundary of the context is enforced by the service interface: external consumers see only what the service exposes, never its internal model
Why bounded context matters: Without clear domain boundaries, services couple to each other’s internal models, defeating the purpose of separation. Bounded contexts give each service a stable identity and a clear scope for what it should and should not own.
The key discipline: when drawing service boundaries, identify where domain language changes, where data ownership changes, and where different business rules apply — these are bounded context boundaries.
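As a rough illustration of the “Customer” example above, each service can define its own local model and expose only a narrow public view. This is a minimal sketch; every class, field, and function name here is hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass


# Order service: a customer is a buyer with a delivery address.
@dataclass(frozen=True)
class OrderCustomer:
    customer_id: str
    shipping_address: str


# Billing service: a customer is a legal entity that can be invoiced.
@dataclass(frozen=True)
class BillingCustomer:
    customer_id: str
    legal_name: str
    tax_id: str


# Support service: a customer is a contact to reach.
@dataclass(frozen=True)
class SupportCustomer:
    customer_id: str
    email: str


def to_public_view(customer: BillingCustomer) -> dict:
    """Expose only what the service interface promises, not the internal model."""
    return {"customerId": customer.customer_id, "name": customer.legal_name}
```

The three models share only an identifier. None of them needs to change when another context changes its own definition of a customer.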
Granularity
Service granularity is one of the most consequential and difficult decisions in microservices design. The right size is: as small as possible while still being independently deployable and owning a complete, coherent domain capability.
The granularity tension: architects naturally want small, focused services, but services that are too small become a failure mode of their own.
Too fine-grained (distributed monolith):
- Services have no autonomous business capability — they only make sense when used together
- All calls are synchronous across service boundaries (service A calls B calls C calls D)
- The system has the operational complexity of microservices but the coupling of a monolith
- End-to-end latency is high: every user action requires a chain of synchronous network calls
- A failure in any service in the chain fails the whole operation
- Cannot deploy any service in isolation because they’re all behaviorally dependent
Too coarse-grained (mini-monolith):
- Services have grown to encompass multiple distinct bounded contexts
- Teams can no longer own the service independently — it spans multiple domains
- The service becomes a deployment bottleneck for multiple teams
- Scalability becomes uneven: one domain within the service needs 10x scale but must scale the whole service
Practical granularity guidance:
- A service should be deployable, scalable, and testable in isolation
- A service should map to one team’s cognitive ownership capacity
- If two “services” always deploy together, they are probably one service
- If one service requires changes from multiple domain teams for every feature, it is probably too large
- Target: one service per bounded context, one team per service (or 2-3 services per team for small services)
Data Isolation
Data isolation is the defining data principle of microservices: each service owns its own database, and no other service may directly access that database.
Service A      Service B      Service C
    |              |              |
+---v------+   +---v------+   +---v------+
|   DB A   |   |   DB B   |   |   DB C   |
|(Postgres)|   | (Mongo)  |   | (MySQL)  |
+----------+   +----------+   +----------+
FORBIDDEN:
Service A ----SQL JOIN-----> DB B (never)
Service A ----direct read--> DB B (never)
Why data isolation is non-negotiable:
- Independent deployability: If Service A can read Service B’s tables directly, a schema change in Service B breaks Service A — you’ve re-introduced compile-time coupling in the data layer
- Bounded context integrity: Data isolation enforces that each service is the sole authority for its domain data; no external service can corrupt it
- Technology freedom: Each service can choose the database technology best suited to its domain — relational, document, graph, time-series, search index
Consequences of data isolation:
- No cross-service SQL joins — if the UI needs data from multiple services, the API layer assembles it via API calls
- Eventual consistency — data in different services that is logically related will be consistent only eventually, not immediately; this is an intentional trade-off
- Duplicate data is acceptable and often preferable to coupling — a service may maintain a local projection of another service’s data (updated via events) rather than calling the other service synchronously
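The “local projection updated via events” idea can be sketched as follows. This is an assumption-laden illustration: the `ShippingService` class, the event shape, and the handler name are all hypothetical, and a dict stands in for the service’s own database.

```python
class ShippingService:
    """Keeps a local, eventually consistent copy of the customer data it needs."""

    def __init__(self) -> None:
        # Local projection stored in the shipping service's own database
        # (a dict stands in for that database here).
        self.customer_addresses: dict[str, str] = {}

    def on_customer_address_changed(self, event: dict) -> None:
        # Event handler: update the projection instead of querying the
        # customer service (or its database) at request time.
        self.customer_addresses[event["customer_id"]] = event["address"]

    def create_shipment(self, order_id: str, customer_id: str) -> dict:
        address = self.customer_addresses.get(customer_id)
        if address is None:
            # The event may not have arrived yet: eventual consistency.
            raise LookupError(f"no address projected yet for {customer_id}")
        return {"order_id": order_id, "ship_to": address}
```

The shipping service never reads the customer service’s tables. It pays for that independence with a window in which its copy may be stale.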
API Layer
API Gateway
The API Gateway is the single entry point for all external clients (browsers, mobile apps, third-party partners). It handles:
- Authentication and authorization: Validates tokens, enforces access control before requests reach services
- Rate limiting and throttling: Protects services from traffic spikes and abusive clients
- Request routing: Routes requests to the appropriate downstream service(s)
- Request aggregation: Combines responses from multiple services into a single client response (avoiding chatty client-to-service communication)
- Protocol translation: External clients use REST/HTTP; internal services may use gRPC
- TLS termination: Terminates external TLS at the gateway; internal communication may use mTLS via service mesh
Examples: AWS API Gateway, Kong, Apigee, Nginx, Envoy. The key principle: the API Gateway is a dumb pipe for routing and policy enforcement — unlike SOA’s ESB, it must never contain business logic.
Backend for Frontend (BFF) Pattern
A Backend for Frontend (BFF) is a variant of the API Gateway pattern where a separate, purpose-built API gateway is created for each distinct client type rather than a single shared gateway.
+------------+   +------------+   +---------------+
| Mobile App |   |  Web App   |   |  Partner API  |
+-----+------+   +-----+------+   +-------+-------+
      |                |                  |
+-----v------+   +-----v------+   +-------v-------+
| Mobile BFF |   |  Web BFF   |   |  Partner BFF  |
+-----+------+   +-----+------+   +-------+-------+
      |                |                  |
      +----------------+------------------+
                       |
                [Microservices]
Why BFF: mobile clients need small, battery-efficient payloads with fewer fields; web clients need rich, aggregated views; partner APIs need stable, versioned interfaces. A single gateway trying to serve all three makes trade-offs that satisfy none. With BFF, each client type gets an API shaped for its exact needs. The BFF is owned by the same team as the client it serves.
Trade-offs: more API gateway surface area to maintain, code duplication across BFFs for common concerns (auth, logging).
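To make the per-client shaping concrete, here is a small sketch of two BFFs reading the same downstream response. The product payload and both function names are invented for illustration; a real BFF would fetch this data over the network.

```python
# Hypothetical response from a downstream catalog service.
PRODUCT = {
    "id": "p-42",
    "name": "Trail Shoe",
    "price_cents": 8999,
    "description": "Lightweight shoe for rough terrain.",
    "reviews": [{"stars": 5, "text": "Great grip"}, {"stars": 4, "text": "Runs small"}],
}


def mobile_bff_product(product: dict) -> dict:
    """Mobile BFF: small payload with only the fields the app renders."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
    }


def web_bff_product(product: dict) -> dict:
    """Web BFF: richer, aggregated view for a full product page."""
    reviews = product["reviews"]
    average = sum(r["stars"] for r in reviews) / len(reviews) if reviews else None
    return {**product, "average_stars": average}
```

Each function can evolve with its client. The cost is the duplication noted above: both BFFs need their own auth and logging plumbing.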
Operational Reuse
Sidecar Pattern
In microservices, each service needs operational capabilities: logging, distributed tracing, service discovery, health checks, circuit breaking, and mutual TLS. Duplicating this implementation in every service (in potentially different languages) is expensive and inconsistent.
The sidecar pattern solves this by deploying a separate, lightweight process (the sidecar) alongside each service instance. The sidecar handles operational concerns on behalf of the service:
+---------------------------+
|          Pod / VM         |
|  +---------+  +---------+ |
|  |         |  |         | |
|  | Service |<>| Sidecar | |
|  |         |  |         | |
|  +---------+  +----+----+ |
|                    |      |
+--------------------+------+
                     |
             [Control Plane /
               Service Mesh]
The sidecar intercepts all inbound and outbound network traffic for the service. The service code itself is unaware of the operational concerns — it simply makes network calls and the sidecar handles mTLS, retries, tracing, and metrics collection.
Service Mesh
A service mesh is a dedicated infrastructure layer — typically implemented as a set of sidecar proxies (one per service instance) plus a control plane — that handles all service-to-service communication concerns:
- mTLS (mutual TLS): Automatically encrypts and authenticates all service-to-service traffic without application code changes
- Load balancing: Intelligent traffic distribution across service instances
- Circuit breaking: Automatically stops sending traffic to unhealthy service instances
- Distributed tracing: Generates and propagates trace IDs across service calls
- Traffic management: Canary deployments, A/B testing, traffic splitting by percentage
- Observability: Automatic metrics collection (latency, error rate, throughput) for every service pair
Examples: Istio (most feature-complete, complex), Linkerd (lightweight, simpler), Consul Connect, AWS App Mesh.
The service mesh replaces much of what the SOA ESB did for cross-cutting concerns — but as infrastructure rather than application middleware, and without the central bottleneck or business logic risk.
Frontends
Micro-Frontends
Just as microservices decompose the backend by domain, micro-frontends apply the same principle to the frontend: each service team owns the end-to-end slice of their domain including the UI.
+--------------------------------------------------------+
|                    Web Application                     |
|  +------------------+  +------------------+            |
|  |     Order UI     |  |    Catalog UI    |            |
|  |  (Team Orders)   |  |  (Team Catalog)  |    ...     |
|  +------------------+  +------------------+            |
|      Shell Application (routing, nav, auth shell)      |
+--------------------------------------------------------+
Each team ships their domain’s UI as an independently deployable frontend fragment. A shell application provides navigation, authentication, and composition. Teams use technologies like Web Components, Module Federation (Webpack 5), or iframe-based composition.
Challenges: UI consistency (different teams’ components may look/behave inconsistently), shared state management, authentication, performance (loading multiple JavaScript bundles), and testing the composed application end-to-end.
Not universally adopted — many microservices teams maintain a shared frontend application with clear domain ownership boundaries within it, rather than full micro-frontend decomposition.
Communication
Synchronous Communication
Services call each other directly and wait for a response.
- REST (HTTP/JSON): Most common. Human-readable, browser-compatible, easy to debug. Tooling support is excellent. Drawback: verbose payloads, no formal schema enforcement without OpenAPI.
- gRPC (HTTP/2 + Protocol Buffers): Binary protocol, strongly typed schemas, typically smaller payloads and lower latency than JSON over REST. Excellent for service-to-service communication in internal networks. Drawback: not browser-native, less human-readable, harder to debug without tooling.
When to use synchronous: when the client needs an immediate result, when the operation is a query (read), when the caller must handle the response before proceeding.
Risk: synchronous calls create temporal coupling, because caller and callee must both be available at the same time. In a chain of synchronous calls (A→B→C→D), a failure anywhere in the chain fails the entire operation, and the latency of every hop adds up.
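One common defense against this risk is a circuit breaker, which these notes list among the sidecar and service mesh capabilities. The sketch below is a simplified in-process version under assumed defaults; the class name, thresholds, and the injectable `clock` parameter are illustrative choices, not a specific library’s API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without contacting the callee."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_seconds: float = 30.0,
                 clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                raise CircuitOpenError("callee marked unhealthy; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result
```

Failing fast keeps a slow or dead callee from tying up the caller’s threads, which is how one failure would otherwise spread along the chain.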
Asynchronous Communication
Services communicate via messages or events, with no expectation of immediate response.
- Message queues (RabbitMQ, AWS SQS, Azure Service Bus): Point-to-point; one sender, one consumer. Used for work distribution and reliable task execution.
- Event streaming (Apache Kafka, AWS Kinesis, Google Pub/Sub): Publish-subscribe; one publisher, many consumers. Events are persisted in a log; consumers read at their own pace. Used for event-driven architectures, CQRS, audit trails.
When to use asynchronous: fire-and-forget operations, long-running processes, when decoupling producer and consumer availability is important, when multiple consumers need the same event (fan-out), when you need event replay or audit.
Trade-off: asynchronous communication is harder to reason about, introduces eventual consistency, and requires the system to handle message ordering, duplicate delivery, and failure in consumers.
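Duplicate delivery in particular is usually handled by making consumers idempotent. A minimal sketch, assuming each message carries a unique `message_id` (a hypothetical field name) and using in-memory state in place of the service’s database:

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the broker redelivers it."""

    def __init__(self) -> None:
        self.processed_ids: set[str] = set()  # would live in the service's database
        self.balance_cents = 0

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balance_cents += message["amount_cents"]
        # In a real service the state change and the ID record would be
        # committed in the same local transaction.
        self.processed_ids.add(message_id)
        return True
```

With this in place, an at-least-once broker can redeliver freely without double-applying the change.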
Choreography and Orchestration
When a business process spans multiple services, two coordination styles exist:
Choreography (decentralized, event-driven):
- Services react to events independently with no central controller
- Each service publishes events when it completes its work; other services subscribe and react
- No single service knows the full business process — each only knows its own role and what events to emit
Order Service --[OrderPlaced]--> Kafka
                                   |
              +--------------------+--------------------+
              v                    v                    v
         Payment Svc         Inventory Svc      Notification Svc
       (charges card)      (reserves stock)       (sends email)
              |
     [PaymentProcessed] --> Kafka
                              |
                        Shipping Svc
                     (creates shipment)
Pros: high decoupling, services don’t know about each other, easy to add new consumers
Cons: hard to understand the overall process (no single view), difficult to debug/trace, hard to add compensating logic for partial failures
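The choreographed flow can be approximated with an in-memory publish/subscribe bus. This is only a sketch: `EventBus` is a hypothetical synchronous stand-in for a broker such as Kafka, and the handlers are reduced to one line each.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a broker (synchronous for simplicity)."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)
        self.log = []  # every published event type, in order

    def subscribe(self, event_type: str, handler) -> None:
        self.subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in self.subscribers[event_type]:
            handler(payload)


bus = EventBus()
sent_emails = []

# Each service knows only which events it reacts to and which it emits.
bus.subscribe("OrderPlaced", lambda e: bus.publish("PaymentProcessed", e))      # Payment
bus.subscribe("OrderPlaced", lambda e: sent_emails.append(e["order_id"]))       # Notification
bus.subscribe("PaymentProcessed", lambda e: bus.publish("ShipmentCreated", e))  # Shipping

bus.publish("OrderPlaced", {"order_id": "o-1"})
```

No line of this code describes the whole process. The end-to-end flow exists only as the sum of the subscriptions, which is both the decoupling benefit and the debugging cost listed above.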
Orchestration (centralized, process-driven):
- A central orchestrator service (or saga orchestrator) controls the workflow, calling each participant service in sequence and handling failures
[Saga Orchestrator]
|
+---> call Order Service
+---> call Payment Service (on Order success)
+---> call Inventory Service (on Payment success)
+---> call Shipping Service (on Inventory success)
+---> call Notification Service (on Shipping success)
+---> handle compensation if any step fails
Pros: the process flow is visible in one place, failure handling and compensating transactions are explicit, easy to add process-level monitoring
Cons: the orchestrator becomes a coupling point (it must know every participant’s API, and every workflow change touches it), risk of recreating an ESB-like central controller
When to use which:
- Use choreography when the interaction is simple (2-3 services), adding new consumers is likely, and the domain naturally models as reactive events
- Use orchestration when the process is long-running, has complex failure and compensation logic, spans many services in sequence, or requires explicit visibility into the business process state
Transactions and Sagas
Why Distributed Transactions Fail
A distributed ACID transaction needs a transaction coordinator that can hold locks on resources across all participants until the transaction commits or rolls back. When each service owns a separate database, this means using the two-phase commit (2PC) protocol, which has severe problems:
- The transaction coordinator is a single point of failure (SPOF) and a bottleneck
- All participants are locked (blocking) until the coordinator decides
- If the coordinator fails mid-transaction, participants are left in an uncertain state
- 2PC defeats the independent scalability of microservices
The conclusion: distributed transactions across microservices are not practical and should not be attempted for most use cases.
The SAGA Pattern
A SAGA is an alternative to distributed transactions: a sequence of local transactions (one per service) where each step either succeeds or triggers a compensating transaction to undo previous steps.
Step 1: Order Service creates order (local TX) --> [OrderCreated event]
Step 2: Payment Service charges card (local TX) --> [PaymentProcessed event]
Step 3: Inventory Service reserves stock (local TX) --> [StockReserved event]
Step 4: Shipping Service creates shipment (local TX) --> [ShipmentCreated event]
If Step 3 fails (out of stock):
Compensate Step 2: Payment Service refunds card
Compensate Step 1: Order Service cancels order
Each local transaction commits immediately (not locked). If a later step fails, compensating transactions reverse prior steps — these are NOT rollbacks (data already committed and may have been visible), they are new forward-moving operations that undo the business effect.
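The step-and-compensate mechanics can be sketched as a tiny orchestrated saga runner. The `run_saga` function, the `SagaFailed` exception, and the lambda-based steps are illustrative assumptions; real participants would be remote services with their own local transactions.

```python
class SagaFailed(Exception):
    """Raised after compensations have run; args[0] holds the saga log."""


def run_saga(steps, context: dict) -> list[str]:
    """Run (name, action, compensation) steps in order.

    If an action raises, run the compensations of the completed steps in
    reverse order, then raise SagaFailed. Returns a log of what happened.
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action(context)
        except Exception as exc:
            log.append(f"{name}: failed")
            for done_name, undo in reversed(completed):
                undo(context)  # a new forward operation, not a rollback
                log.append(f"{done_name}: compensated")
            raise SagaFailed(log) from exc
        log.append(f"{name}: done")
        completed.append((name, compensation))
    return log


def reserve_stock(ctx: dict) -> None:
    raise RuntimeError("out of stock")


steps = [
    ("create order", lambda ctx: ctx.update(order="created"),
     lambda ctx: ctx.update(order="cancelled")),
    ("charge card", lambda ctx: ctx.update(payment="charged"),
     lambda ctx: ctx.update(payment="refunded")),
    ("reserve stock", reserve_stock, lambda ctx: None),
]
```

Calling `run_saga(steps, {})` with these steps fails at the stock reservation, refunds the payment, cancels the order, and then raises `SagaFailed`, mirroring the out-of-stock scenario above.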
Orchestrated Saga:
- A dedicated Saga Orchestrator service manages the saga workflow
- The orchestrator calls each participant service, handles responses, and triggers compensation on failure
- State of the saga is maintained in the orchestrator’s own database
- Pros: explicit process state, easy to visualize and monitor, compensation logic is centralized
- Cons: orchestrator is a coupling point; risk of becoming a mini-ESB if it acquires business logic
Choreographed Saga:
- No central orchestrator; each service listens for events and publishes events
- Compensation is triggered by publishing failure events that other services react to
- Pros: fully decoupled, no central coordinator
- Cons: the saga flow is implicit and distributed across services — very hard to track, debug, or reason about failure scenarios; compensating event chains are difficult to get right
When to use which:
- Orchestrated saga: default choice for complex multi-step business processes — the visibility and explicit compensation logic are worth the coupling overhead
- Choreographed saga: simple 2-3 step processes, or when the team has strong event-driven culture and tooling (distributed tracing, event sourcing)
Data Topologies
Single Service, Single Database (Ideal)
+------------+
| Service A |
+-----+------+
|
+-----v------+
| DB A |
| (Postgres) |
+------------+
The ideal: one service, one database. Complete isolation. No shared schema. Service A is the sole owner and writer of DB A. This is the canonical microservices data topology.
Domain Cluster
+------------+   +------------+
| Service A  |   | Service B  |
| (subdom 1) |   | (subdom 2) |
+-----+------+   +-----+------+
      |                |
      +----+      +----+
           |      |
      +----v------v----+
      |   Shared DB    |
      |    Schema A    |
      |    Schema B    |
      +----------------+
Multiple closely related services share the same database server but with strict schema separation. Acceptable when services are in the same domain cluster, schemas are owned exclusively by their service, and no cross-schema writes occur. Common as a stepping stone toward full isolation when database-per-service operational overhead is not yet justified.
Read Model / CQRS
Write path:                          Read path:
+----------+   [event]   +-------+   +----------------+
| Command  |------------>| Event |-->| Read Model DB  |
| Service  |             |  Bus  |   | (denormalized) |
+----------+             +-------+   +-------+--------+
                                             |
                                     +-------v--------+
                                     | Query Service  |
                                     +----------------+
CQRS (Command Query Responsibility Segregation): The write model (commands) and read model (queries) are separated. Write services publish events; a dedicated read model service consumes events and maintains a denormalized view optimized for read queries. This enables cross-service data aggregation without cross-service joins on write databases. The read model is eventually consistent with the write model.
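A read model of this kind might look like the sketch below, which folds events from several write services into one flattened row per order. The event types reuse names from these notes, but the class, field names, and event shape are assumptions made for illustration.

```python
class OrderSummaryReadModel:
    """Denormalized view built from events published by several write services."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}  # order_id -> flattened row

    def apply(self, event: dict) -> None:
        row = self.rows.setdefault(event["order_id"], {"status": "unknown", "paid": False})
        if event["type"] == "OrderCreated":
            row["status"] = "created"
            row["total_cents"] = event["total_cents"]
        elif event["type"] == "PaymentProcessed":
            row["paid"] = True
        elif event["type"] == "ShipmentCreated":
            row["status"] = "shipped"

    def get_order_summary(self, order_id: str) -> dict:
        # Query side: a single lookup, no calls to the write services.
        return self.rows[order_id]
```

The query service answers from this row alone. Until the next event arrives, the row may lag behind the write side, which is the eventual consistency described above.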
Shared Sidecars for Operational Data
Operational data (metrics, traces, logs) is typically centralized via the sidecar pattern and service mesh, not split per-service. All services emit to shared observability infrastructure (Prometheus, Jaeger, Elasticsearch) — this is not a violation of data isolation because operational data has no domain semantic coupling.
Cloud Considerations
Microservices and cloud-native infrastructure were co-developed and are deeply complementary:
Kubernetes: The de facto deployment platform for microservices. Each service runs as a Kubernetes Deployment with its own Pod spec. Services discover each other via Kubernetes DNS. Horizontal scaling is per-service (kubectl scale deployment order-service --replicas=10). Rolling deployments enable zero-downtime service updates. Kubernetes abstracts the underlying infrastructure so services don’t care which physical node they run on.
Service Mesh on Kubernetes: Istio and Linkerd inject a sidecar proxy container into each Pod; an init container (or a CNI plugin) sets up the traffic redirection so that the proxy transparently intercepts the Pod’s inbound and outbound traffic. The control plane (Istio’s Istiod) manages configuration, certificate rotation, and traffic policy without application code changes.
Managed Databases: Each service uses its own managed database service — Amazon RDS, DynamoDB, Google Firestore, Azure Cosmos DB. This provides per-service data isolation with cloud provider management of backups, failover, and patching. Different services can use different database technologies.
Managed Messaging: AWS SQS/SNS, Google Pub/Sub, or Azure Service Bus for asynchronous communication. Kafka on Confluent Cloud for event streaming. These managed services eliminate the operational burden of self-hosting message infrastructure.
Container Registry and CI/CD: Each service has its own build and delivery pipeline (for example GitHub Actions or GitLab CI for builds, with Argo CD for deployment). The service can be built, tested, containerized, and deployed completely independently. This is the operationalization of independent deployability.
Distributed Tracing: Jaeger, Zipkin, or AWS X-Ray for tracing requests across service boundaries. Essential for debugging latency and failures in a distributed system — without tracing, a 500ms request spanning 6 services has no visible call chain.
Common Risks
Network Brittleness: Every inter-service call is a network call subject to all 8 fallacies of distributed computing. Synchronous call chains amplify failure probability: if each service has 99.9% availability, a chain of 10 services has only 99.0% end-to-end availability. The entire system must be designed defensively: retries, circuit breakers, timeouts, graceful degradation.
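The availability figure comes from multiplying the per-service availabilities. The short sketch below reproduces the arithmetic, assuming failures are independent (real systems only approximate this); the function name is hypothetical.

```python
def chain_availability(per_service: float, hops: int) -> float:
    """Availability of a synchronous chain where every hop must succeed.

    Assumes failures are independent, which real systems only approximate.
    """
    return per_service ** hops


for hops in (1, 3, 10):
    # Print the hop count and the end-to-end availability as a percentage.
    print(hops, f"{chain_availability(0.999, hops):.3%}")
```

For ten services at 99.9% each, the product is roughly 0.990, which is the 99.0% quoted above. Three nines per service turns into about two nines for the chain.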
Operational Complexity: Operating 50+ independently deployed services, each with its own database, monitoring dashboards, alerting rules, scaling configuration, and deployment pipeline, is enormously expensive. Teams need strong DevOps/SRE capability. Without this, microservices become an operational nightmare — a “distributed monolith” that has all the costs of microservices and none of the benefits.
Distributed Tracing and Debugging: A request that spans 10 services produces log entries in 10 different places. Without correlated trace IDs (propagated via service mesh or instrumented SDK) and a centralized trace aggregation system (Jaeger, Zipkin), debugging production failures becomes extremely difficult.
Data Consistency Challenges: Eventual consistency between services is a fundamental trade-off. Business workflows that require synchronous consistency (e.g., “charge the card only after confirming inventory availability”) must be designed explicitly with sagas, not assumed. Teams that underestimate this produce systems with subtle inconsistency bugs that are hard to diagnose.
Service Discovery and Dependency Management: With 50+ services, understanding which services depend on which others, what the current API contracts are, and what the health of each service is requires robust tooling (service registry, API catalog, dependency mapping). Without this, the system’s overall structure becomes opaque.
Premature Microservices: Applying microservices to a small team or early-stage product creates operational overhead that destroys velocity. Martin Fowler’s “microservices premium” (the additional complexity cost of microservices over a monolith) only pays off when the team and system are large enough to benefit from independent deployability.
Governance
Contract Testing: Services must not break their consumers’ expectations silently. Consumer-Driven Contract Testing (CDCT) with tools like Pact allows consumers to define their expectations of a service’s API, and the service runs these consumer-defined contracts as part of its own test suite. A service cannot be deployed if it breaks any of its consumers’ contracts.
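The core idea of a consumer-driven contract can be shown without any tooling. The sketch below is not Pact’s API; `contract_violations` and the contract format are hypothetical simplifications in which a consumer lists the fields and types it actually reads.

```python
def contract_violations(contract: dict, response: dict) -> list[str]:
    """Compare a provider response against a consumer's expected fields and types."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems


# The consumer declares only the fields it actually reads.
shipping_expects_from_orders = {"order_id": str, "total_cents": int}

# A provider response with an extra field the consumer does not care about.
provider_response = {"order_id": "o-1", "total_cents": 1200, "currency": "EUR"}
```

The provider’s test suite would run a check like this for every registered consumer and block deployment when the returned list is not empty. Extra fields such as `currency` do not violate the contract.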
API Versioning: Services must evolve their APIs without breaking existing consumers. Strategies: URL versioning (/v1/orders, /v2/orders), header versioning, or Tolerant Reader pattern (consumers ignore unknown fields). The rule: never remove a field from a response without a versioned deprecation period.
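A Tolerant Reader can be as simple as copying only the fields the consumer needs. In this sketch, `OrderView` and `read_order` are hypothetical names, and the payload stands in for a newer API version that has added a field.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderView:
    order_id: str
    status: str


def read_order(payload: dict) -> OrderView:
    """Tolerant reader: keep the fields this consumer needs, ignore the rest."""
    wanted = {f.name for f in fields(OrderView)}
    return OrderView(**{k: v for k, v in payload.items() if k in wanted})
```

Adding a field such as `carrier` to the response does not break this consumer. Removing `status` would raise a `TypeError`, which is why field removal needs a versioned deprecation period.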
Schema Registry: For event-driven communication, a schema registry (Confluent Schema Registry, AWS Glue Schema Registry) enforces that event schemas are versioned and backward/forward compatible. Producers must register schemas; consumers validate against them.
Fitness Functions: Automated tests that enforce architectural rules — no service may directly access another service’s database, all inter-service calls must have timeouts configured, no synchronous call chains longer than 3 hops, etc.
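The hop-limit rule, for instance, could be checked against a declared call graph. This sketch assumes the synchronous dependencies are available as a dict and that the graph is acyclic; the function names and the graph format are hypothetical.

```python
def longest_sync_chain(calls: dict[str, list[str]]) -> int:
    """Longest chain of synchronous calls, counted in hops, in an acyclic call graph."""
    memo: dict[str, int] = {}

    def hops_from(service: str) -> int:
        if service not in memo:
            memo[service] = max(
                (1 + hops_from(callee) for callee in calls.get(service, [])),
                default=0,
            )
        return memo[service]

    return max((hops_from(s) for s in calls), default=0)


def check_max_sync_hops(calls: dict[str, list[str]], limit: int = 3) -> None:
    """Fitness function: fail when any synchronous chain exceeds the limit."""
    longest = longest_sync_chain(calls)
    assert longest <= limit, f"synchronous chain of {longest} hops exceeds limit {limit}"
```

Run in CI against a graph derived from tracing data or service manifests, a check like this turns the architectural rule into a failing build rather than a guideline.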
Service Catalog / API Portal: A developer portal (Backstage, SwaggerHub) that provides a searchable catalog of all services, their owners, their API contracts, their SLAs, and their dependency graph. Essential for discoverability in large organizations.
Team Topology
Microservices architecture applies Conway’s Law deliberately (the “Inverse Conway Maneuver”): team boundaries are drawn to match the desired service boundaries. The model is one team per service (or per small cluster of closely related services), where each team is cross-functional (backend, frontend, QA, data) and owns its service end-to-end: design, development, testing, deployment, and on-call.
+-----------------+ +-----------------+ +-----------------+
|   Orders Team   | |  Catalog Team   | |  Payment Team   |
|  Dev, QA, Ops   | |  Dev, QA, Ops   | |  Dev, QA, Ops   |
| owns:           | | owns:           | | owns:           |
| - Order Svc     | | - Catalog Svc   | | - Payment Svc   |
| - Orders DB     | | - Catalog DB    | | - Payment DB    |
| - Order API     | | - Catalog API   | | - Payment API   |
+--------+--------+ +--------+--------+ +--------+--------+
         |                   |                   |
         +-------------------+-------------------+
                             |
            +---------------------------------+
            |  Platform Team (SRE / DevOps)   |
            | owns: Kubernetes, service mesh, |
            | CI/CD platform, observability   |
            +---------------------------------+
The Platform Team provides the infrastructure-as-a-product that stream-aligned service teams consume. They do not own any business service — they enable service teams to operate autonomously.
Team cognitive load: each team’s scope is bounded to their service(s). They understand their domain deeply without needing to understand the full system. This is microservices’ primary team-topology benefit.
“You build it, you run it” (Werner Vogels, Amazon): the team that builds the service is also responsible for operating it in production. This creates a strong incentive to make services observable, resilient, and easy to operate. Teams that feel on-call pain design better systems.
Architectural Characteristics Ratings
| Characteristic | Rating | Notes |
|---|---|---|
| Overall agility | ★★★★★ | Independent deployment per service; teams can ship at their own cadence without coordinating with others |
| Ease of deployment | ★★★★☆ | Each service deploys independently; but managing 50+ deployment pipelines requires strong DevOps tooling |
| Testability | ★★★★☆ | Services are testable in isolation; contract testing validates integration; but end-to-end tests are complex |
| Performance | ★★★☆☆ | Network overhead on every inter-service call; synchronous call chains compound latency; serialization cost |
| Scalability | ★★★★★ | Each service scales independently based on its own load; the defining scalability advantage of the style |
| Ease of development | ★★☆☆☆ | Individual services are simple; but the distributed system infrastructure (tracing, service mesh, contracts) is complex |
| Simplicity | ★☆☆☆☆ | The most operationally complex architecture style; high accidental complexity from distributed systems concerns |
| Overall cost | ★★☆☆☆ | High infrastructure cost (per-service DB, many clusters); high engineering cost for DevOps/SRE; justified only at scale |
When to Use
- Large systems with many distinct bounded domains that need to evolve at different velocities
- Organizations with 50+ engineers where independent team deployment cadence is critical to business agility
- Systems with highly variable and domain-specific scalability requirements (some services need 100x the capacity of others)
- Organizations that have (or can build) strong DevOps/SRE capability to manage distributed infrastructure
- Systems where parts have very different technology requirements (one service needs a graph DB, another needs time-series)
- Situations where fault isolation is critical — a failure in one service must not cascade to bring down the whole system
- Systems that began as a modular monolith and have outgrown single-unit deployment
When Not to Use
- Small teams (fewer than ~20-30 engineers) without dedicated DevOps/platform capability — the operational overhead will consume all engineering capacity
- Early-stage products where the domain model is not yet stable — service boundaries drawn before the domain is understood will be wrong, and refactoring microservice boundaries is expensive
- Systems with uniform scalability requirements — if every part of the system scales at the same rate, the scalability benefit is irrelevant
- Systems with strict data consistency requirements that cannot tolerate eventual consistency — distributed sagas add enormous complexity; a monolith with ACID transactions may be more appropriate
- Budget-constrained projects where the infrastructure cost of per-service databases, Kubernetes clusters, and managed messaging is prohibitive
Examples and Use Cases
Netflix: The canonical microservices reference. Decomposed from a DVD-rental monolith into hundreds of services (video encoding, recommendation, streaming, user management, billing, etc.) after a 2008 database outage threatened the business. Each service is independently deployable and scalable; the streaming services can scale to serve a subscriber base of more than 200 million without scaling the billing service.
Amazon: The “two-pizza team” rule (teams small enough to be fed by two pizzas) directly maps to microservices ownership. Amazon’s e-commerce platform decomposes into product catalog, inventory, pricing, order management, payment, fulfillment, shipping, and recommendation services, among many others. Teams own their service end-to-end and deploy independently multiple times per day.
Uber: The ride-sharing platform is decomposed by domain: driver matching, routing, pricing (surge), payment, driver management, notifications, maps. Services have different scalability profiles (routing scales with active rides; pricing scales with pricing requests) and are maintained by dedicated domain teams.
Key Takeaways
- Bounded context is the foundation: Each service maps to exactly one DDD bounded context — it owns its domain model, its business logic, and its data. This is the architectural primitive from which everything else follows.
- Data isolation is non-negotiable: Every service must have its own database. Cross-service database access (direct SQL, shared schema) re-introduces the coupling that services were designed to eliminate.
- Granularity is the hardest decision: Too fine → distributed monolith (synchronous coupling without independence). Too coarse → mini-monolith (deployment bottleneck). The right size owns a complete, independently deployable domain capability aligned to one team.
- Sidecar and service mesh replace the ESB: Cross-cutting operational concerns (mTLS, circuit breaking, tracing, load balancing) are handled by sidecar proxies and a service mesh control plane — as infrastructure, not application code.
- Choreography vs. orchestration: Choreography (event-reactive, no central controller) gives decoupling; orchestration (saga orchestrator, central process controller) gives visibility and explicit failure handling. Use orchestrated sagas for complex multi-step processes.
- SAGAs replace distributed transactions: Distributed 2PC is impractical. SAGAs use sequences of local transactions with compensating transactions on failure. Orchestrated sagas are preferred for visibility; choreographed sagas for simpler, decoupled flows.
- BFF pattern per client type: Different clients (mobile, web, partner) have different API needs. A Backend for Frontend creates a purpose-built API layer for each client type, owned by the team that builds that client.
- “You build it, you run it”: Teams own their services in production. On-call responsibility creates strong incentives for observability, resilience, and clean deployment. This is the operational philosophy that makes microservices work.
- The operational cost is real and high: Microservices score only ★★☆☆☆ for cost and ★☆☆☆☆ for simplicity. Distributed tracing, per-service databases, contract testing, and Kubernetes expertise are all prerequisites, not luxuries.
- Prefer duplication over coupling: It is better for two services to each maintain their own copy of a domain concept (updated via events) than to share a service or database. Coupling is the enemy of independent deployability.
Related Resources
- ch09-architecture-styles-foundations — fundamental distributed architecture patterns and 8 fallacies
- ch17-orchestration-driven-soa — the historical predecessor microservices corrected
- ch19-choosing-architecture-style — style selection guide
Last Updated: 2026-05-29