Chapter 4: Architectural Decomposition
saht decomposition coupling instability abstractness component-decomposition tactical-forking
Status: Notes complete
Overview
Chapter 4 moves from why to decompose (Chapter 3’s modularity drivers) to how to decompose. It begins with a critical prerequisite question: is the codebase even decomposable? Not every codebase can be cleanly separated: some are so entangled at the code level that they call for a fundamentally different decomposition approach. The chapter provides tools from software metrics (afferent/efferent coupling, abstractness, instability, distance from the main sequence) to diagnose decomposability.
The chapter then presents and compares two primary decomposition strategies: component-based decomposition (a careful, incremental approach that identifies component boundaries from existing code structure) and tactical forking (a “clone and prune” approach that starts from a copy of the whole system). Each strategy suits a different type of codebase and organizational context, and the authors provide explicit criteria for choosing between them.
The chapter’s central argument is that decomposition is not a free design activity — it has a cost proportional to the degree of existing coupling in the codebase. Before choosing a strategy, architects must measure coupling to understand what they are working with.
Core Concepts
Afferent coupling (Ca): The number of external components that depend on a given component. High afferent coupling means many things depend on this component — it is “widely used” and changes to it have broad impact. Also called “incoming coupling.”
Efferent coupling (Ce): The number of external components that a given component depends on. High efferent coupling means this component depends on many things — it is fragile and brittle because its behavior is contingent on many other components. Also called “outgoing coupling.”
Abstractness (A): The ratio of abstract classes and interfaces to total classes in a component. Ranges 0.0 to 1.0. A=0 means all concrete implementations; A=1 means all abstract definitions.
Instability (I): A derived metric: Ce / (Ca + Ce). Ranges 0.0 to 1.0. I=0 is maximally stable (the component depends on nothing, and other components depend on it); I=1 is maximally unstable (the component depends on other components, and nothing depends on it). It is an inverse measure of resistance to change: the lower I is, the harder the component is to change safely.
Distance from the main sequence (D): A derived metric: |A + I - 1|. Ranges 0.0 to 1.0. D=0 means the component is on the ideal “main sequence” (balanced abstractness and instability). D approaching 1 means the component is in a “zone of pain” or “zone of uselessness.”
Zone of pain: High stability (low I) + low abstractness (low A) — concrete classes that many things depend on. Difficult to change because changes ripple everywhere, yet difficult to extend because nothing is abstract.
Zone of uselessness: High abstractness (high A) + high instability (high I) — abstract classes/interfaces that nothing actually uses. Dead weight in the codebase.
Main sequence: The ideal diagonal line from (A=1, I=0) to (A=0, I=1) — abstract components are stable (widely depended on), concrete components are unstable (free to change because fewer things depend on them).
Component-based decomposition: A systematic, incremental strategy that uses existing code structure (packages, namespaces, modules) as the basis for identifying service boundaries, then carefully moves components across those boundaries one at a time.
Tactical forking: A “clone and prune” strategy that starts by duplicating the entire monolith for each intended service, then progressively deletes the code that doesn’t belong in each copy until distinct services emerge.
Is the Codebase Decomposable? Coupling Analysis
Before choosing a decomposition strategy, architects must assess how entangled the codebase actually is. This assessment uses the software metrics framework from Robert C. Martin’s work on package coupling principles.
Afferent and Efferent Coupling
Dependents: "What depends on me?" (Afferent Coupling, Ca)
      |
      v
[Component X] ---------> Dependencies it uses: "What do I depend on?" (Efferent Coupling, Ce)
Measuring Ca and Ce gives a picture of each component’s role:
- High Ca, Low Ce: A foundation/utility component. Many things depend on it; it depends on little. Stable but risky to change (wide blast radius).
- Low Ca, High Ce: A leaf/feature component. Few things depend on it; it depends on many. Easy to change in isolation, but fragile because it can be broken by changes in its dependencies.
- High Ca, High Ce: A hub component — the most dangerous kind. Many things depend on it AND it depends on many things. Changes here are both risky and difficult. These are the components that make decomposition hard.
- Low Ca, Low Ce: An isolated component. Easy to decompose — it’s nearly already independent.
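A minimal sketch of how Ca and Ce could be counted once a dependency map is available. The component names and the `coupling_counts` helper are hypothetical, invented for illustration; the map is assumed to list, for each component, the other components it uses.

```python
from collections import defaultdict


def coupling_counts(dependencies):
    """Return {component: (Ca, Ce)} from a map of component -> components it uses."""
    components = set(dependencies)
    for targets in dependencies.values():
        components |= set(targets)

    afferent = defaultdict(set)  # incoming: who depends on me
    efferent = defaultdict(set)  # outgoing: whom I depend on
    for source, targets in dependencies.items():
        for target in targets:
            if target == source:
                continue  # a self-reference is not external coupling
            efferent[source].add(target)
            afferent[target].add(source)

    return {c: (len(afferent[c]), len(efferent[c])) for c in sorted(components)}


# Hypothetical component graph, not taken from the book.
deps = {
    "orders": {"util", "users", "billing"},
    "billing": {"util", "users"},
    "users": {"util"},
    "util": set(),
    "audit": set(),
}
counts = coupling_counts(deps)
for name, (ca, ce) in counts.items():
    print(f"{name}: Ca={ca} Ce={ce}")
```

In this toy graph, util plays the foundation role (high Ca, no Ce), orders is a leaf (no Ca, high Ce), and audit is isolated.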
The Instability Metric (I)
I = Ce / (Ca + Ce)
where:
Ce = efferent (outgoing) coupling
Ca = afferent (incoming) coupling
Range: 0.0 (maximally stable) to 1.0 (maximally unstable)
Interpretation:
- I ≈ 0: Very stable. Many things depend on it; it depends on few things. Changes propagate outward to many dependents. This component should be abstract (to allow extension without modification).
- I ≈ 1: Very unstable. Few things depend on it; it depends on many things. Changes don’t propagate to many dependents. This component can afford to be concrete.
The stability principle (from Martin): Depend in the direction of stability. Components that are concrete (low A) should be unstable (high I) — they are free to change. Components that are stable (low I) must be abstract (high A) — their stability is achieved through abstraction, allowing extension without modification.
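As a rough illustration, the formula and the "depend in the direction of stability" rule can be checked mechanically. The sketch below assumes Ca/Ce counts are already known; the component names, the counts, and the choice to treat a completely uncoupled component as undefined are all assumptions of this example.

```python
def instability(ca, ce):
    """I = Ce / (Ca + Ce). Returns None for a component with no coupling at all."""
    total = ca + ce
    if total == 0:
        return None  # assumption: the ratio is undefined without any coupling
    return ce / total


def edges_against_stability(dependencies, scores):
    """Edges where a component depends on something less stable than itself."""
    flagged = []
    for source, targets in dependencies.items():
        for target in sorted(targets):
            s, t = scores[source], scores[target]
            if s is not None and t is not None and s < t:
                flagged.append((source, target))
    return flagged


# Hypothetical components; the (Ca, Ce) counts match the dependency map.
deps = {"orders": {"users", "util"}, "users": {"util"}, "util": {"orders"}}
counts = {"orders": (1, 2), "users": (1, 1), "util": (2, 1)}
scores = {name: instability(ca, ce) for name, (ca, ce) in counts.items()}
flagged = edges_against_stability(deps, scores)
print(scores)
print(flagged)
```

Here util is the most stable component yet depends on orders, the least stable one, so that edge is the one flagged.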
The Abstractness Metric (A)
A = (Number of abstract classes + interfaces) / (Total number of classes)
Range: 0.0 (fully concrete) to 1.0 (fully abstract)
Abstractness measures the degree to which a component relies on abstraction vs. implementation. High abstractness means behavior is primarily defined through interfaces/abstract classes that can be extended; low abstractness means behavior is in concrete classes that cannot be extended without modification.
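A small sketch of the ratio in Python terms. It assumes that abstract base classes with at least one abstract method stand in for "abstract classes and interfaces"; the classes themselves are made up for the example.

```python
import inspect
from abc import ABC, abstractmethod


def abstractness(classes):
    """A = abstract classes / total classes for one component."""
    if not classes:
        return 0.0  # assumption: an empty component counts as fully concrete
    abstract = sum(1 for cls in classes if inspect.isabstract(cls))
    return abstract / len(classes)


class PaymentGateway(ABC):
    @abstractmethod
    def charge(self, amount): ...


class Notifier(ABC):
    @abstractmethod
    def send(self, message): ...


class CardGateway(PaymentGateway):
    def charge(self, amount):
        return f"charged {amount}"


class EmailNotifier(Notifier):
    def send(self, message):
        return f"sent {message}"


class Invoice:
    pass


# Two abstract classes out of five in this hypothetical component.
component = [PaymentGateway, Notifier, CardGateway, EmailNotifier, Invoice]
a = abstractness(component)
print(a)
```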
Distance from the Main Sequence (D)
D = |A + I - 1|
Range: 0.0 (on the main sequence) to 1.0 (maximally distant, at the corners A=0, I=0 and A=1, I=1)
Perpendicular (geometric) distance: |A + I - 1| / sqrt(2) (range 0.0 to ~0.71); the form above is the normalized version
The main sequence is the ideal relationship between abstractness and instability. Components on the main sequence are either:
- Abstract and stable (frameworks, APIs, interfaces that many things depend on)
- Concrete and unstable (implementations that few things depend on, free to change)
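A short sketch of D = |A + I - 1| with a label for which side of the main sequence a component falls on. The 0.5 cut-off used to call a component "off" the main sequence is an assumption of this example, not a value from the formula.

```python
def distance_from_main_sequence(a, i):
    """D = |A + I - 1| for abstractness a and instability i, both in [0, 1]."""
    if not (0.0 <= a <= 1.0 and 0.0 <= i <= 1.0):
        raise ValueError("a and i must both lie in [0, 1]")
    return abs(a + i - 1.0)


def zone(a, i, cutoff=0.5):
    """Label a component; `cutoff` is an illustrative threshold."""
    d = distance_from_main_sequence(a, i)
    if d < cutoff:
        return "near main sequence"
    # Below the diagonal: concrete and stable. Above it: abstract and unstable.
    return "zone of pain" if a + i < 1.0 else "zone of uselessness"


for a, i in [(1.0, 0.0), (0.25, 0.75), (0.0, 0.0), (1.0, 1.0)]:
    print(a, i, distance_from_main_sequence(a, i), zone(a, i))
```

Because A and I each stay within 0 and 1, the result of `distance_from_main_sequence` also stays within 0 and 1.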
The Instability/Abstractness Graph
Abstractness (A)
1.0 | *                  Zone of
    |    *               Uselessness
0.7 |       *
    |          *   [Main Sequence]
0.5 |             *
    |                *
0.3 |                   *
    | Zone of              *
    | Pain                    *
0.0 |____________________________*
    0.0    0.3    0.5    0.7    1.0
              Instability (I)
Zone of Pain (I≈0, A≈0): Concrete + stable. Hard to change, hard to extend.
Example: utility classes everything depends on.
Zone of Uselessness (I≈1, A≈1): Abstract + unstable. Abstract but nobody uses them.
Example: abandoned framework interfaces.
Main Sequence (D≈0): Ideal. Abstract things are stable; concrete things are free.
What D Tells You About Decomposability
| D Score | Meaning | Decomposition Implication |
|---|---|---|
| 0.0 – 0.2 | On or near main sequence | Clean component, good candidate for extraction |
| 0.2 – 0.5 | Some distance from ideal | May be extractable with refactoring |
| 0.5 – 0.7 | Significant structural issues | Decomposition will be painful; consider tactical forking |
| 0.7 – 1.0 | Zone of pain or uselessness | Very hard to decompose incrementally |
A codebase where most components have D > 0.5 is a strong indicator that tactical forking may be more practical than incremental component-based decomposition — there’s no clean structure to work with.
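The table and the "most components have D > 0.5" rule of thumb can be turned into a small lookup. This is a sketch only: the table's bands share their boundary values, so assigning a boundary score to the lower band is an assumption here, as are the component names and scores.

```python
# Upper bound of each band, with the implication from the table above.
BANDS = [
    (0.2, "clean component, good candidate for extraction"),
    (0.5, "may be extractable with refactoring"),
    (0.7, "decomposition will be painful; consider tactical forking"),
    (1.0, "very hard to decompose incrementally"),
]


def implication(d):
    """Map a D score to its decomposition implication."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("D must lie in [0, 1]")
    for upper, text in BANDS:
        if d <= upper:  # assumption: a boundary value falls in the lower band
            return text


def share_above(d_scores, cutoff=0.5):
    """Fraction of components whose D exceeds the cutoff."""
    return sum(1 for d in d_scores.values() if d > cutoff) / len(d_scores)


# Hypothetical per-component D scores.
d_scores = {"mailer": 0.1, "search": 0.3, "catalog": 0.6, "pricing": 0.8, "core": 0.9}
share = share_above(d_scores)
for name, d in d_scores.items():
    print(f"{name}: D={d} -> {implication(d)}")
print(f"share with D > 0.5: {share:.0%}")
```

With three of five components above 0.5, this toy codebase would lean toward tactical forking under the rule of thumb.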
Decomposition Approach 1: Component-Based Decomposition
What It Is
Component-based decomposition is an incremental, structure-preserving approach. It uses the existing package/namespace/module hierarchy as the starting point for identifying service candidates, then systematically moves cohesive groups of classes across service boundaries. The underlying assumption is that the codebase has some recognizable structure that maps to business domains, even if imperfectly.
This is the approach detailed extensively in Chapter 5 (Component-Based Decomposition Patterns). Chapter 4 introduces it as one of two strategic options.
How It Works (High Level)
Step 1: Identify existing components
(packages, namespaces, modules in the monolith)
|
v
Step 2: Measure coupling between components
(Ca, Ce, D scores per component)
|
v
Step 3: Find clusters of high cohesion / low coupling
(components that mainly talk to each other)
|
v
Step 4: Define service boundaries around clusters
|
v
Step 5: Extract one service at a time, incrementally
(strangler fig pattern)
|
v
Step 6: Verify independence before extracting the next
Advantages of Component-Based Decomposition
- Preserves existing structure: Reuses the work already done in organizing the codebase. Developers are familiar with the structure.
- Incremental risk: Each extraction step is independently testable and reversible. Mistakes affect one service, not the whole system.
- Clear progress tracking: You can measure coupling metrics before and after each step to confirm improvement.
- Naturally discovers domain boundaries: Following cohesion clusters often reveals domain structure that matches business ownership.
- Lower wasted effort: Only the code actually needed for each service is written/tested; nothing is written twice.
Disadvantages of Component-Based Decomposition
- Requires existing structure: If the codebase is a Big Ball of Mud (high coupling everywhere, no recognizable structure), there are no clusters to follow. The prerequisite simply isn’t there.
- Slower: Incremental extraction takes more calendar time than forking. Teams must maintain and evolve the monolith while simultaneously extracting services.
- Requires significant architectural knowledge: Someone must deeply understand the codebase to identify the correct boundaries. This is often lost knowledge in aging codebases.
- Risk of creating distributed monolith: Without careful coupling analysis, “extracted” services may still depend tightly on the remaining monolith — producing all the costs of distribution without the independence.
When to Use Component-Based Decomposition
- The codebase has recognizable structure (packages/namespaces align with domains)
- D scores are mostly below 0.5 (components are reasonably well-placed)
- The team has good knowledge of the existing codebase
- Time is available for careful, incremental work
- Business continuity requires maintaining the monolith during migration
Decomposition Approach 2: Tactical Forking
What It Is
Tactical forking (sometimes called “clone and prune” or “strangler by subtraction”) starts from the opposite end. Instead of building up services from parts of the monolith, it duplicates the entire monolith for each intended service and then deletes everything that doesn’t belong in that service. The result is that each service starts as a full copy of the system and is progressively pruned until only the relevant code remains.
Monolith (full)
|
+-------> Copy 1 (intended: Order Service)
| Delete: User, Inventory, Reporting, ...
| Keep: Order processing logic
| Result: Order Service
|
+-------> Copy 2 (intended: User Service)
| Delete: Order, Inventory, Reporting, ...
| Keep: User management logic
| Result: User Service
|
+-------> Copy 3 (intended: Inventory Service)
Delete: Order, User, Reporting, ...
Keep: Inventory management logic
Result: Inventory Service
Why Deletion Is Easier Than Extraction
In a tightly coupled codebase, removing code that calls into a component is mechanically simpler than extracting that component while keeping the rest working:
- Deletion is easy to validate: Deleting code that “belongs to another service” can be checked by running the remaining tests. If something breaks, the deleted code was needed and can be restored.
- Extraction requires re-wiring: Extracting a component means re-wiring all its callers to use the new service boundary, a complex refactoring task in a tightly coupled codebase.
This asymmetry is the key insight behind tactical forking. In a poorly structured codebase, deletion is the more tractable operation.
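One way to picture the pruning step is as a reachability question: anything the fork's entry points cannot reach is a deletion candidate. The sketch below is an assumption-heavy illustration, not the book's procedure; it assumes a static dependency map is available, and the module names are invented. In practice the remaining test suite is still what confirms a deletion was safe.

```python
from collections import deque


def modules_to_keep(dependencies, entry_points):
    """Everything reachable from the fork's entry points (breadth-first)."""
    keep = set()
    queue = deque(entry_points)
    while queue:
        module = queue.popleft()
        if module in keep:
            continue
        keep.add(module)
        queue.extend(dependencies.get(module, ()))
    return keep


def modules_to_prune(dependencies, entry_points):
    """Deletion candidates: modules the fork's entry points never reach."""
    all_modules = set(dependencies)
    for targets in dependencies.values():
        all_modules |= set(targets)
    return all_modules - modules_to_keep(dependencies, entry_points)


# Hypothetical monolith layout.
monolith = {
    "order_api": {"order_core", "common"},
    "order_core": {"common"},
    "user_api": {"user_core", "common"},
    "user_core": {"common"},
    "inventory_api": {"inventory_core"},
    "inventory_core": {"common"},
    "common": set(),
}
order_fork_prune = modules_to_prune(monolith, ["order_api"])
print(sorted(order_fork_prune))
```

Note that common survives in the order fork, and would survive in every other fork too, which is the duplication cost listed under the disadvantages.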
Advantages of Tactical Forking
- Works on any codebase: Even a Big Ball of Mud can be forked. You don’t need existing structure.
- Faster initial progress: Each team can immediately start working on their service copy without waiting for architectural analysis to complete.
- Familiar environment: Each team continues working in a copy of the codebase they already know.
- No strangler phase required: The monolith doesn’t need to be kept alive while migration happens; each fork is immediately a candidate deployment unit.
- Parallelizable: Multiple teams can work on their respective forks simultaneously.
Disadvantages of Tactical Forking
- Massive code duplication: Every fork starts as a full copy. Common infrastructure, utilities, and shared logic exist in N copies simultaneously. Changes to shared logic must be synchronized across all forks.
- Risk of shared code without a boundary: If shared libraries and utility code are simply included in each fork, the forks are not truly independent: they carry the same code but expose no service boundary around it. They may end up as a distributed monolith with duplicated code rather than as independent services.
- Technical debt multiplication: Any pre-existing technical debt in the monolith is replicated into every fork. Pruning removes code but not the quality issues in the remaining code.
- No inherent cleanup: Tactical forking doesn’t require fixing underlying coupling issues — it just hides them. The resulting services may have the same internal quality problems as the monolith.
- Test suite duplication: All tests exist in all forks. Test maintenance burden multiplies.
When to Use Tactical Forking
- The codebase is a Big Ball of Mud (D scores mostly above 0.5, coupling is pervasive)
- The team has little understanding of the existing codebase structure (high developer turnover)
- Speed of initial progress is critical (competitive pressure, funding milestones)
- The plan includes significant rewriting of the duplicated code (so duplication is temporary)
- Domain boundaries are reasonably understood even if the code doesn’t reflect them
Comparing the Two Approaches
Trade-off Table
| Dimension | Component-Based Decomposition | Tactical Forking |
|---|---|---|
| Prerequisite | Requires identifiable structure (low D) | Works on any codebase |
| Initial speed | Slower (analysis first) | Faster (start immediately) |
| Code duplication | None (code moves, not copies) | High (N full copies initially) |
| Technical debt | Cleaned up during extraction | Replicated into each fork |
| Risk profile | Lower per-step risk; incremental | Higher initial risk; big-bang |
| Domain knowledge needed | High (need to understand structure) | Lower (deletion is mechanical) |
| Distributed monolith risk | Medium (shared DB is the main risk) | High (shared code in forks) |
| Suitable for | Moderately structured codebases | Big Ball of Mud codebases |
| Resulting code quality | Generally higher | Depends on post-fork investment |
| Parallelizability | Lower (sequential extraction) | High (teams work independently) |
Decision Tree
Is the codebase decomposable?
(Run coupling analysis: Ca, Ce, D scores)
        |
        v
Are D scores mostly < 0.5?
   YES                          NO
    |                            |
    v                            v
Do we have time for         Use TACTICAL FORKING
incremental migration?      Plan for post-fork
   YES          NO          rewrite/cleanup
    |            |
    v            v
COMPONENT-BASED  Consider
DECOMPOSITION    TACTICAL
                 FORKING
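The decision tree reads naturally as a two-question function. A sketch under one stated assumption: "mostly" is interpreted as more than half of the components.

```python
def choose_strategy(d_scores, has_time_for_incremental):
    """Follow the decision tree: D scores first, then available time."""
    scores = list(d_scores)
    if not scores:
        raise ValueError("run the coupling analysis first: no D scores given")

    below = sum(1 for d in scores if d < 0.5)
    mostly_below = below > len(scores) / 2  # assumption: "mostly" = majority

    if not mostly_below:
        return "tactical forking (plan for post-fork rewrite/cleanup)"
    if has_time_for_incremental:
        return "component-based decomposition"
    return "consider tactical forking"


# Hypothetical score sets.
structured = [0.1, 0.2, 0.3, 0.6]
tangled = [0.6, 0.7, 0.9, 0.2]
print(choose_strategy(structured, True))
print(choose_strategy(structured, False))
print(choose_strategy(tangled, True))
```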
The Role of Coupling Metrics in Practice
Using Metrics to Guide Extraction Order
When using component-based decomposition, coupling metrics determine the sequence of extractions:
- Extract low-Ce components first: Components with few dependencies can be extracted with minimal disruption to the remaining codebase.
- Extract high-D components cautiously: Components far from the main sequence will require refactoring before or during extraction.
- Defer high-Ca components: Components that many others depend on should be extracted last (or become shared libraries) because extracting them early creates a ripple of re-wiring.
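The three rules above can be combined into a sort. The priority used here (Ca first so widely depended-on components land last, then Ce, then D) is one possible reading and an assumption of this sketch; the components and their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    ca: int   # afferent coupling
    ce: int   # efferent coupling
    d: float  # distance from the main sequence


def extraction_order(components):
    """Low Ca first (defer widely used components), then low Ce, then low D."""
    return sorted(components, key=lambda c: (c.ca, c.ce, c.d))


def needs_refactoring(components, cutoff=0.5):
    """Components far from the main sequence; the cutoff is illustrative."""
    return [c.name for c in components if c.d > cutoff]


# Hypothetical metrics.
components = [
    Component("core", ca=5, ce=2, d=0.65),
    Component("catalog", ca=3, ce=1, d=0.2),
    Component("search", ca=1, ce=3, d=0.3),
    Component("mailer", ca=0, ce=1, d=0.1),
]
order = [c.name for c in extraction_order(components)]
print(order)
print(needs_refactoring(components))
```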
Tools for Measuring These Metrics
- JDepend (Java): Computes Ca, Ce, A, I, D per package
- NDepend (.NET): Similar metrics for .NET assemblies
- ArchUnit: Enforces coupling rules in test suites
- Lattix: Dependency structure matrix visualization
- Custom scripts parsing import statements in Python/Go/JavaScript
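For the custom-script option, Python's standard `ast` module can list what a source file imports. A minimal sketch: it reports top-level package names only and skips relative imports, both of which are simplifications chosen for this example. The sample source and its package names are invented.

```python
import ast


def imported_modules(source):
    """Top-level package names imported by a piece of Python source."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import (same package); skipped here
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found


sample = """
import os
import billing.invoices
from orders.models import Order
from . import helpers
"""
modules = sorted(imported_modules(sample))
print(modules)  # ['billing', 'orders', 'os']
```

Running this over every file in a package and filtering to internal package names would give that package's outgoing dependencies, from which Ce and (by inversion) Ca can be counted.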
Coupling Clusters as Service Boundaries
The authors recommend building a dependency structure matrix (DSM) or coupling graph to visualize which components cluster together naturally:
Component Matrix (X = dependency):
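| | Auth | Order | Inv | Report | User | Notify |
|---|---|---|---|---|---|---|
| Auth | - | X | - | - | X | - |
| Order | X | - | X | - | X | X |
| Inv | - | X | - | X | - | - |
| Report | - | X | X | - | - | - |
| User | X | X | - | - | - | X |
| Notify | - | - | - | - | X | - |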
Clusters visible: {Auth, User, Notify} and {Order, Inv, Report}
Components with dense internal connections and sparse external connections are natural service candidates.
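To make "dense internal, sparse external" concrete, the marks in the matrix above can be counted per proposed cluster. A sketch: the matrix rows are transcribed from the notes, while the cluster labels `identity` and `fulfilment` are made-up names for the two clusters listed.

```python
NAMES = ["Auth", "Order", "Inv", "Report", "User", "Notify"]
ROWS = {
    "Auth":   "-X--X-",
    "Order":  "X-X-XX",
    "Inv":    "-X-X--",
    "Report": "-XX---",
    "User":   "XX---X",
    "Notify": "----X-",
}

# Each X becomes a (row, column) pair.
edges = {
    (row, NAMES[col])
    for row, cells in ROWS.items()
    for col, cell in enumerate(cells)
    if cell == "X"
}


def cluster_report(edges, clusters):
    """Count marks inside each cluster and marks that cross clusters."""
    owner = {member: label for label, members in clusters.items() for member in members}
    internal = {label: 0 for label in clusters}
    crossing = []
    for source, target in sorted(edges):
        if owner[source] == owner[target]:
            internal[owner[source]] += 1
        else:
            crossing.append((source, target))
    return internal, crossing


clusters = {
    "identity": {"Auth", "User", "Notify"},    # hypothetical label
    "fulfilment": {"Order", "Inv", "Report"},  # hypothetical label
}
internal, crossing = cluster_report(edges, clusters)
print(internal)
print(crossing)
```

For this matrix, every crossing mark involves Order, which suggests Order is the component to examine before drawing the boundary between the two clusters.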
Sysops Squad Saga
Context: After Penelope makes the business case (Chapter 3), the team must choose a decomposition approach for the Sysops Squad monolith — 800K lines, 47,000+ classes, built over 15 years.
Analysis the team performs:
- They run coupling analysis on the major packages: ss.ticket, ss.expert, ss.notification, ss.reporting, ss.billing, ss.user
- They find that:
  - ss.ticket has high afferent coupling (many things call into it) and is concrete (low A); it sits in the zone of pain with D ≈ 0.65
  - ss.notification has low Ca and low Ce; D ≈ 0.1, very close to the main sequence, and already nearly independent
  - ss.billing has complex coupling to external payment providers AND to internal ticket state; mixed results
Decision: The team chooses component-based decomposition because:
- Several packages (notification, user management, reporting) have identifiable structure and D scores below 0.4
- The team has a core set of long-tenure developers who understand the codebase architecture
- Business continuity requires the monolith to keep running during migration
Extraction order decided:
1. ss.notification first: lowest D, nearly independent
2. ss.reporting second: read-only dependencies, can be separated cleanly
3. ss.user third: moderate coupling, clear domain boundary
4. ss.expert fourth: coupled to ticket but separable
5. ss.ticket last: highest Ca, most other services depend on it; must remain until others are migrated
What the saga demonstrates: Coupling metrics directly inform the extraction order. You don’t extract the most important service first — you extract the most independent service first to reduce risk and build team confidence.
Key Takeaways
- Before choosing a decomposition approach, architects must diagnose codebase decomposability using coupling metrics: afferent coupling (Ca), efferent coupling (Ce), abstractness (A), instability (I), and distance from the main sequence (D).
- Instability (I = Ce / (Ca + Ce)) is an inverse measure of a component’s resistance to change: I≈0 is stable (many depend on it); I≈1 is unstable (it depends on many things). Dependencies should point in the direction of stability.
- The zone of pain (concrete + stable, high Ca + low A) is the most problematic region: components there are hard to change AND hard to extend. They are the primary obstacles to clean decomposition.
- The zone of uselessness (abstract + unstable, high A + high I) contains dead weight: abstract structure that nothing actually uses. These should be deleted or collapsed.
- Component-based decomposition is the preferred approach when the codebase has identifiable structure (D scores mostly below 0.5) and the team has codebase knowledge. It is incremental, lower-risk, and produces cleaner results.
- Tactical forking (“clone and prune”) is the pragmatic choice when the codebase is a Big Ball of Mud, where pervasive coupling makes incremental extraction impractical. It trades code duplication for tractability.
- The primary risk of tactical forking is code duplication and technical debt multiplication: every quality problem in the monolith is copied into every fork. Post-fork investment in cleanup is essential.
- Coupling metrics determine the extraction order in component-based decomposition: extract lowest-D, lowest-Ce components first; defer high-Ca components until dependent services are already extracted.
- A dependency structure matrix (DSM) or coupling graph visualizes natural service clusters: groups of components with dense internal coupling and sparse external coupling are natural service candidates.
- Both approaches risk creating a distributed monolith if shared state (especially the database) is not also decomposed. Application-tier decomposition must eventually be accompanied by data decomposition (covered in Ch. 6).
Related Resources
- ch03-architectural-modularity — The modularity drivers that justify decomposition
- ch05-component-decomposition-patterns — Specific patterns for executing component-based decomposition
- ch06-pulling-apart-operational-data — Data decomposition required alongside service decomposition
- ch02-architectural-coupling — Coupling types and coupling taxonomy
- saht-README — Book overview and chapter index
Last Updated: 2026-05-30