Chapter 4 Flashcards — Architectural Decomposition
flashcards saht decomposition coupling instability abstractness
What is afferent coupling (Ca) and what does a high Ca score indicate?
?
Afferent coupling (Ca) is the number of external components that depend on a given component — it measures “how many things call into me.” A high Ca score means the component is widely used, making it stable but risky to change: any modification propagates to many dependents. High-Ca components should be prioritized as shared libraries or extracted last during decomposition because changing their interface breaks many callers.
What is efferent coupling (Ce) and what does a high Ce score indicate?
?
Efferent coupling (Ce) is the number of external components that a given component depends on; it measures “how many things do I call.” A high Ce score means the component depends on many other things, making it fragile: its behavior can be broken by changes in any of its dependencies. High Ce raises instability (I), and a component with high Ce and low Ca is a good candidate for early extraction in decomposition because few things depend on it, although its outgoing dependencies still have to be resolved across the new boundary.
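Both counts can be derived from a single dependency map. The sketch below is a minimal illustration, assuming the dependencies are already available as a dictionary from component name to the set of components it depends on; the function name `coupling_counts` and the sample component names are hypothetical.

```python
from collections import defaultdict


def coupling_counts(deps):
    """Return {component: (Ca, Ce)} for a map of component -> set of dependencies."""
    ca = defaultdict(int)  # inbound: how many components depend on me
    ce = {}                # outbound: how many components I depend on
    for component, targets in deps.items():
        external = {t for t in targets if t != component}  # ignore self-references
        ce[component] = len(external)
        for target in external:
            ca[target] += 1
    components = set(deps) | set(ca)
    return {c: (ca[c], ce.get(c, 0)) for c in components}


# Hypothetical dependency map, for illustration only.
deps = {
    "billing": {"util", "user"},
    "reporting": {"util", "billing"},
    "user": {"util"},
    "util": set(),
}
counts = coupling_counts(deps)
```

Here `util` ends up with the highest Ca and a Ce of zero, while `reporting` has a Ca of zero and depends on two components.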
Write the formula for the instability metric (I) and describe what I=0 and I=1 mean.
?
I = Ce / (Ca + Ce)
- I = 0 (maximally stable): Ce = 0, meaning the component depends on nothing external. Many things may depend on it (high Ca), so it is hard to change, and nothing it depends on can force it to change. Stable components should be abstract to allow extension without modification.
- I = 1 (maximally unstable): Ca = 0, meaning nothing depends on this component. It can depend on many things freely because no one is depending on it. Unstable components can safely be concrete.
Range: 0.0 to 1.0.
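As a quick check of the formula, here is a small sketch. The treatment of an isolated component (Ca = Ce = 0) is an assumption made here to avoid division by zero; the cards do not define that case.

```python
def instability(ca, ce):
    """I = Ce / (Ca + Ce). Returns 0.0 when Ca = Ce = 0 (an assumption, not from the cards)."""
    total = ca + ce
    return ce / total if total else 0.0


stable = instability(ca=3, ce=0)    # nothing outgoing: maximally stable
unstable = instability(ca=0, ce=4)  # nothing incoming: maximally unstable
mixed = instability(ca=1, ce=3)
```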
Write the formula for the abstractness metric (A) and interpret its extremes.
?
A = (Abstract classes + Interfaces) / (Total classes and interfaces)
- A = 0: Fully concrete. All behavior is in concrete implementations that cannot be extended without modification.
- A = 1: Fully abstract. All definitions are interfaces/abstract classes; no implementations exist in this component.
Range: 0.0 to 1.0. High abstractness enables extension without modification (Open/Closed Principle).
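For Python code, the ratio can be approximated with the standard library's `inspect.isabstract`. This is only a sketch: the classes below are hypothetical, and treating an empty component as A = 0 is an assumption.

```python
import inspect
from abc import ABC, abstractmethod


def abstractness(classes):
    """A = abstract types / total types for the classes in one component."""
    if not classes:
        return 0.0  # assumption: an empty component counts as fully concrete
    abstract = sum(1 for cls in classes if inspect.isabstract(cls))
    return abstract / len(classes)


# Hypothetical component contents, for illustration only.
class Notifier(ABC):
    @abstractmethod
    def send(self, message):
        ...


class EmailNotifier(Notifier):
    def send(self, message):
        return f"email: {message}"


class SmsNotifier(Notifier):
    def send(self, message):
        return f"sms: {message}"


class RetryPolicy:
    max_attempts = 3


score = abstractness([Notifier, EmailNotifier, SmsNotifier, RetryPolicy])
```

One abstract type out of four gives a score of 0.25, so this hypothetical component is mostly concrete.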
Write the formula for distance from the main sequence (D) and explain what D=0 vs. D≈1 means.
?
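D = |A + I - 1| (this is the normalized form, which Martin writes as D'; the unnormalized perpendicular distance is |A + I - 1| / sqrt(2), with range 0 to about 0.707)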
- D = 0: The component sits on the “main sequence” — the ideal relationship where abstract components are stable and concrete components are unstable. This is the target zone.
- D ≈ 1: The component is maximally distant from the ideal, in either the zone of pain or zone of uselessness. These components are structurally problematic and hardest to decompose.
Range: 0.0 to 1.0 (normalized).
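A short sketch of both forms, using the 0-to-1 form |A + I - 1| that the card's range assumes. The function names are hypothetical.

```python
import math


def distance(a, i):
    """Distance from the main sequence on a 0-to-1 scale: |A + I - 1|."""
    return abs(a + i - 1)


def perpendicular_distance(a, i):
    """Geometric distance to the line A + I = 1; the maximum is 1/sqrt(2)."""
    return abs(a + i - 1) / math.sqrt(2)


on_sequence = distance(0.25, 0.75)  # abstract share balances instability
pain_corner = distance(0.0, 0.0)    # concrete and stable
useless_corner = distance(1.0, 1.0) # abstract and unstable
```

Both corners sit at the maximum distance of 1.0, while any point where A + I = 1 scores 0.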
Describe the “zone of pain” in the instability/abstractness graph. What kind of components live there, and why is it called “pain”?
?
The zone of pain occupies the region of high stability (I ≈ 0) + low abstractness (A ≈ 0) — i.e., the bottom-left of the graph. These are concrete components that many things depend on. They cause pain because:
- Hard to change: any modification propagates to all dependents (high Ca)
- Hard to extend: no abstraction means you must modify the class itself, not extend it
- Cannot be easily extracted: their high Ca means many other components must be re-wired if they move
Classic examples: tightly coupled utility classes, DAO objects with hundreds of callers.
Describe the “zone of uselessness” in the instability/abstractness graph.
?
The zone of uselessness occupies high abstractness (A ≈ 1) + high instability (I ≈ 1) — the top-right of the graph. These are abstract definitions that nothing actually uses. They are useless because:
- They define interfaces or abstract classes
- But no concrete implementations depend on the structure they provide
- They add conceptual overhead without enabling any real behavior
Common cause: abandoned framework interfaces, speculative generalization left in the codebase.
Draw or describe the instability/abstractness graph with the main sequence and two zones.
?
Abstractness (A)
1.0 |*              Zone of
    |   *           Uselessness
    |      *
0.5 |         *  [Main Sequence]
    |            *
    | Zone of       *
0.0 |_Pain_____________*
    0.0      0.5      1.0
         Instability (I)
- Main sequence: diagonal from (A=1, I=0) to (A=0, I=1)
- Zone of pain: bottom-left (A≈0, I≈0) — concrete and stable, hard to change or extend
- Zone of uselessness: top-right (A≈1, I≈1) — abstract and unstable, nobody uses it
- D = 0: on the main sequence (ideal)
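The graph can be read as a simple classification rule: measure the distance from the line A + I = 1, then check which side of the line the component falls on. The sketch below assumes a cutoff of 0.7 for "in a zone", which is an illustrative choice rather than a fixed definition.

```python
def classify_zone(a, i, threshold=0.7):
    """Place a component on the A/I graph. The threshold is an assumed cutoff."""
    d = abs(a + i - 1)
    if d < threshold:
        return "near main sequence"
    # Below the diagonal: concrete and stable. Above it: abstract and unstable.
    return "zone of pain" if a + i < 1 else "zone of uselessness"


concrete_stable = classify_zone(a=0.0, i=0.05)
abstract_unstable = classify_zone(a=0.9, i=0.95)
balanced = classify_zone(a=0.2, i=0.8)
```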
What D score range suggests a component is a good candidate for clean extraction, vs. a problematic one?
?
| D Range | Interpretation | Decomposition Implication |
|---|---|---|
| 0.0 – 0.2 | On/near main sequence | Clean extraction candidate |
| 0.2 – 0.5 | Some structural distance | Extractable with targeted refactoring |
| 0.5 – 0.7 | Significant issues | Painful extraction; consider tactical forking |
| 0.7 – 1.0 | Zone of pain/uselessness | Very hard incremental extraction |
A codebase where most components score D > 0.5 is a strong indicator that tactical forking may be more tractable than component-based decomposition.
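The table translates directly into a lookup. Because adjacent ranges in the table share their endpoints, this sketch assumes each boundary value belongs to the higher band; that tie-breaking rule is an assumption.

```python
def extraction_outlook(d):
    """Map a D score to the decomposition implication from the table above."""
    if d < 0.2:
        return "clean extraction candidate"
    if d < 0.5:
        return "extractable with targeted refactoring"
    if d < 0.7:
        return "painful extraction; consider tactical forking"
    return "very hard incremental extraction"


outlooks = [extraction_outlook(d) for d in (0.1, 0.35, 0.65, 0.9)]
```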
What is the “stability principle” from Robert Martin that SAHT applies to decomposition?
?
SAHT applies two of Martin's related principles. The Stable Dependencies Principle says to depend in the direction of stability. The Stable Abstractions Principle says a component should be as abstract as it is stable: components that are stable (low I, many things depend on them) should be abstract (high A), so that callers depend on an interface rather than an implementation. Components that are unstable (high I, few things depend on them) can be concrete because they are free to change. This relationship defines the main sequence: (stable, abstract) and (unstable, concrete) are the two healthy extremes.
What is component-based decomposition and what is its core assumption?
?
Component-based decomposition is an incremental strategy that uses the existing package/namespace/module hierarchy as the basis for identifying service candidates, then systematically moves cohesive groups of classes across service boundaries one service at a time. Its core assumption is that the codebase has some identifiable structure — that packages or modules already loosely reflect domain organization — that can be used as a starting point. This assumption fails for Big Ball of Mud codebases.
What is tactical forking (“clone and prune”) and what problem does it solve?
?
Tactical forking duplicates the entire monolith for each intended service, then progressively deletes code that doesn’t belong in each copy until distinct services emerge. It solves the problem of codebases with no identifiable structure (Big Ball of Mud, D scores mostly above 0.5): when there are no clusters to extract, deletion from a copy is more tractable than extraction from the original. The insight is that in tightly coupled code, deletion is mechanically simpler than extraction.
Why is deletion easier than extraction in a tightly coupled codebase?
?
In a tightly coupled codebase:
- Extraction requires re-wiring all callers of an extracted component to use the new service API. In tightly coupled code, these callers are numerous and interdependent, making re-wiring a complex, error-prone refactoring task.
- Deletion (in tactical forking) only requires removing code that “belongs to another service.” If the deletion breaks anything, the remaining tests catch it — the feedback is clear and local.
This asymmetry is the core justification for tactical forking in messy codebases: deletion gives clear, local feedback one removal at a time, while extraction demands coordinated changes across many interdependent callers.
What are the three biggest disadvantages of tactical forking?
?
- Massive code duplication: Every fork starts as a full copy. Common utilities, infrastructure, and shared logic exist in N copies. Synchronizing changes across forks is an ongoing burden.
- Technical debt multiplication: Any quality problem in the monolith is replicated into every fork. Pruning removes code but not the underlying coupling and quality issues in what remains.
- Risk of distributed monolith: If shared libraries are simply copied (not extracted to a separate service), forks share code but not a service contract — they remain implicitly coupled despite physical separation.
What are the three biggest disadvantages of component-based decomposition?
?
- Requires existing structure: If there are no identifiable component clusters (Big Ball of Mud), there is nothing to incrementally extract. The prerequisite simply does not exist.
- Slower progress: Incremental extraction requires analysis before each step. Teams must maintain and evolve the monolith during the multi-month or multi-year migration.
- Requires deep codebase knowledge: Someone must understand the existing architecture well enough to identify correct boundaries. In aged codebases with high turnover, this knowledge may not exist.
In what order should services be extracted during component-based decomposition, and why?
?
Extract in order of increasing coupling (lowest coupling first):
- Lowest-Ce first: Components with few outgoing dependencies can be extracted with minimal disruption — they don’t need to be re-wired extensively.
- Lowest-D first: Components near the main sequence are structurally clean; extraction is straightforward.
- Highest-Ca last (or make shared library): Components with many inbound callers should be extracted last because moving them first would require re-wiring all callers simultaneously.
This approach builds team confidence, reduces per-step risk, and leaves the most complex extractions for when the team is most experienced.
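One way to turn these criteria into an ordering is a multi-key sort. The card does not say how to weigh the three metrics against each other, so the priority used here (Ca first, then D, then Ce) is an assumption, and the component names and scores are hypothetical.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    ca: int
    ce: int
    d: float


def extraction_order(components):
    """Sort so that low-Ca, low-D, low-Ce components come first (assumed priority)."""
    return sorted(components, key=lambda c: (c.ca, c.d, c.ce))


# Hypothetical scores, for illustration only.
candidates = [
    Component("ticket", ca=14, ce=9, d=0.65),
    Component("notification", ca=1, ce=2, d=0.10),
    Component("reporting", ca=2, ce=5, d=0.30),
]
order = [c.name for c in extraction_order(candidates)]
```

With these numbers the component with the most inbound callers lands at the end of the list, matching the rule that high-Ca components are extracted last.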
What is a dependency structure matrix (DSM) and how is it used in decomposition?
?
A dependency structure matrix (DSM) is a grid where each row and column represents a component, and each cell indicates whether the row-component depends on the column-component. It visualizes coupling structure across the entire codebase. In decomposition:
- Dense blocks on the diagonal reveal clusters of tightly coupled components that naturally belong together in one service
- Sparse off-diagonal entries between clusters reveal the natural service boundaries
- Components that appear frequently in off-diagonal cells are integration points that need explicit API contracts
DSMs are more readable than graphs for large systems and reveal architectural clusters that inform service boundary decisions.
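A DSM can be built from the same kind of dependency map used for coupling counts. The sketch below is a minimal version with hypothetical component names; real DSM tools also reorder rows to expose clusters, which is not attempted here.

```python
def build_dsm(deps, order):
    """Return a matrix where cell [r][c] is 1 if order[r] depends on order[c]."""
    index = {name: k for k, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for source, targets in deps.items():
        for target in targets:
            if source != target:
                matrix[index[source]][index[target]] = 1
    return matrix


# Hypothetical components: two pairs that depend on each other, plus one cross-link.
order = ["orders", "payments", "catalog", "search"]
deps = {
    "orders": {"payments"},
    "payments": {"orders"},
    "catalog": {"search"},
    "search": {"catalog", "orders"},
}
dsm = build_dsm(deps, order)

for name, row in zip(order, dsm):
    print(f"{name:>8} " + " ".join(str(cell) for cell in row))
```

The two 2x2 blocks along the diagonal are the candidate clusters, and the single entry from `search` to `orders` is the off-diagonal link that would need an explicit contract.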
What is the risk of creating a “distributed monolith” during decomposition, and which approach carries higher risk?
?
A distributed monolith results when services are physically separated (separate deployment units) but remain logically coupled — typically through a shared database, shared libraries with implicit state, or synchronous call chains that create runtime dependencies. It has the worst of both worlds: all the operational costs of distribution with none of the independence benefits. Tactical forking carries higher distributed-monolith risk because shared code in forks is often not truly isolated — it creates implicit coupling through shared logic rather than shared contracts.
What does the Sysops Squad saga in Chapter 4 demonstrate about decomposition approach selection?
?
The team chooses component-based decomposition (not tactical forking) because coupling analysis reveals that several packages (ss.notification, ss.reporting, ss.user) have D scores below 0.4 with identifiable structure. Key lessons:
- Coupling metrics determine the choice of approach — not organizational preference
- Extraction order is determined by D and Ca scores (notification extracted first because D ≈ 0.1 and almost independent; ticket extracted last because it has the highest Ca)
- Even choosing component-based, they acknowledge ss.ticket is near the zone of pain (D ≈ 0.65) and will require targeted refactoring before extraction
Why must application-tier decomposition be accompanied by data decomposition to achieve true scalability?
?
Application services sharing a database are not truly independent — the database becomes the monolith. Even if application services scale independently, they all compete for connection pool slots, table locks, and query throughput on the shared database. The scalability ceiling is then the database, not the application tier. True scalability requires that each service owns its data (separate schema or separate database), which is covered in Chapter 6. Application decomposition without data decomposition produces architectural improvement on only one dimension.
How do Ca and Ce scores combine to identify “hub” components, and why are they the hardest to decompose?
?
Hub components have both high Ca (many things depend on them) and high Ce (they depend on many things). They are the hardest to decompose because:
- High Ca means extraction requires re-wiring many callers to use the new service API
- High Ce means the extracted service still needs access to many other components, pulling other components into scope
- Hubs often have D scores in the zone of pain, making them structurally problematic
- In tactical forking, hubs appear in every fork’s remaining code, becoming a cross-cutting duplication problem
Hub components are commonly the last extracted in component-based decomposition or become shared platform services.
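Hubs can be flagged by filtering on both counts at once. The cutoffs below (5 inbound, 5 outbound) are assumed values for illustration; a sensible threshold depends on the size of the codebase. All names and numbers are hypothetical.

```python
def find_hubs(counts, ca_min=5, ce_min=5):
    """Return names whose Ca and Ce both meet the (assumed) minimums."""
    return sorted(
        name for name, (ca, ce) in counts.items()
        if ca >= ca_min and ce >= ce_min
    )


# Hypothetical (Ca, Ce) pairs per component.
counts = {
    "core": (12, 9),
    "util": (20, 0),    # widely used, but depends on nothing: not a hub
    "report": (0, 8),   # depends on much, but nothing uses it: not a hub
    "orders": (6, 5),
}
hubs = find_hubs(counts)
```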
What question does Chapter 4 say must be answered BEFORE choosing a decomposition approach?
?
“Is the codebase decomposable?” — specifically, does the codebase have sufficient internal structure (low D scores, identifiable component clusters) to support incremental extraction? This question is answered by running coupling analysis (Ca, Ce, A, I, D) across major packages/modules. The answer determines whether component-based decomposition (structured codebase) or tactical forking (unstructured codebase) is appropriate. Choosing the wrong approach for the codebase’s actual structure is a major risk factor in decomposition projects.
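The decision can be sketched as a share-of-components check over D scores. The 0.5 cutoff comes from the cards; reading "most components" as "more than half" is an assumption, and the scores are hypothetical.

```python
def share_above(d_scores, cutoff=0.5):
    """Fraction of components whose D score exceeds the cutoff."""
    if not d_scores:
        return 0.0
    return sum(1 for d in d_scores.values() if d > cutoff) / len(d_scores)


def suggested_approach(d_scores, cutoff=0.5, majority=0.5):
    """Suggest an approach; 'majority' is an assumed reading of 'most components'."""
    if share_above(d_scores, cutoff) > majority:
        return "tactical forking"
    return "component-based decomposition"


# Hypothetical D scores per package.
structured = {"notification": 0.1, "reporting": 0.3, "ticket": 0.65}
tangled = {"a": 0.8, "b": 0.7, "c": 0.2}
```

A result like this is a starting point for discussion, not a verdict; the cluster structure visible in the dependency analysis matters as much as the raw share.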
What tool categories exist for measuring Ca, Ce, abstractness, instability, and D in real codebases?
?
- JDepend (Java): Computes Ca, Ce, A, I, D per package; standard tool for Java decomposition analysis
- NDepend (.NET): Equivalent metrics for .NET assemblies with visual dependency graphs
- ArchUnit (Java/Kotlin): Enforces coupling rules as automated tests; can be used to measure and constrain coupling
- Lattix: Dependency structure matrix visualization across multiple language ecosystems
- Custom scripts: Import-statement parsing in Python, Go, JavaScript/TypeScript to compute component-level dependencies
These tools are used as a prerequisite analysis step before committing to a decomposition strategy.
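For the custom-script option, Python's standard `ast` module can extract imports without executing the code. This is a minimal sketch that treats the first segment of each absolute import as the component name, which is an assumption about how components map to packages; relative imports are skipped.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in-component
            found.add(node.module.split(".")[0])
    return found


# Hypothetical source file contents.
sample = """
import os
import billing.invoices
from users.models import Account
from . import helpers
"""
modules = imported_modules(sample)
```

Running this over every file in a package and merging the results gives that package's outgoing dependencies, from which Ce and (by inversion) Ca can be counted.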
Compare component-based decomposition vs. tactical forking across five key dimensions.
?
| Dimension | Component-Based | Tactical Forking |
|---|---|---|
| Codebase requirement | Needs structure (D < 0.5) | Works on any codebase |
| Initial speed | Slower (analysis first) | Faster (start immediately) |
| Code duplication | None (code moves) | High (N full copies) |
| Technical debt | Cleaned up during extraction | Replicated into every fork |
| Resulting quality | Generally higher | Requires post-fork investment |
The choice is driven by codebase structure, not preference.
Total Cards: 24
Priority: HIGH
Last Updated: 2026-05-30