Chapter 6 Flashcards — Measuring and Governing Architectural Characteristics
flashcards fsa fitness-functions governance metrics
What is an architectural fitness function?
?
An architectural fitness function is any objective, automated mechanism that provides a measurable signal about whether one or more architectural characteristics are being maintained. It produces a pass/fail or numeric result — not an opinion — and runs as part of the CI/CD pipeline or on a schedule.
Why does manual architectural governance fail at scale?
?
Manual governance fails for five structural reasons: (1) human review is inconsistent — different reviewers apply different standards; (2) PR volume in large teams makes exhaustive review impossible; (3) latency — violations are found after merge, when fixes are expensive; (4) architectural rationale is rarely documented, so reviewers don’t know what they’re protecting; (5) without metrics, “this violates our architecture” is an opinion, not a verifiable claim.
What is the difference between a fitness function and a unit test?
?
Unit tests verify functional correctness — does this function return the right value for these inputs? Fitness functions verify architectural correctness — does this system still have the structural properties we designed for (e.g., no forbidden dependencies, response time within budget, complexity within limits)? Some fitness functions use test frameworks like JUnit, but their purpose is architectural, not functional.
What is an atomic fitness function? Give an example.
?
An atomic fitness function executes against a single, isolated component or module and runs quickly. It tests a localized architectural property rather than an emergent system-level property. Example: an ArchUnit test verifying that no class in the domain package imports from the infrastructure package. Example: a complexity check verifying a single module’s cyclomatic complexity does not exceed 10.
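The domain-to-infrastructure rule above can be sketched without ArchUnit. This is a minimal Python illustration built on the standard `ast` module, not ArchUnit's API; the function name `forbidden_imports` and the package name `shop.infrastructure` are hypothetical.

```python
import ast

def forbidden_imports(source, forbidden_prefix):
    """Return imported module names in `source` that fall under `forbidden_prefix`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports may have no module name
        else:
            continue
        for name in names:
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                hits.append(name)
    return hits

# Hypothetical source of a module that lives in the domain package.
domain_module = """
import dataclasses
from shop.infrastructure.db import Session
"""

violations = forbidden_imports(domain_module, "shop.infrastructure")
print(violations)  # ['shop.infrastructure.db']
```

In CI, a non-empty `violations` list would fail the build. The check is atomic because it inspects one module's source and never runs the system.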
What is a holistic fitness function? Give an example.
?
A holistic fitness function executes against the entire integrated system to verify characteristics that only emerge at the system level. It typically runs slower and requires a deployed environment. Example: a load test verifying p99 response time under 500 concurrent users. Example: a chaos engineering experiment that kills a service instance and verifies the system continues serving requests within its availability SLA.
What is the difference between triggered and temporal fitness functions?
?
Triggered fitness functions execute in response to an event: a commit, pull request merge, or deployment. They are the most common type and give feedback at the moment a change is made. Temporal fitness functions execute on a schedule (nightly, weekly, monthly), independent of code changes. They are used for characteristics that can degrade over time due to external factors; for example, a weekly chaos engineering experiment or a monthly backup restoration test.
What is the difference between static and dynamic fitness functions?
?
Static fitness functions analyze code or architecture structure without executing it — dependency analysis, complexity metrics, coupling measurements. They are fast, deterministic, and suitable for every commit. Dynamic fitness functions require the system to run — performance benchmarks, availability monitors, chaos experiments. They are slower and environment-dependent but verify real runtime behavior.
What is cyclomatic complexity and what thresholds should trigger concern?
?
Cyclomatic complexity (CC) measures the number of linearly independent execution paths through a unit of code. A common approximation: count the branching constructs (if, while, for, case, &&, ||) and add 1. Rule-of-thumb thresholds: CC 1-5 is simple; CC 6-10 is manageable; CC 11-20 is high complexity and a refactoring candidate; CC > 20 is very high and almost always a maintainability and testability problem. Architecturally, high CC degrades agility and reliability characteristics.
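The counting rule above can be sketched in a few lines. This is a rough approximation using Python's standard `ast` module, assuming only `if`, `for`, `while`, and boolean operators count as decision points; real tools handle more constructs. The function name and the sample code are hypothetical.

```python
import ast

def cyclomatic_complexity(source):
    """Approximate CC: count decision points in `source`, then add 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has 3 operands, which is 2 short-circuit decisions
            decisions += len(node.values) - 1
    return decisions + 1

sample = """
def classify(n, strict):
    if n < 0 and strict:
        return "invalid"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime"
"""

cc = cyclomatic_complexity(sample)
print(cc)  # 5: two ifs, one for, one `and`, plus 1
assert cc <= 10, "complexity budget exceeded"
```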
What is afferent coupling (Ca) and why does it matter architecturally?
?
Afferent coupling (Ca) is the number of external components that depend on a given component — its “fan-in.” High Ca means the component has high impact if changed: many other components may break. Architecturally, components with high Ca should be designed for maximum stability (stable interfaces, backward compatibility), because changes ripple outward to many dependents.
What is efferent coupling (Ce) and why does it matter architecturally?
?
Efferent coupling (Ce) is the number of external components that a given component depends on — its “fan-out.” High Ce means the component is fragile: it can be broken by changes in any of its many dependencies. Architecturally, components with high Ce should be candidates for facade patterns or dependency inversion to reduce fragility.
What is the Instability metric (I) and how is it calculated?
?
Instability (I) = Ce / (Ca + Ce). Range: 0 (maximally stable: the component depends on nothing, so no dependency change can break it, while others depend on it) to 1 (maximally unstable: nothing depends on it, but it depends on other components). The Stable Dependencies Principle states: depend in the direction of stability. Stable, low-I components should not depend on components with high I.
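A worked sketch of Ca, Ce, and I on a small dependency graph. The component names and the `deps` mapping are hypothetical, and treating a component with no couplings at all as I = 0 is an assumption made here to avoid dividing by zero.

```python
# Each key depends on the components in its set (outgoing edges).
deps = {
    "web": {"orders", "billing"},
    "orders": {"domain"},
    "billing": {"domain"},
    "domain": set(),
}

def coupling(deps, component):
    """Return (Ca, Ce, I) for one component."""
    ce = len(deps[component])                                         # fan-out
    ca = sum(1 for targets in deps.values() if component in targets)  # fan-in
    total = ca + ce
    instability = ce / total if total else 0.0
    return ca, ce, instability

def sdp_violations(deps):
    """Edges where a more stable component depends on a less stable one."""
    bad = []
    for src, targets in deps.items():
        for dst in sorted(targets):
            if coupling(deps, src)[2] < coupling(deps, dst)[2]:
                bad.append((src, dst))
    return bad

print(coupling(deps, "web"))     # (0, 2, 1.0)
print(coupling(deps, "orders"))  # (1, 1, 0.5)
print(coupling(deps, "domain"))  # (2, 0, 0.0)
print(sdp_violations(deps))      # []
```

Every edge here points from higher I to lower I, so the graph satisfies the principle. A fitness function could fail the build whenever `sdp_violations` returns a non-empty list.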
What is “Distance from the Main Sequence” (D) and what does it measure?
?
Distance from the Main Sequence (D) = |Abstractness + Instability - 1|. It measures how far a component deviates from the ideal trade-off between abstractness and stability. D = 0 is ideal. Two problematic extremes: the Zone of Pain (low abstractness + low instability = very stable but entirely concrete — rigid and hard to change); the Zone of Uselessness (high abstractness + high instability = abstract but nothing depends on it).
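A small sketch of the D formula. It assumes Abstractness is the ratio of abstract types to total types in the component (the usual definition, not stated on this card); the function name is hypothetical.

```python
def distance_from_main_sequence(abstractness, instability):
    """D = |A + I - 1|, with both inputs in the range 0..1."""
    return abs(abstractness + instability - 1)

print(distance_from_main_sequence(0.5, 0.5))   # 0.0  on the main sequence
print(distance_from_main_sequence(0.0, 0.0))   # 1.0  Zone of Pain
print(distance_from_main_sequence(1.0, 1.0))   # 1.0  Zone of Uselessness
print(distance_from_main_sequence(0.25, 0.5))  # 0.25
```

A fitness function might cap D per component, for example failing when D exceeds a limit the team chooses.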
What does ArchUnit do and how is it used as a fitness function?
?
ArchUnit is a Java library that allows architectural dependency rules to be expressed as executable JUnit tests. Rules like “no class in the domain layer may import from the infrastructure layer” are written as code and run in CI on every commit. A violation fails the build immediately. This automates dependency governance that would otherwise require manual code review, making it a static, triggered, atomic fitness function.
Why are percentile response times (p99, p999) better fitness function targets than average response time?
?
Averages are misleading because they mask the tail of the distribution. A system with an average response time of 100ms might have a p99 of 2,000ms: 1 in 100 requests takes at least 20x longer than the average suggests. Because of this long tail, real user experience is determined by percentiles. p99 is the threshold below which 99% of requests fall; p999 (the 99.9th percentile) is used for high-reliability systems. Fitness functions for performance should target percentiles at specific load levels.
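To see how an average hides the tail, here is a minimal sketch with made-up latency samples. The nearest-rank percentile method and the 500ms budget are assumptions chosen for illustration; monitoring tools may interpolate differently.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical data: 98 fast requests and 2 very slow ones.
latencies_ms = [80] * 98 + [2000] * 2

mean = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(mean, p50, p99)  # 118.4 80 2000

P99_BUDGET_MS = 500  # hypothetical threshold
print("pass" if p99 <= P99_BUDGET_MS else "fail")  # fail
```

The mean looks healthy at about 118ms, yet the p99 check fails, which is the signal a performance fitness function should act on.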
What is the formula for availability expressed as “nines” and what does 99.9% mean in downtime per year?
?
Availability = uptime / (uptime + downtime), so allowed downtime per year = (1 - availability) × 8,760 hours. The nines: 99% = ~87.6 hours downtime/year (two nines); 99.9% = ~8.76 hours (three nines); 99.99% = ~52.6 minutes (four nines); 99.999% = ~5.26 minutes (five nines). The fitness function form: “Monthly availability must not fall below 99.9%.” Each additional nine cuts the downtime budget tenfold, is far more expensive to achieve, and typically requires a fundamentally different architectural approach.
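The downtime figures above follow from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_pct):
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.3f} h ({hours * 60:.2f} min)")
```

For 99.9% this gives about 8.76 hours per year, and for 99.999% about 5.26 minutes, matching the card.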
What is MTTR and why is it both an operational and a process measure?
?
MTTR (Mean Time to Recovery) is the average elapsed time to restore service after a failure event. It is an operational measure because it reflects the runtime resilience of the deployed system. It is also a process measure because it is strongly influenced by observability quality, on-call procedures, runbook completeness, deployment rollback speed, and feature flag availability. Architecting for low MTTR requires both technical design (fast rollback, circuit breakers) and process design (incident runbooks, on-call rotations).
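MTTR is just the mean of outage durations. A minimal sketch, assuming each incident is recorded as a (started, restored) pair; the incident data and the one-hour budget are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure detected, service restored)
incidents = [
    (datetime(2025, 3, 1, 10, 0), datetime(2025, 3, 1, 10, 30)),   # 30 min
    (datetime(2025, 3, 9, 2, 15), datetime(2025, 3, 9, 3, 45)),    # 90 min
    (datetime(2025, 3, 20, 16, 0), datetime(2025, 3, 20, 17, 0)),  # 60 min
]

def mean_time_to_recovery(incidents):
    """Average of (restored - started) across all incidents."""
    total = sum((restored - started for started, restored in incidents), timedelta())
    return total / len(incidents)

mttr = mean_time_to_recovery(incidents)
mttr_ok = mttr <= timedelta(hours=1)  # hypothetical budget

print(mttr, mttr_ok)  # 1:00:00 True
```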
What is Change Failure Rate (CFR) and what are the DORA benchmark thresholds?
?
Change Failure Rate (CFR) is the percentage of production deployments that result in a degradation requiring remediation (rollback, hotfix, or patch). It is a DORA metric measuring deployment reliability. DORA's published bands shift between annual reports, so treat fixed cutoffs as a rule of thumb; this deck uses: Elite < 5%; High 5-10%; Medium 10-15%; Low > 15%. High CFR indicates insufficient testing, poor deployment practices, architectural brittleness, or inadequate staging environments. A fitness function might read: “Monthly CFR must not exceed 10%.”
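The CFR calculation and the 10% gate from the card can be sketched as follows. The deployment counts are invented, and returning 0.0 for a month with no deployments is an assumption.

```python
def change_failure_rate(deployments, failed):
    """Percentage of deployments that needed remediation."""
    if deployments == 0:
        return 0.0  # assumption: no deployments means nothing failed
    return 100 * failed / deployments

# Hypothetical month: 40 production deployments, 3 needed a rollback or hotfix.
cfr = change_failure_rate(deployments=40, failed=3)
cfr_ok = cfr <= 10  # "Monthly CFR must not exceed 10%"

print(cfr, cfr_ok)  # 7.5 True
```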
What are the four DORA metrics and what architectural characteristics do they govern?
?
The four DORA (DevOps Research and Assessment) metrics are: (1) Deployment Frequency, which governs deployability (an agility sub-characteristic); (2) Lead Time for Changes, which governs agility (how quickly a commit reaches production); (3) Change Failure Rate, which governs reliability and deployability; (4) MTTR (reported by DORA as time to restore service), which governs recoverability and availability. Elite performers score well on all four simultaneously, demonstrating that speed and stability are not in conflict when architecture supports it.
What is chaos engineering and why is it classified as a holistic fitness function?
?
Chaos engineering deliberately introduces failures into running systems (killing service instances, injecting latency, saturating resources) to verify that the system maintains its reliability and availability characteristics under failure conditions. It is holistic because it tests emergent behavior — whether the system as a whole responds correctly to failure — which cannot be verified by testing individual components. Tools include Netflix Chaos Monkey, Gremlin, and AWS Fault Injection Simulator.
What is SAST and how is it used as an architectural fitness function for security?
?
SAST (Static Application Security Testing) scans source code for known vulnerability patterns without executing the code. Tools include SonarQube, Checkmarx, and Semgrep. As a fitness function, SAST is: static (code analysis), triggered (runs on every commit or PR), and atomic (can scan individual modules). A SAST failure — a high-severity vulnerability pattern detected — fails the build, making security governance automated and continuous rather than dependent on security review.
What is “Governance Theater” and how do you fix it?
?
Governance Theater is when fitness functions exist in CI but can never fail because their thresholds are set so loosely that nothing ever violates them. It creates a false sense of architectural governance with no actual protection. The fix: set thresholds based on the actual characteristic requirement (e.g., “p99 < 300ms per the SLA”), not on what the current codebase happens to achieve. If the current code achieves p99 = 800ms, the threshold should still target 300ms and the code should be improved to meet it.
What is “Coverage Washing” and how do you detect and fix it?
?
Coverage Washing is achieving high test coverage metrics with tests that have no meaningful assertions — code paths are executed but nothing is verified. The coverage metric looks healthy while the protection is absent. Detection: mutation testing tools (Pitest for Java, Stryker for JavaScript) introduce deliberate bugs (mutations) into the code and verify that the test suite catches them. A mutation that survives (is not caught by any test) reveals a coverage-washed path. Fix: mandate mutation score as an additional fitness function alongside line coverage.
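The idea of a surviving mutant can be shown by hand. This sketch does not reproduce how Pitest or Stryker work internally; it writes one mutant manually and compares a test with no assertions against one that checks the boundary. All names are hypothetical.

```python
def is_adult(age):
    return age >= 18

def is_adult_mutant(age):
    # Hand-written mutation: `>=` replaced with `>`
    return age > 18

def weak_test(fn):
    """Executes the code (full line coverage) but verifies nothing."""
    fn(30)
    return True  # always "passes"

def strong_test(fn):
    """Checks both sides of the boundary."""
    return fn(18) is True and fn(17) is False

print(weak_test(is_adult), weak_test(is_adult_mutant))      # True True   mutant survives
print(strong_test(is_adult), strong_test(is_adult_mutant))  # True False  mutant killed
```

Both tests give the same line coverage of `is_adult`, but only the second one notices the injected bug. Mutation score measures that difference.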
What is the “Holistic Overload” antipattern in fitness function design?
?
Holistic Overload is running expensive holistic fitness functions — full chaos experiments, performance load tests, end-to-end integration suites — on every commit. This slows the CI pipeline to the point where developers wait 30-60 minutes for feedback and eventually bypass the pipeline. The fix is fitness function tiering: fast static and unit-level fitness functions run on every commit (seconds to minutes); expensive holistic fitness functions run nightly, weekly, or as release gates (minutes to hours).
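One way to guard against Holistic Overload is to record each check's tier and expected duration, then flag slow checks sitting in the commit tier. A minimal sketch; the `Check` type, the check names, the durations, and the 300-second budget are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Check:
    name: str
    tier: str         # "commit", "nightly", or "weekly"
    est_seconds: int  # rough expected run time

CHECKS = [
    Check("dependency rules", "commit", 20),
    Check("complexity limits", "commit", 15),
    Check("load test", "commit", 1800),  # misplaced: belongs in a slower tier
    Check("chaos experiment", "weekly", 3600),
]

def commit_tier_overload(checks, budget_seconds=300):
    """Names of commit-tier checks that individually exceed the time budget."""
    return [c.name for c in checks
            if c.tier == "commit" and c.est_seconds > budget_seconds]

overloaded = commit_tier_overload(CHECKS)
print(overloaded)  # ['load test']
```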
What should a fitness function registry contain?
?
A fitness function registry maps each architectural characteristic to its fitness functions and records: the characteristic being governed, the specific fitness function (tool + rule), the threshold (pass/fail criterion), the owner (team or individual responsible), the execution trigger (commit, nightly, weekly), and the last-known status (passing/failing). This provides visibility into which characteristics are actually governed, accountability for maintenance, and auditability of threshold changes over time.
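The registry fields listed above map naturally onto a small record type. A minimal sketch; the class name, the sample entries, and the `ungoverned` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FitnessFunctionEntry:
    characteristic: str  # what is being governed
    function: str        # tool + rule
    threshold: str       # pass/fail criterion
    owner: str           # team or individual responsible
    trigger: str         # commit, nightly, weekly
    status: str          # last-known status: passing or failing

registry = [
    FitnessFunctionEntry("modifiability", "ArchUnit: domain must not depend on infrastructure",
                         "0 violations", "platform-team", "commit", "passing"),
    FitnessFunctionEntry("performance", "load test at 500 concurrent users",
                         "p99 < 300ms", "checkout-team", "nightly", "failing"),
]

def ungoverned(required, registry):
    """Characteristics that have no fitness function registered."""
    covered = {entry.characteristic for entry in registry}
    return sorted(set(required) - covered)

print(ungoverned(["modifiability", "performance", "security"], registry))  # ['security']
print([e.function for e in registry if e.status == "failing"])
```

Listing ungoverned characteristics is what gives the registry its visibility value: a gap shows up as data rather than as a surprise during an incident.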
How should fitness functions evolve over time?
?
Fitness functions must be treated as first-class production code and maintained accordingly: (1) when a characteristic is retired, remove its fitness functions; (2) when a threshold becomes trivially easy to meet, tighten it; (3) when a fitness function produces false positives, tune it rather than disable it; (4) when a new architectural risk emerges, write a new fitness function proactively; (5) schedule regular reviews of the full fitness function suite, just as you would refactor production code. Stale, flaky, or irrelevant fitness functions erode trust in the governance system.
What is test coverage as an architectural fitness function and what are its limitations?
?
Test coverage (percentage of lines, branches, or functions exercised by tests) is a fitness function for the maintainability and agility characteristics: it indicates how safely the system can be changed. A typical fitness function: “Overall line coverage must remain above 80%; coverage for the domain package above 90%; coverage may not decrease on any PR by more than 2 percentage points.” Limitation: coverage is necessary but not sufficient, because it measures execution, not assertion quality. It must be combined with mutation testing for full confidence.
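The three-part coverage rule quoted above can be sketched as a single gate. The function name and inputs are hypothetical, and treating exactly 80% and 90% as passing is an assumption about how to read "above".

```python
def coverage_gate(overall, domain, previous_overall):
    """Return a list of rule violations; an empty list means the gate passes."""
    failures = []
    if overall < 80:
        failures.append(f"overall coverage {overall}% is below 80%")
    if domain < 90:
        failures.append(f"domain coverage {domain}% is below 90%")
    if previous_overall - overall > 2:
        failures.append(f"coverage dropped {previous_overall - overall:.1f} points on this PR")
    return failures

print(coverage_gate(overall=85.0, domain=92.0, previous_overall=86.0))  # []
print(len(coverage_gate(overall=82.0, domain=88.0, previous_overall=85.0)))  # 2
```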
Total Cards: 26
Estimated Review Time: 35-45 minutes (first pass), 15-20 minutes (subsequent)
Priority: HIGH — fitness functions are the primary governance mechanism; understanding this chapter is required to operationalize any architectural characteristic
Last Updated: 2026-05-29