Chapter 12 Flashcards — Unit Testing

flashcards seg testing unit-tests


Why do the authors say brittle tests may be “worse than no tests at all”?
?
Because brittle tests break on every refactoring — even behavior-preserving ones. This trains engineers to treat test failures as noise rather than signal. Once engineers learn to ignore failures, the safety net is gone. When a real regression eventually occurs, it is ignored along with the false positives. A test suite that engineers do not trust actively damages development by consuming time without providing protection.

What is a brittle test?
?
A brittle test is one that fails when the code changes in a way that does not affect the behavior the test is supposed to validate. Brittle tests require updates during pure refactoring — when implementation changes but behavior is preserved. The defining characteristic: a brittle test breaks due to an internal change that no real user or caller of the code would notice.

What are the four types of production code changes, and which should require test updates?
?

1. Pure refactoring (implementation changes, behavior preserved): tests should never need updating.
2. New features (new behavior added): existing tests should not break; new tests should be added.
3. Bug fixes (incorrect behavior corrected): tests validating the bug may need updating; new regression tests should be added.
4. Behavior changes (deliberate modification of existing behavior): existing tests should be updated to reflect new intent.
If a pure refactoring causes test failures, those tests are brittle.

What does “test via public APIs” mean, and why does it prevent brittle tests?
?
Testing via public APIs means invoking the code under test through the same interface that real callers use, not through private methods, package-private internals, or reflection. It prevents brittleness because public APIs are deliberately stable: refactoring the internal implementation does not change the public API. A test written against a private method validates a detail that may legitimately change during any refactoring; a test written against the public API validates the contract that real callers depend on.
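Example (a minimal sketch, not taken from the chapter): Wallet, deposit(), and balance() are hypothetical names, and a thrown AssertionError stands in for a JUnit assertion so the code runs without dependencies. The test touches only the public methods, so replacing the internal transaction list with a running total would not break it.
```java
import java.util.ArrayList;
import java.util.List;
// Hypothetical class under test; all names here are illustrative assumptions.
final class Wallet {
    // Internal detail: a transaction log. A refactoring could swap this for a running total.
    private final List<Long> transactions = new ArrayList<>();
    public void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        transactions.add(cents);
    }
    public long balance() {
        long total = 0;
        for (long amount : transactions) {
            total += amount;
        }
        return total;
    }
}
final class WalletTest {
    // Exercises only deposit() and balance(), the interface real callers use.
    static void shouldReportSumOfDepositsAsBalance() {
        Wallet wallet = new Wallet();
        wallet.deposit(500);
        wallet.deposit(250);
        if (wallet.balance() != 750) {
            throw new AssertionError("Expected balance 750 but was " + wallet.balance());
        }
    }
}
```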

What is the rule of thumb for knowing whether a test is testing implementation details?
?
If accessing the code under test requires calling a private or package-private method, or reaching into internal state via reflection, the test is testing implementation rather than behavior. The test should be rewritten to use only the public API. A corollary: if a pure refactoring that preserves behavior causes a test to fail, the test was testing implementation details.

What is the difference between state testing and interaction testing?
?
State testing asks: given these inputs, what did the system produce? It asserts on return values, object state, or data structure contents after an operation. Interaction testing asks: did the system call these methods with these arguments? It asserts on the how — typically using mock verify() calls. State testing is more robust because it survives implementation changes that produce the same output differently. Interaction testing is more brittle because it couples the test to a specific execution path.
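Example (a sketch under stated assumptions): UserStore, RecordingUserStore, and UserService are hypothetical, and a hand-written recording fake stands in for a mocking library's verify(). The first test asserts on resulting state; the second asserts on the exact calls made and would break if rename() were refactored to read before writing, even though the stored name would be the same.
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// Hypothetical collaborator interface (an assumption for illustration).
interface UserStore {
    void save(String id, String name);
    String find(String id);
}
// Hand-written fake that stores rows and also records every call it receives.
final class RecordingUserStore implements UserStore {
    final Map<String, String> rows = new HashMap<>();
    final List<String> calls = new ArrayList<>();
    public void save(String id, String name) {
        calls.add("save(" + id + ")");
        rows.put(id, name);
    }
    public String find(String id) {
        calls.add("find(" + id + ")");
        return rows.get(id);
    }
}
final class UserService {
    private final UserStore store;
    UserService(UserStore store) {
        this.store = store;
    }
    void rename(String id, String newName) {
        store.save(id, newName);
    }
}
final class UserServiceTest {
    // State test: asserts on what the system holds after the operation.
    static void shouldStoreNewNameAfterRename() {
        RecordingUserStore store = new RecordingUserStore();
        new UserService(store).rename("u1", "Ada");
        String actual = store.find("u1");
        if (!"Ada".equals(actual)) {
            throw new AssertionError("Expected u1 to be named Ada but was " + actual);
        }
    }
    // Interaction test: asserts on how the result was produced (the exact call list).
    static void shouldCallSaveExactlyOnce() {
        RecordingUserStore store = new RecordingUserStore();
        new UserService(store).rename("u1", "Ada");
        if (!store.calls.equals(Collections.singletonList("save(u1)"))) {
            throw new AssertionError("Unexpected calls: " + store.calls);
        }
    }
}
```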

When is interaction testing (verifying method calls) appropriate?
?
Interaction testing is appropriate when validating side effects that cannot be observed through state — for example, verifying that an email was sent, that a log entry was written, or that an audit record was created in an external system. In these cases the side effect is the behavior, and there is no state to assert on. Outside these cases, prefer state testing — asserting on what the system produced rather than how it produced it.

What does it mean for a test to be “complete” and “concise”?
?
Complete: the test body contains all information needed to understand what is being tested, without the reader needing to look elsewhere (setUp methods, shared fixtures, other test classes). Concise: the test contains no irrelevant information that obscures the intent. When these goals conflict, include all information relevant to the behavior under test and exclude everything else — err toward completeness, because missing context is more harmful than minor verbosity.

What is the difference between testing behaviors vs. testing methods?
?
Method-oriented testing: one test per production method, named testProcessPayment(), testing all branches of that method together. Behavior-oriented testing: each test covers one specific scenario or outcome, named as a sentence like shouldRefundWhenPaymentExceedsBalance(). Behavior-oriented tests make failure messages self-explanatory (the test name tells you what broke), make it easy to see what scenarios are covered, and allow a method with complex branching to be decomposed into focused, readable tests.
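Example (a minimal sketch): PaymentProcessor.refundFor() and its refund rule are assumptions made up for illustration, and a thrown AssertionError stands in for a JUnit assertion. Instead of a single testRefundFor() covering every branch, each scenario gets its own sentence-like test.
```java
// Hypothetical production code: the refund is whatever the payment exceeds the balance due by.
final class PaymentProcessor {
    static long refundFor(long balanceDueCents, long paymentCents) {
        return Math.max(0, paymentCents - balanceDueCents);
    }
}
final class PaymentProcessorTest {
    // One scenario per test; the name states the behavior being checked.
    static void shouldRefundDifferenceWhenPaymentExceedsBalance() {
        long refund = PaymentProcessor.refundFor(1000, 1500);
        if (refund != 500) {
            throw new AssertionError("Expected refund of 500 but was " + refund);
        }
    }
    static void shouldRefundNothingWhenPaymentIsBelowBalance() {
        long refund = PaymentProcessor.refundFor(1000, 400);
        if (refund != 0) {
            throw new AssertionError("Expected no refund but was " + refund);
        }
    }
}
```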

Why should you not put logic (if/for/while) in test code?
?
When a test contains branching or looping logic, it introduces the possibility of bugs in the test itself, including bugs that let the test pass when the code is wrong. For example, a for loop that asserts on each element of a result never runs its assertion when the result is unexpectedly empty, so the test passes without checking anything. Tests should contain no branching logic and should use hardcoded literal values rather than computed expected values. If multiple cases need testing, write multiple separate, logic-free test methods.
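Example (a sketch; AlbumLinks and its URL are hypothetical): the production code has a deliberate bug that yields a double slash. The check that computes its expected value repeats the same concatenation and therefore passes, while the check against a hardcoded literal fails and exposes the bug.
```java
// Hypothetical production code with a deliberate bug: BASE_URL already ends in "/".
final class AlbumLinks {
    static final String BASE_URL = "http://photos.example.com/";
    static String albumsUrl() {
        return BASE_URL + "/albums";
    }
}
final class AlbumLinksTest {
    // Logic in the test: the expected value mirrors the production concatenation, bug included.
    static boolean computedExpectationPasses() {
        String expected = AlbumLinks.BASE_URL + "/albums";
        return expected.equals(AlbumLinks.albumsUrl());
    }
    // Logic-free: a hardcoded literal states the intended result directly.
    static boolean literalExpectationPasses() {
        return "http://photos.example.com/albums".equals(AlbumLinks.albumsUrl());
    }
}
```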

What makes a test failure message useful?
?
A useful failure message communicates three things: (1) what behavior was being tested, (2) what was expected, and (3) what actually happened. Poor: AssertionError: expected true but was false. Good: Expected account balance to be 50 after refund of overcharge, but was 150. Account: Account{id=123, balance=150}. Modern assertion libraries (AssertJ, Truth) generate better messages than raw assertEquals; when they are insufficient, write custom messages that include relevant state.
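Example (a sketch, assuming a hypothetical Account type and a hand-written assertion helper rather than AssertJ or Truth): the custom message names the behavior, the expected value, the actual value, and the relevant state.
```java
// Hypothetical value type; toString() exposes the state worth seeing in a failure.
final class Account {
    final int id;
    final long balance;
    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }
    @Override
    public String toString() {
        return "Account{id=" + id + ", balance=" + balance + "}";
    }
}
final class AccountAssertions {
    // Throws with a message covering behavior, expectation, actual value, and state.
    static void assertBalanceAfterRefund(Account account, long expected) {
        if (account.balance != expected) {
            throw new AssertionError("Expected account balance to be " + expected
                + " after refund of overcharge, but was " + account.balance
                + ". Account: " + account);
        }
    }
}
```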

What is DAMP, and how does it differ from DRY in the context of tests?
?
DAMP (Descriptive And Meaningful Phrases): a test-code principle that prioritizes local readability and self-containment over eliminating duplication. Each test tells its own complete story within its body. DRY (Don’t Repeat Yourself): the production-code principle of extracting duplication into shared utilities. In test code, DRY applied aggressively creates indirection — tests that require tracing through setUp(), helpers, and fixture classes to understand — making tests harder to read, easier to misunderstand, and more fragile when shared code changes.

Why does DRY, applied aggressively to test code, make tests worse?
?
Because test readability depends on local clarity — a reader should understand what a test validates by reading its body alone. Aggressive DRY extracts details into setUp(), shared helper methods, and fixture files. The result: a reader must trace multiple files and methods to understand a single test. This also creates fragile coupling: changing a shared helper method to satisfy one test may silently break other tests that depend on it in subtle ways.

When is shared setUp() (beforeEach) appropriate vs. harmful in tests?
?
Appropriate: setting up infrastructure that is identical for every test — e.g., instantiating the class under test, connecting to a test database, creating a standard collaborator. This eliminates pure boilerplate without obscuring test intent. Harmful: establishing state that only some tests depend on, or establishing values that tests rely on implicitly. When setUp() creates state that only some tests use, tests become coupled to irrelevant setup, and readers must read setUp() to understand what is actually relevant to a given test.

What is the guideline for shared values in tests?
?
When test values must be shared across tests, names must carry meaning. USER_WITH_ZERO_BALANCE tells the reader why that user matters for the test. USER_A tells the reader nothing — they must find the definition and infer its relevance. When a value is specific to one test’s scenario, define it inside that test body so the value and its relevance are co-located. Shared values are appropriate only when they represent truly generic fixtures used identically across many tests.

What are “action helpers” and “validation helpers” in test code, and when are they appropriate?
?
Action helpers: methods that perform a multi-step setup action and return a result for the test to assert on (e.g., makePayment(user, amount)). They are DAMP-compatible because the test body still contains the important values and asserts on the result — only the boilerplate is extracted. Validation helpers: methods that perform a multi-assertion pattern (e.g., assertPaymentRefunded(payment, amount)). Appropriate when the assertion pattern is genuinely reused and the helper name clearly describes what is being validated.

How should test infrastructure differ from test-level helpers?
?
Test infrastructure — fakes, builders, custom matchers, test harnesses — is shared across the codebase and should be treated as first-class production-quality code: designed carefully, documented, code-reviewed, and maintained. Test-level helpers are local to one test class and can be more informal. The distinction matters because test infrastructure is a dependency for many tests; bugs or design flaws in it affect the entire test suite.

What is the anti-pattern of “testing implementation details” and what does it look like?
?
Testing implementation details means writing tests that assert on internal state or private behavior rather than observable outputs. It looks like: calling private methods via reflection, asserting on intermediate computed values rather than final outputs, verifying specific method call sequences when the same outcome could be achieved differently. The result: any refactoring that changes implementation without changing behavior breaks these tests, creating a false signal that something is wrong.

What does it mean for a test to “never fail,” and why is it dangerous?
?
A test written so that it cannot possibly fail provides zero value while still counting as coverage. Common patterns: asserting on a value that is always true regardless of the code under test (assertTrue(true)), catching all exceptions and swallowing them, or having a conditional that silently skips the assertion. The danger: the test inflates coverage metrics and passes code review, creating false confidence. The safety net appears intact but has holes.
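Example (a sketch; Discount and both tests are hypothetical, with a small check() helper standing in for a JUnit assertion): the production code contains a deliberate bug. The vacuous test passes anyway because its only assertion sits behind a condition the buggy result never meets, while the unconditional test fails as it should.
```java
// Hypothetical production code with a deliberate bug: it adds instead of subtracting.
final class Discount {
    static long apply(long priceCents, long discountCents) {
        return priceCents + discountCents;
    }
}
final class DiscountTest {
    // Cannot fail here: the assertion is skipped by the condition, and runtime exceptions are swallowed.
    static void vacuousTest() {
        try {
            long result = Discount.apply(1000, 200);
            if (result < 1000) {
                check(result == 800, "Expected discounted price 800 but was " + result);
            }
        } catch (RuntimeException e) {
            // Swallowed: an exception from the production code would also count as a pass.
        }
    }
    // Can fail: an unconditional assertion against a literal.
    static void shouldSubtractDiscountFromPrice() {
        long result = Discount.apply(1000, 200);
        check(result == 800, "Expected discounted price 800 but was " + result);
    }
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```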

Why should you always see a test fail at least once?
?
A test that has never been observed to fail may be incorrectly implemented — it may pass whether or not the code under test is correct. The authors recommend verifying test validity by: (1) writing the test before the implementation exists (TDD style — the test should fail until the code is written), or (2) temporarily introducing a deliberate bug in the production code to verify the test catches it. A test that cannot be made to fail is not testing anything.

What is the core problem with over-mocked tests?
?
When too many real dependencies are replaced with mocks, the test validates the mock configuration rather than the system’s behavior. You configure a mock to return X, call the code, verify the mock was called — but this only verifies that you wrote the right verify() call, not that the system is correct. As mock usage increases, the test suite grows and coverage climbs, but the system’s real behavior is increasingly untested. Over-mocked tests pass even when the system is broken.

How does behavior-oriented test naming improve failure diagnostics?
?
When a test is named shouldRefundWhenPaymentExceedsBalance(), a failure in CI immediately tells you: the refund behavior is broken when payment exceeds balance. The developer does not need to read the test body before understanding the nature of the failure. With method-oriented naming (testProcessPayment()), the failure message tells you only that something in processPayment is wrong — the developer must read the test body to understand which scenario failed. Behavior-oriented names make test failures self-diagnosing.

What is the relationship between DAMP tests and onboarding new engineers?
?
DAMP tests dramatically lower the cost of onboarding. A new engineer reading a DAMP test can understand what is being tested, what the expected behavior is, and what the interesting inputs are from the test body alone, without needing to learn the structure of shared fixtures, setUp() methods, and helper utilities. DRY test suites with heavy abstraction require new engineers to build a mental model of the entire test support infrastructure before they can read individual tests confidently.


Total Cards: 23
Review Time: ~20 minutes
Priority: HIGH
Last Updated: 2026-06-02