Chapter 13 Flashcards — Test Doubles

flashcards seg testing test-doubles

What is a test double?
?
Any object, function, or system that replaces a real production dependency in a test. The term, borrowed from the film industry's “stunt double,” encompasses fakes, stubs, mocks, spies, and dummies. Test doubles allow the code under test to be isolated from slow, nondeterministic, or side-effectful external systems.

What is a seam in the context of testability?
?
A place in code where behavior can be changed without editing the production code itself — typically by substituting a dependency. Seams are the insertion points for test doubles. Code that uses dependency injection (passing dependencies as constructor arguments or method parameters) has natural seams; code that creates dependencies internally with new or calls global singletons does not.
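
A minimal sketch of a seam, in Python. All class names here (`SmtpMailer`, `HardWiredSignup`, `InjectableSignup`, `RecordingMailer`) are hypothetical and exist only for illustration. The first service creates its dependency internally, so a test cannot swap it out; the second receives it through the constructor, which is the seam.

```python
class SmtpMailer:
    """Hypothetical stand-in for a real dependency with side effects."""

    def send(self, to, body):
        raise RuntimeError("would send a real email")


class HardWiredSignup:
    """No seam: the dependency is created inside the method."""

    def register(self, email):
        SmtpMailer().send(email, "Welcome!")
        return True


class InjectableSignup:
    """Seam: the dependency is passed in through the constructor."""

    def __init__(self, mailer):
        self._mailer = mailer

    def register(self, email):
        self._mailer.send(email, "Welcome!")
        return True


class RecordingMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


# The test substitutes the double without editing InjectableSignup.
mailer = RecordingMailer()
signup = InjectableSignup(mailer)
signup.register("alice@example.com")
```

With `HardWiredSignup`, the only way to avoid the real mailer would be to edit the class or patch it from outside.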

What are the three techniques for using test doubles?
?

1. Faking: creating a lightweight, working implementation of the real dependency that has real behavioral logic but is designed for testing (e.g., an in-memory database).
2. Stubbing: configuring a function to return a pre-programmed value when called, with no real behavioral logic.
3. Interaction testing (mocking): verifying that a function was called with specific arguments or a specific number of times.

What is a fake, and how does it differ from a stub?
?
A fake is a simplified but behaviorally correct implementation of a production dependency: it has real logic (e.g., a fake database stores data in a map and actually retrieves it). A stub simply returns a pre-programmed value when called, with no logic. Fakes have high fidelity; stubs have essentially none. Fakes remain valid when the code under test changes how it uses the dependency; stubs encode assumptions about exactly which calls are made and what they return.
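
The contrast can be sketched in a few lines of Python. `FakeUserStore` and `StubUserStore` are hypothetical names, and the two-method store interface is an assumption made for the example.

```python
class FakeUserStore:
    """Fake: real (if simplified) logic, backed by a dict."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


class StubUserStore:
    """Stub: ignores its input and returns one canned value."""

    def save(self, user_id, name):
        pass  # nothing is stored

    def get(self, user_id):
        return "Alice"


fake = FakeUserStore()
fake.save(1, "Alice")
fake.save(2, "Bob")

stub = StubUserStore()
stub.save(2, "Bob")
```

The fake returns what was actually saved and `None` for unknown ids; the stub answers `"Alice"` for every id, including ones that were never saved.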

What is stubbing?
?
The practice of configuring a test double to return a pre-programmed value when a specific method is called. Example: when(mockUserService.getUser(123)).thenReturn(fakeUser). Stubbing is appropriate when a test needs the code under test to receive a specific value to exercise a particular code path, but must be used sparingly to avoid brittle tests.
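
A rough Python analogue of the stubbing call above, using the standard-library `unittest.mock`. The function `greeting_for` and the shape of the user record are assumptions made for this sketch; only the idea of a pre-programmed return value comes from the card.

```python
from unittest.mock import Mock


def greeting_for(user_service, user_id):
    """Hypothetical code under test with two code paths."""
    user = user_service.get_user(user_id)
    if user is None:
        return "Hello, guest!"
    return f"Hello, {user['name']}!"


user_service = Mock()

# Stub the dependency so the "known user" path runs.
user_service.get_user.return_value = {"name": "Alice"}
known = greeting_for(user_service, 123)

# Re-stub it so the "unknown user" path runs.
user_service.get_user.return_value = None
unknown = greeting_for(user_service, 456)
```

Each stubbed value exists only to steer the code under test down one path; the assertions are made on the returned greeting, not on the stub.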

What is interaction testing (mocking)?
?
Verifying that a specific function was called with specific arguments or a specific number of times, rather than verifying the observable output or state. Example: verify(emailService).sendEmail("alice@example.com", "Welcome!"). Interaction testing checks how a result was achieved rather than what the result is.
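
The same kind of verification can be sketched with Python's standard-library `unittest.mock`. The function `welcome` is a hypothetical piece of code under test.

```python
from unittest.mock import Mock


def welcome(email_service, address):
    """Hypothetical code under test: its only effect is a call on a dependency."""
    email_service.send_email(address, "Welcome!")


email_service = Mock()
welcome(email_service, "alice@example.com")

# Interaction test: raises AssertionError unless exactly one call
# was made with exactly these arguments.
email_service.send_email.assert_called_once_with("alice@example.com", "Welcome!")
```

Nothing here checks that an email would actually be delivered; the test only confirms that the call was made.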

When should you prefer the real implementation over a test double?
?
When the real implementation is fast (runs in milliseconds), deterministic (same inputs always produce same outputs), and has no unacceptable side effects in a test environment. Test doubles introduce a gap between what is tested and what runs in production — every double is a hypothesis that may be wrong. The real implementation should always be the first choice.

What is fidelity in the context of test doubles?
?
Fidelity is the degree to which a test double behaves like the real production implementation. A high-fidelity fake closely mirrors the real system’s behavior for the scenarios being tested. A low-fidelity stub returns hardcoded values with no behavioral logic. The appropriate fidelity depends on the test’s goals — a fake needs to match the real system’s semantics for the inputs and operations that appear in tests, but does not need to replicate every edge case.

Why are fakes considered the most valuable type of test double?
?
Because fakes have real behavioral logic: they behave like the production dependency for the scenarios tests exercise, while avoiding the costs of the real implementation (slowness, nondeterminism, side effects). Fakes provide higher confidence than stubs because the code under test runs against realistic dependency behavior rather than canned return values. They also remain valid through implementation changes, unlike interaction tests.

Who should write and maintain a fake for an API?
?
The owner of the real API should provide and maintain a canonical fake alongside it. Co-location ensures the fake stays in sync as the API evolves, and that it is maintained by engineers who understand the real system’s behavior. If there is no canonical fake, each team writes its own, which leads to divergence, inconsistency, and loss of confidence in test results.

What is the “everyone writes their own fake” anti-pattern?
?
When no canonical fake exists for an API, each team creates its own. These DIY fakes diverge over time, both from the real API and from each other, so tests pass under assumptions that differ from team to team. The anti-pattern is prevented by having the API owner provide and maintain a single canonical fake that all consumers use.

Why should fakes themselves be tested?
?
A fake that behaves differently from the real implementation provides false confidence — tests pass but production behavior differs. Fakes should have contract tests (also called conformance tests) that run the same test suite against both the real implementation and the fake. Any divergence is caught immediately. A wrong fake is worse than no fake.

What are contract tests (conformance tests)?
?
Tests that verify a fake conforms to the real implementation’s behavior. For each behavior B of the real implementation, a contract test verifies that both the real implementation and the fake satisfy B identically. This ensures the fake remains accurate as the real implementation evolves. Contract tests are the primary defense against fakes that provide false confidence.
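
One way to structure a contract test is a single check function that is run against both implementations. In this sketch a small file-backed store plays the role of the "real" implementation; both store classes and the `put`/`get`/`delete` interface are assumptions made for illustration.

```python
import json
import os
import tempfile


class FileKeyValueStore:
    """Plays the 'real' implementation: persists every write to a JSON file."""

    def __init__(self, path):
        self._path = path

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path) as f:
            return json.load(f)

    def _dump(self, data):
        with open(self._path, "w") as f:
            json.dump(data, f)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        self._dump(data)

    def get(self, key):
        return self._load().get(key)

    def delete(self, key):
        data = self._load()
        data.pop(key, None)
        self._dump(data)


class InMemoryKeyValueStore:
    """The fake: same interface, backed by a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def check_store_contract(store):
    """Behaviors every implementation, real or fake, must satisfy."""
    assert store.get("missing") is None
    store.put("k", "v1")
    assert store.get("k") == "v1"
    store.put("k", "v2")  # overwrite
    assert store.get("k") == "v2"
    store.delete("k")
    assert store.get("k") is None


def run_contract_suite():
    """Run the identical checks against the real store and the fake."""
    with tempfile.TemporaryDirectory() as tmp:
        check_store_contract(FileKeyValueStore(os.path.join(tmp, "kv.json")))
    check_store_contract(InMemoryKeyValueStore())
    return True
```

If the fake's `delete` silently stopped removing keys, `check_store_contract` would fail for the fake while still passing for the real store, which is exactly the divergence the suite exists to catch.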

What is the primary danger of overusing stubbing?
?
Over-stubbing leads to testing the mock rather than the behavior — tests verify that the code called the right methods with the right arguments, but do not verify that those calls produce correct outcomes. Additionally: tests become brittle (break when implementation changes even if behavior doesn’t), unclear (too many stubs obscure what’s being tested), and give false confidence (stubs don’t verify the real dependency would behave as expected).

What does “testing the mock, not the behavior” mean?
?
An anti-pattern where every dependency is stubbed so that the test only verifies that the code called the right methods in the right order — not that it produces correct observable behavior. If the code is refactored to achieve the same outcome through different method calls, the test breaks even though behavior is unchanged. The test is verifying implementation details rather than the behavior users care about.

What is state testing, and why is it preferred over interaction testing?
?
State testing verifies the observable output or state of the system after exercising it — what the system produced, what state it is in, what it returned. It is preferred over interaction testing because it is robust to implementation changes (how the result is achieved can change without breaking the test), it tests what users care about (the outcome), and it is more readable. Interaction testing is coupled to implementation details and breaks on refactoring.

Give examples of when interaction testing IS appropriate.
?

Side-effectful operations with no testable return value: e.g., verifying that an audit log entry was written — there may be no other observable output to test.
The interaction is the explicit contract: e.g., “every payment must be logged to the audit system before processing” — verifying the call is itself the requirement.
Call-count constraints: e.g., a database query must not be made more than once per request due to caching — verifying the call count enforces the performance contract.
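
The call-count case can be sketched as follows, using Python's standard-library `unittest.mock`. `CachingProfileLoader` and the `query` method are hypothetical; the point is only that the call count is the contract being verified.

```python
from unittest.mock import Mock


class CachingProfileLoader:
    """Hypothetical code under test: must query the database at most once per id."""

    def __init__(self, database):
        self._database = database
        self._cache = {}

    def load(self, user_id):
        if user_id not in self._cache:
            self._cache[user_id] = self._database.query(user_id)
        return self._cache[user_id]


database = Mock()
database.query.return_value = {"id": 7, "name": "Alice"}

loader = CachingProfileLoader(database)
first = loader.load(7)
second = loader.load(7)

# The second load must be served from the cache, so exactly one query is allowed.
database.query.assert_called_once_with(7)
```

A state test alone could not distinguish a working cache from a missing one here, because both return the same profile.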

What is over-specification in interaction testing?
?
Verifying more details of a method call than are relevant to the test — e.g., checking every argument (including irrelevant ones like logging metadata, priority flags, or empty lists) when only one argument (like the recipient email address) is the point of the test. Over-specification creates fragility: the test breaks when irrelevant details change, even though the behavior is unchanged. Verify only what matters.
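
A sketch of a focused verification in Python, using `ANY` from the standard-library `unittest.mock` as a wildcard for arguments the test does not care about. The function `send_invite` and its `priority` and `trace_id` parameters are hypothetical.

```python
from unittest.mock import ANY, Mock


def send_invite(email_service, address):
    """Hypothetical code under test that passes extra metadata along."""
    email_service.send_email(
        address, "You're invited!", priority="normal", trace_id="req-42"
    )


email_service = Mock()
send_invite(email_service, "alice@example.com")

# Over-specified version (breaks if the subject, priority, or trace id changes):
#   email_service.send_email.assert_called_once_with(
#       "alice@example.com", "You're invited!",
#       priority="normal", trace_id="req-42")

# Focused version: only the recipient is pinned down.
email_service.send_email.assert_called_once_with(
    "alice@example.com", ANY, priority=ANY, trace_id=ANY
)

# Alternative: inspect only the argument that matters.
recipient = email_service.send_email.call_args[0][0]
```

The wildcard form still fixes the names of the keyword arguments, so reading the single relevant argument from `call_args` is the looser of the two options.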

What is the difference between state testing and interaction testing applied to the same scenario?
?
Example: testing that sendInvitation(user) works correctly.

State test: assert that user.invitationStatus == INVITED after the call — tests the outcome.
Interaction test: verify(emailService).sendEmail(user.email, "You're invited!") — tests the mechanism.
The state test remains valid if invitations are later sent via in-app notification instead of email (if status is still updated). The interaction test breaks on that refactoring even if users still receive invitations.
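
The difference can be made concrete with two hypothetical implementations of the same behavior, sketched in Python. The `User` class, both `send_invitation_*` functions, and the `notifier` dependency are assumptions for illustration.

```python
from unittest.mock import Mock


class User:
    def __init__(self, email):
        self.email = email
        self.invitation_status = "NONE"


def send_invitation_by_email(user, email_service, notifier):
    """Original implementation: invites by email."""
    email_service.send_email(user.email, "You're invited!")
    user.invitation_status = "INVITED"


def send_invitation_in_app(user, email_service, notifier):
    """Refactored implementation: invites by in-app notification."""
    notifier.notify(user.email, "You're invited!")
    user.invitation_status = "INVITED"


def state_test(send_invitation):
    """Passes if the observable outcome is correct."""
    user = User("alice@example.com")
    send_invitation(user, Mock(), Mock())
    return user.invitation_status == "INVITED"


def interaction_test(send_invitation):
    """Passes only if the email mechanism was used."""
    user = User("alice@example.com")
    email_service = Mock()
    send_invitation(user, email_service, Mock())
    try:
        email_service.send_email.assert_called_once_with(
            user.email, "You're invited!"
        )
        return True
    except AssertionError:
        return False
```

Running both tests against both implementations shows the state test passing twice, while the interaction test passes for the email version and fails for the in-app version.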

Why is verify(x, never()).method() considered a fragile assertion?
?
Asserting that something was never called tests a negative that is usually not meaningful in isolation. If the intent is “free orders should not be charged,” a state test (verify the order result shows no charge) is more direct and less fragile than verifying the payment service was never called. The interaction-based negative assertion can also break when the code path changes, even though the customer is still not charged.
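
A sketch of the two styles for the free-order case, in Python. `assert_not_called` from the standard-library `unittest.mock` plays the role of `verify(x, never())`; the `checkout` function and its result dictionary are hypothetical.

```python
from unittest.mock import Mock


def checkout(order_total, payment_service):
    """Hypothetical code under test: free orders are not charged."""
    if order_total == 0:
        return {"charged": 0, "status": "COMPLETE"}
    payment_service.charge(order_total)
    return {"charged": order_total, "status": "COMPLETE"}


payment_service = Mock()
result = checkout(0, payment_service)

# State-based check: states the intent directly.
assert result["charged"] == 0

# Interaction-based negative: tied to the charge() method specifically.
payment_service.charge.assert_not_called()
```

If `checkout` were later changed to make a zero-amount call for bookkeeping, the interaction check would fail while the state check would still pass.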

What makes a mocking framework dangerous if used without discipline?
?
The ease of the API makes it trivially easy to stub every dependency and verify every call without thinking about what is actually being verified. The result is tests that pass but provide little confidence in real behavior, are brittle to refactoring, and encode implementation details rather than behavioral contracts. The framework’s power is its danger — discipline in choosing when to use stubs vs. fakes vs. real implementations is required.

What is a hermetic test environment?
?
A test environment where real implementations of services are used, but in isolated test instances that are pre-populated with known data and do not share state across tests. Hermetic testing is preferred over mocking at Google when feasible because it tests real integration while maintaining test isolation. It avoids the fidelity gap that test doubles introduce.
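
As a small-scale analogy only, and not a description of any particular infrastructure, the idea can be sketched with Python's standard-library `sqlite3`: each scenario gets its own fresh instance of a real database engine, seeded with known data, and shares nothing with other scenarios. The table layout and function names are assumptions.

```python
import sqlite3


def new_hermetic_db():
    """Create an isolated, pre-populated database instance for one test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    conn.commit()
    return conn


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def scenario_delete_user():
    conn = new_hermetic_db()
    conn.execute("DELETE FROM users WHERE id = ?", (1,))
    return count_users(conn)


def scenario_count_users():
    # Unaffected by the deletion above: this scenario has its own instance.
    conn = new_hermetic_db()
    return count_users(conn)
```

Because every scenario starts from the same seeded state, the order in which they run does not matter.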

What are the four situations where a test double is appropriate instead of the real implementation?
?

1. The real implementation is too slow (e.g., a network call or full database).
2. The real implementation is nondeterministic (e.g., a clock, random number generator, or external service with variable latency).
3. The real implementation has unacceptable side effects in tests (e.g., sending real emails, charging real credit cards, writing to production databases).
4. The real implementation requires complex external infrastructure that is impractical to provision in a test environment.
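
The nondeterminism case can be sketched with a clock, in Python. `SystemClock`, `FixedClock`, and `is_business_hours` are hypothetical names; the 9-to-17 window is an arbitrary assumption.

```python
import datetime


class SystemClock:
    """Real implementation: result depends on when the test runs."""

    def now(self):
        return datetime.datetime.now()


class FixedClock:
    """Test double: always reports the same instant."""

    def __init__(self, fixed):
        self._fixed = fixed

    def now(self):
        return self._fixed


def is_business_hours(clock):
    """Hypothetical code under test."""
    return 9 <= clock.now().hour < 17


morning = FixedClock(datetime.datetime(2024, 1, 15, 10, 30))
night = FixedClock(datetime.datetime(2024, 1, 15, 23, 0))
```

With `SystemClock`, a test of `is_business_hours` would pass or fail depending on the time of day; with `FixedClock`, both branches can be exercised deterministically.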

What is the relationship between dependency injection and testability?
?
Dependency injection (passing dependencies as constructor arguments or method parameters rather than creating them internally) creates seams — places where test doubles can be substituted without modifying production code. Code without dependency injection (using new inside methods or calling global singletons) cannot accept test doubles without either modifying production code or using bytecode-level mocking. Dependency injection is the primary design practice that enables test doubles.

What is the anti-pattern of “excessive mocking”?
?
Using test doubles (especially stubs and mocks) for every dependency regardless of whether the real implementation is suitable. Excessive mocking prevents the code under test from integrating with its real dependencies, which can mask real bugs that would surface in integration. It also creates test suites that are brittle, hard to understand, and provide less confidence than tests using real or high-fidelity fake implementations.

When should a test use a stub vs. a fake?
?
Use a stub when: the dependency interaction is simple (one or two return values), there is no existing fake, and the test is clear with a few hardcoded values. Use a fake when: the dependency has complex behavior that multiple test cases need, tests require realistic behavioral logic (not just fixed returns), or the dependency is used heavily across many tests. As a rule, if a test requires more than two or three stubs to set up, a fake is likely the better tool.

What distinguishes interaction testing from state testing in terms of coupling to implementation?
?
Interaction testing is coupled to the implementation — it verifies how the code achieves a result (which methods it calls, in what order, with what arguments). If the code is refactored to achieve the same result differently, the interaction test breaks. State testing is decoupled from the implementation — it verifies what the code produces, independent of how it produces it. Refactoring that preserves behavior does not break state tests.

Total Cards: 27
Review Time: ~20 minutes
Priority: HIGH
Last Updated: 2026-06-02

Study Notes by Niladri & AI
