Chapter 9 Flashcards — Unit Tests

What are the Three Laws of TDD?
?

1. You may not write production code until you have written a failing unit test.
2. You may not write more of a unit test than is sufficient to fail (not compiling counts as failing).
3. You may not write more production code than is sufficient to pass the currently failing test.

Why are dirty tests considered worse than no tests at all?
?
Dirty tests impose a maintenance burden (they must be updated every time production code changes) while providing false confidence that behavior is verified. Teams eventually stop maintaining dirty tests, leaving a stale, misleading test suite. No tests impose no maintenance burden. Dirty tests impose cost without full benefit.

What does “tests enable change” mean in practice?
?
A comprehensive, passing test suite gives developers the courage to refactor production code. Without tests, every change is a guess — developers stop refactoring, code accumulates complexity, and the codebase rots. With tests, a green suite after a change proves the behavior is preserved. Tests are the safety net that keeps code clean over time.

What is the BUILD-OPERATE-CHECK pattern?
?
A three-phase structure for clean tests: BUILD the test data and preconditions, OPERATE on the object under test, CHECK that the results are correct. Also known as Arrange-Act-Assert (AAA) and Given-When-Then (BDD style). It gives every test a predictable, readable structure.
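
The three phases can be sketched in Python as follows. This is only an illustration: `ShoppingCart` and its methods are hypothetical names, not taken from the chapter.

```python
class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        self._prices = []

    def add(self, price_cents):
        self._prices.append(price_cents)

    def total_cents(self):
        return sum(self._prices)


def test_total_is_sum_of_item_prices():
    # BUILD (Arrange / Given): create the object and its preconditions.
    cart = ShoppingCart()
    cart.add(250)
    cart.add(199)

    # OPERATE (Act / When): exercise the behavior under test.
    total = cart.total_cents()

    # CHECK (Assert / Then): verify the result.
    assert total == 449
```

Keeping a blank line between the phases makes the structure visible even without the comments.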

What are the three equivalent names for the three-phase test structure?
?

1. BUILD-OPERATE-CHECK: Robert Martin’s term in Clean Code.
2. Arrange-Act-Assert (AAA): most common in Java and C++ communities.
3. Given-When-Then: BDD style, common with Cucumber and Gherkin.
All describe the same three phases: setup, action, verification.

What is the “dual standard” for test code?
?
Test code must be as clean and readable as production code, but it does NOT need to be as efficient in memory or CPU. In tests, plain string concatenation and verbose builders are acceptable, even desirable, if they make the test easier to read. The dual standard covers efficiency only, never cleanliness: production code must be clean and efficient; test code must be clean and may trade efficiency for readability.

What is the “one assert per test” rule?
?
A guideline that each test function should contain a single assertion, so that a failure points at exactly one regressed behavior. Clean Code treats it as a good guideline, not a hard law, and prefers the broader rule “one concept per test”: several assertions are fine when they verify one concept, but each distinct concept should live in its own test function.

Why split a test with multiple assertions into separate test functions?
?
When a test that mixes several concepts fails, its name cannot tell you which one broke, and in most xUnit-style frameworks the first failed assertion stops the test and hides the results of the assertions after it. When each test covers one concept, the failing test name is self-documenting: it tells you exactly which behavior regressed. Diagnosis is immediate, not investigative.

What is the F.I.R.S.T. acronym for clean tests?
?
F — Fast: tests run in milliseconds. I — Independent: tests don’t depend on each other. R — Repeatable: same result in any environment. S — Self-Validating: boolean pass/fail outcome. T — Timely: written just before the production code. All five must be satisfied for a test suite to be trustworthy.

What does “Fast” mean in F.I.R.S.T., and what happens when tests are slow?
?
Fast means unit tests run in milliseconds, not seconds. When tests are slow, developers stop running them frequently. Without frequent test runs, the feedback loop breaks — regressions are discovered late, making them expensive to fix. The practical rule: no real I/O (HTTP, database, filesystem) in unit tests; use fakes and mocks instead.

What does “Independent” mean in F.I.R.S.T., and what happens when tests depend on each other?
?
Independent means each test sets up its own state and can run in any order without affecting other tests. When tests share mutable state, a single failure can cascade into many spurious failures, which hides the real problem. Use @BeforeEach (JUnit 5) or pytest fixtures to give each test a fresh, isolated environment.
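
A minimal sketch of per-test setup, using the `setUp` hook from Python's standard `unittest` module as a stand-in for @BeforeEach or a pytest fixture. The `Counter` class is hypothetical and exists only for this example.

```python
import unittest


class Counter:
    """Hypothetical class under test."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class CounterTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh Counter.
        self.counter = Counter()

    def test_starts_at_zero(self):
        self.assertEqual(self.counter.value, 0)

    def test_increment_adds_one(self):
        self.counter.increment()
        self.assertEqual(self.counter.value, 1)
```

Because neither test sees the other's `Counter`, both pass whatever order the runner picks.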

What does “Repeatable” mean in F.I.R.S.T., and what are common violations?
?
Repeatable means the test produces the same result every time — on developer machines, in CI, offline, in different time zones. Common violations: using new Date() / LocalDate.now() without an injected clock; calling real external services; relying on file system paths. Fix: inject Clock, seed random number generators, use in-memory fakes.
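
One way to sketch the injected-clock fix in Python is to pass the source of "today" as a parameter. The function `is_expired` and its `today_provider` parameter are assumptions made up for this illustration.

```python
from datetime import date


def is_expired(expiry, today_provider=date.today):
    """Return True when `expiry` is strictly before today's date.

    `today_provider` is a hypothetical injection point: production code
    uses the real clock by default, tests pass a fixed one.
    """
    return expiry < today_provider()


def test_coupon_is_expired_the_day_after_expiry():
    fixed_today = lambda: date(2024, 3, 2)  # frozen "now" for the test
    assert is_expired(date(2024, 3, 1), today_provider=fixed_today)


def test_coupon_is_still_valid_on_expiry_day():
    fixed_today = lambda: date(2024, 3, 1)
    assert not is_expired(date(2024, 3, 1), today_provider=fixed_today)
```

Both tests give the same answer on any machine and on any day, because they never read the system clock.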

What does “Self-Validating” mean in F.I.R.S.T.?
?
Self-Validating means the test has a boolean outcome — it either passes or fails automatically. A test that requires a human to read a log file and judge whether the output “looks correct” is not a test — it is a manual inspection step. Every test must contain at least one explicit assertion that fails loudly on regression.

What does “Timely” mean in F.I.R.S.T., and why is it important for design?
?
Timely means tests are written just before the production code they verify. If you write tests after, you may discover the production code was designed in a way that makes it hard to test (direct construction of dependencies, global state, no interfaces). Writing tests first forces testable design — code naturally gets dependency injection and clean interfaces.

What is a test DSL and when should you create one?
?
A test DSL (Domain-Specific Language) is a set of helper functions that reads like a business-level specification rather than raw API calls. Example: anOrder().forCustomer(ID).withItem("SKU", qty(2), price("9.99")).build() instead of 10 lines of setters. Create a test DSL when test setup exceeds 5-7 lines or when the same setup pattern repeats across multiple tests. It emerges organically — don’t design it upfront.
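
The builder-style example above might look like this in Python. It is a rough sketch under stated assumptions: `Order`, `OrderBuilder`, and `an_order` are hypothetical names, and the `qty(...)` and `price(...)` wrappers from the card are simplified to plain arguments.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Order:
    customer_id: str
    items: list = field(default_factory=list)  # (sku, quantity, unit_price)

    def total(self):
        return sum(qty * price for _, qty, price in self.items)


class OrderBuilder:
    """Hypothetical test-only builder; each method returns self for chaining."""

    def __init__(self):
        self._customer_id = "default-customer"
        self._items = []

    def for_customer(self, customer_id):
        self._customer_id = customer_id
        return self

    def with_item(self, sku, quantity, price):
        self._items.append((sku, quantity, Decimal(price)))
        return self

    def build(self):
        return Order(self._customer_id, list(self._items))


def an_order():
    return OrderBuilder()


def test_total_multiplies_quantity_by_unit_price():
    order = an_order().for_customer("C-1").with_item("SKU", 2, "9.99").build()
    assert order.total() == Decimal("19.98")
```

The builder supplies defaults for everything a test does not care about, so each test states only the details relevant to the behavior it checks.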

Why does test code need to change when production code changes?
?
Tests verify behaviors of the production code. When production code evolves — new behaviors added, old behaviors changed, interfaces refactored — the tests must be updated to reflect the new contract. If tests don’t change with production code, they either pass incorrectly (false confidence) or fail incorrectly (false alarms). Test maintenance is the price of having a living test suite.

What is the “Template Method” pattern used for in tests?
?
Clean Code suggests the Template Method pattern for removing duplication between tests that share complex setup: the shared given/when steps live in a base class, and each subclass supplies its own then step. Lighter-weight alternatives do the same job by factoring the shared BUILD code into a @BeforeEach method or a factory helper. Either way, setup is not duplicated and each test stays focused on one concept. Example: a cartWithTwoWidgets() factory method reused across multiple checkout tests.
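
A minimal sketch of the factory-helper variant mentioned above, written in Python. The `Cart` class is hypothetical, and `cart_with_two_widgets` is a snake_case rendering of the card's example name.

```python
class Cart:
    """Hypothetical cart used only for this illustration."""

    def __init__(self):
        self.items = {}

    def add(self, name, quantity):
        self.items[name] = self.items.get(name, 0) + quantity

    def remove(self, name):
        self.items.pop(name, None)

    def item_count(self):
        return sum(self.items.values())


def cart_with_two_widgets():
    # Shared BUILD step: every call returns a new, unshared cart.
    cart = Cart()
    cart.add("widget", 2)
    return cart


def test_adding_a_gadget_increases_item_count():
    cart = cart_with_two_widgets()
    cart.add("gadget", 1)
    assert cart.item_count() == 3


def test_removing_widgets_empties_the_cart():
    cart = cart_with_two_widgets()
    cart.remove("widget")
    assert cart.item_count() == 0
```

Each test calls the helper itself, so the shared setup is written once without the tests sharing state.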

What’s wrong with a test that always passes regardless of production code behavior?
?
A test that always passes has no discriminating power — it cannot detect regressions. This is sometimes called a false green. It is worse than a missing test because it creates the false impression that a behavior is verified. Every test must fail when the behavior it describes is broken, and pass only when the behavior is correct.

How does the AAA pattern (Arrange-Act-Assert) differ from the Given-When-Then pattern?
?
They are structurally identical — both have three phases mapping 1-to-1: Arrange = Given, Act = When, Assert = Then. The difference is cultural: AAA originated in xUnit testing communities and is common in Java/C++. Given-When-Then comes from BDD (Behavior-Driven Development) and is used in Cucumber/Gherkin. The underlying discipline is the same: clear separation of setup, action, and verification.

In Google Test (C++), what is the difference between EXPECT_EQ and ASSERT_EQ?
?
Both verify equality. EXPECT_EQ is a non-fatal assertion: on failure the test keeps running, so a single run can report several failures. ASSERT_EQ is fatal: on failure it aborts the current function immediately. Use EXPECT_EQ when you want to see all failures in one run. Use ASSERT_EQ when a failure makes subsequent assertions meaningless (e.g., if a pointer is null, dereferencing it would crash).

In pytest (Python), what is a fixture and why is it important for the Independent principle?
?
A pytest fixture is a function decorated with @pytest.fixture that provides an object or resource to the tests that request it. With the default function scope, pytest creates a new instance for every test function that requests the fixture, so tests do not share mutable state through it. Wider scopes (class, module, package, session) deliberately share one instance and give up that isolation. Fixtures replace setUp/tearDown from unittest and are injected into test functions via function parameters.

What makes a test name good vs. bad?
?
A good test name describes the behavior being verified and the scenario: newPasswordWorksAfterChange, applyingCouponReducesOrderTotal. A bad test name describes the method being called: testCheckout, testProcessor. When a good-named test fails, the name tells you exactly what broke. When a bad-named test fails, you must read the test body to understand what regressed.

What is a “characterization test” and when do you use it?
?
A characterization test documents the existing (possibly incorrect) behavior of legacy code before you refactor it. Unlike normal TDD tests (which specify desired behavior), characterization tests capture what the code actually does today — even if that behavior is buggy. They protect you from accidentally changing behavior during refactoring. Once the refactoring is stable, you can update the tests to reflect correct behavior.
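
As a small illustration in Python, a characterization test pins down current output, quirks included. The function `legacy_format_name` and its untrimmed-last-name quirk are invented for this sketch.

```python
def legacy_format_name(first, last):
    """Hypothetical legacy function. It trims the first name but not the
    last name, which may be a bug, yet callers could depend on it."""
    return first.strip() + " " + last


def test_characterize_untrimmed_last_name():
    # Pins down what the code does today, not what it should do.
    assert legacy_format_name(" Ada ", "Lovelace ") == "Ada Lovelace "
```

If a refactoring starts trimming the last name, this test fails and forces a deliberate decision instead of a silent behavior change.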

Why is new ConcreteClass() inside a method a testability problem?
?
When a production method directly constructs its dependencies (e.g., new SmtpEmailSender()), the test cannot inject a fake. The test is forced to use the real implementation, which may require a network, a database, or slow I/O. This violates Fast and Repeatable. The fix is dependency injection: accept the dependency through a constructor or method parameter so tests can pass a FakeEmailSender instead.
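
The constructor-injection fix can be sketched in Python as below. `FakeEmailSender` follows the card's example name, while `WelcomeService`, its `register` method, and the `send` signature are hypothetical.

```python
class FakeEmailSender:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))


class WelcomeService:
    def __init__(self, email_sender):
        # The dependency is passed in, not constructed here.
        self._email_sender = email_sender

    def register(self, email):
        self._email_sender.send(email, "Welcome!")


def test_registering_sends_one_welcome_email():
    sender = FakeEmailSender()
    WelcomeService(sender).register("ada@example.com")
    assert sender.sent == [("ada@example.com", "Welcome!")]
```

Production code would pass a real sender with the same `send` method; the test needs no network and runs in memory.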

What is the relationship between clean tests and the Boy Scout Rule?
?
The Boy Scout Rule (always leave the code cleaner than you found it) applies to test code too. When you modify production code, also clean up any test code that becomes unclear, verbose, or redundant. A test suite that is continuously kept clean stays useful. A test suite that accumulates technical debt becomes a burden and is eventually abandoned.

What distinguishes a unit test from an integration test, and does FIRST apply to both?
?
A unit test tests a single unit of behavior in isolation, with all external dependencies replaced by fakes/mocks. It must satisfy all FIRST principles. An integration test tests multiple real components working together (e.g., code + database) and is inherently slower and less isolated — it is NOT Fast or fully Independent by nature. FIRST applies strictly to unit tests; integration tests have different standards.

Why is it harmful to test multiple unrelated behaviors in one test function?
?
When a test with multiple unrelated behaviors fails, you must read the entire test body to find which assertion failed and debug which behavior broke. Each test becomes a multi-failure point instead of a single indicator. More importantly, the test name cannot accurately describe multiple unrelated behaviors — so either the name lies, or it’s so generic it’s useless. One concept per test makes failures self-documenting.

What problem does the “Timely” principle solve at the design level?
?
When tests are written after production code, developers often discover the code is untestable — it directly constructs dependencies, uses global state, or has no interfaces. The developer is then faced with either rewriting the production code (expensive) or writing tests that require complex setup (dirty). Writing tests first forces the production code to be designed for testability: clean interfaces, constructor injection, no hidden dependencies.

Total Cards: 28
Review Time: ~18 minutes
Priority: HIGH
Last Updated: 2026-04-14

Study Notes by Niladri & AI
