Chapter 9: Unit Tests
clean-code unit-tests tdd testing
Status: Notes complete
Difficulty: Medium
Time to complete: ~40 min read
Overview
Test code is first-class code. It is not a second-class citizen that you can write sloppily and fix later. Dirty tests are equivalent to — and in some ways worse than — no tests at all: they give you false confidence while imposing a maintenance burden that eventually collapses.
The central thesis of this chapter: tests enable change. Tests are what allow you to refactor, extend, and improve production code without fear. If you lose the tests, you lose the ability to change the production code without introducing bugs. The test suite is the safety net that keeps the codebase flexible.
This chapter covers:
- The Three Laws of TDD
- Why test code must be kept as clean as production code
- The BUILD-OPERATE-CHECK (Arrange-Act-Assert) pattern
- Domain-specific testing languages
- The dual standard: readable ≠ efficient
- One assert per test / one concept per test
- The F.I.R.S.T. principles
The Problem: What Bad Code Looks Like
A test suite with dirty tests is worse than no test suite in one critical way: it creates a maintenance burden. Every time production code changes, the tests must be updated. If the tests are hard to read and reason about, that update becomes a painful and time-consuming chore. Teams that start with dirty tests often end up abandoning them entirely — and once the tests are gone, the production code rots.
Here is what a dirty test looks like:
// BAD — unclear naming, multiple assertions testing unrelated things,
// verbose setup with no clear structure, magic numbers with no explanation
@Test
public void testProcessor() {
OrderProcessor op = new OrderProcessor();
op.setDb(new MockDatabase());
op.setUser("u1");
op.addItem("i1", 2);
op.addItem("i2", 1);
op.setDiscount(0.1);
boolean r = op.process();
assertTrue(r);
assertEquals(3, op.getItemCount());
assertEquals(0.1, op.getAppliedDiscount(), 0.001);
assertNotNull(op.getOrderId());
assertEquals("PENDING", op.getStatus());
assertEquals(2, op.getItemQuantity("i1"));
}
What’s wrong here:
- One test validates six different behaviors — when it fails, you don’t know which behavior broke
- Setup is verbose and hides what’s actually being tested
- Magic values (“u1”, “i1”, 0.1) have no meaning without context
- The test name testProcessor tells you nothing about what scenario is being validated
Core Principles
1. The Three Laws of TDD
Why this rule exists: Test-Driven Development is a discipline that keeps production code and test code in sync. The three laws enforce a tight feedback loop — you never write more code than is needed to make a single failing test pass. This prevents over-engineering and ensures every line of production code exists because a test demanded it.
The three laws:
- You may not write production code until you have written a failing unit test.
- You may not write more of a unit test than is sufficient to fail (compilation failures count as failures).
- You may not write more production code than is sufficient to pass the currently failing test.
Java — BankAccount built test-first:
// GOOD — Step 1: Write the failing test first
@Test
public void newAccountHasZeroBalance() {
BankAccount account = new BankAccount();
assertEquals(BigDecimal.ZERO, account.getBalance());
}
// Step 2: Write minimum production code to pass
public class BankAccount {
public BigDecimal getBalance() {
return BigDecimal.ZERO; // only what's needed to pass
}
}
// Step 3: Write next failing test
@Test
public void depositIncreasesBalance() {
BankAccount account = new BankAccount();
account.deposit(new BigDecimal("100.00"));
assertEquals(new BigDecimal("100.00"), account.getBalance());
}
// Step 4: Extend production code to pass
public class BankAccount {
private BigDecimal balance = BigDecimal.ZERO;
public BigDecimal getBalance() { return balance; }
public void deposit(BigDecimal amount) {
balance = balance.add(amount);
}
}
C++ equivalent (Google Test):
// GOOD
#include <gtest/gtest.h>
#include "BankAccount.h"
TEST(BankAccountTest, NewAccountHasZeroBalance) {
BankAccount account;
EXPECT_EQ(0.0, account.getBalance());
}
TEST(BankAccountTest, DepositIncreasesBalance) {
BankAccount account;
account.deposit(100.0);
EXPECT_DOUBLE_EQ(100.0, account.getBalance());
}
// BankAccount.h: minimum to pass both tests
class BankAccount {
public:
double getBalance() const { return balance_; }
void deposit(double amount) { balance_ += amount; }
private:
double balance_ = 0.0;
};
Python equivalent (pytest):
# GOOD
# test_bank_account.py
from bank_account import BankAccount
from decimal import Decimal

def test_new_account_has_zero_balance():
    account = BankAccount()
    assert account.get_balance() == Decimal("0.00")

def test_deposit_increases_balance():
    account = BankAccount()
    account.deposit(Decimal("100.00"))
    assert account.get_balance() == Decimal("100.00")

# bank_account.py: minimum to pass both tests
from decimal import Decimal

class BankAccount:
    def __init__(self) -> None:
        self._balance: Decimal = Decimal("0.00")

    def get_balance(self) -> Decimal:
        return self._balance

    def deposit(self, amount: Decimal) -> None:
        self._balance += amount
2. Keeping Tests Clean
Why this rule exists: Test code evolves as production code evolves. If tests are dirty, every change to production code makes updating the tests painful. Teams eventually stop updating the tests. Stale tests give false confidence — they pass but no longer verify the behavior you care about. Eventually, the tests are deleted or ignored entirely.
“Having dirty tests is equivalent to, if not worse than, having no tests.”
// BAD — verbose, unclear structure, tests multiple unrelated behaviors,
// raw assertions without business meaning
@Test
public void testCheckout() {
ShoppingCart cart = new ShoppingCart();
cart.db = new FakeDatabase();
cart.userId = 42;
cart.items = new ArrayList<>();
Item i = new Item();
i.id = 1; i.price = 29.99; i.qty = 2;
cart.items.add(i);
boolean ok = cart.checkout("PROMO10");
assertTrue(ok);
assertEquals(53.98, cart.getTotal(), 0.01);
assertNotNull(cart.getOrderId());
assertTrue(cart.getOrderId().startsWith("ORD-"));
assertEquals(CartStatus.CHECKED_OUT, cart.getStatus());
assertFalse(cart.getItems().isEmpty());
}
// GOOD: each test has one clear purpose, meaningful names, clear AAA structure
@Test
public void checkoutWithValidPromoCodeSucceeds() {
// Arrange
ShoppingCart cart = cartWithTwoWidgets();
// Act
boolean result = cart.checkout("PROMO10");
// Assert
assertTrue(result);
}
@Test
public void checkoutAppliesDiscountToTotal() {
ShoppingCart cart = cartWithTwoWidgets(); // 2 × $29.99 = $59.98
cart.checkout("PROMO10"); // 10% off
assertEquals(new BigDecimal("53.98"), cart.getTotal());
}
@Test
public void checkoutAssignsOrderId() {
ShoppingCart cart = cartWithTwoWidgets();
cart.checkout("PROMO10");
assertThat(cart.getOrderId()).matches("ORD-\\d{8}");
}
private ShoppingCart cartWithTwoWidgets() {
ShoppingCart cart = new ShoppingCart(new FakeDatabase(), USER_ID);
cart.addItem(new Item(WIDGET_ID, new BigDecimal("29.99"), 2));
return cart;
}
3. Tests Enable Change
Why this rule exists: Tests are not just verification tools — they are the mechanism that gives developers courage. With a comprehensive, passing test suite, you can refactor a class, change an algorithm, or restructure a module with confidence. Without tests, every change is a guess.
“It is unit tests that keep our code flexible, maintainable, and reusable.”
The logic chain:
- You have tests → you can verify that a change doesn’t break behavior
- You can verify behavior → you have the courage to change code
- You have the courage to change code → the code stays clean over time
- Code stays clean → the system remains maintainable
Without this chain:
- No tests → every change is risky
- Risk → developers stop refactoring
- No refactoring → code accumulates complexity (rot)
- Rot → system becomes unmaintainable
This is why the cost of dirty tests eventually exceeds the cost of no tests: dirty tests give you some confidence (false), but still impose the maintenance burden. You pay the cost without getting the benefit.
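As a small illustration of the chain above, the following Python sketch shows one behavior implemented twice, with the same checks run against both versions. The function names (order_total_v1, order_total_v2, check_order_total) are hypothetical and not from the chapter; the point is only that tests which pin down behavior, rather than implementation, can be rerun unchanged after a refactoring.

```python
from decimal import Decimal

# Hypothetical production code, version 1: an explicit loop.
def order_total_v1(lines: list[tuple[Decimal, int]]) -> Decimal:
    total = Decimal("0")
    for price, quantity in lines:
        total += price * quantity
    return total

# Version 2: the same behavior after refactoring to a generator expression.
def order_total_v2(lines: list[tuple[Decimal, int]]) -> Decimal:
    return sum((price * quantity for price, quantity in lines), Decimal("0"))

def check_order_total(order_total) -> None:
    """Behavioral checks that do not care how the total is computed."""
    assert order_total([]) == Decimal("0")
    assert order_total([(Decimal("29.99"), 2)]) == Decimal("59.98")
    assert order_total(
        [(Decimal("12.99"), 3), (Decimal("49.99"), 1)]
    ) == Decimal("88.96")

check_order_total(order_total_v1)  # green before the refactoring
check_order_total(order_total_v2)  # still green after it
```

If the second version had changed the result for any of these inputs, the same checks would have failed immediately, which is the safety net the section describes.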
4. Clean Tests — The BUILD-OPERATE-CHECK Pattern
Why this rule exists: Clean tests follow a predictable three-part structure. When all tests follow the same structure, the reader can immediately orient themselves: where is the setup, what operation is being tested, what is expected? This is the single most impactful structural rule for test readability.
The pattern has three equivalent names, all describing the same three phases:
- BUILD-OPERATE-CHECK (Martin’s term in Clean Code)
- Arrange-Act-Assert (AAA — most common in Java/C++ communities)
- Given-When-Then (BDD style, common with Cucumber/Gherkin)
Java — user cart checkout:
// GOOD — explicit AAA sections with comments
@Test
public void applyingCouponReducesOrderTotal() {
// Arrange (BUILD)
User user = new User("alice@example.com");
ShoppingCart cart = new ShoppingCart(user);
cart.addItem(new Product("Widget", new BigDecimal("50.00")), 2);
Coupon coupon = new Coupon("SAVE20", DiscountType.PERCENTAGE, 20);
// Act (OPERATE)
cart.applyCoupon(coupon);
Order order = cart.checkout();
// Assert (CHECK)
assertEquals(new BigDecimal("80.00"), order.getTotal());
}
C++ equivalent (Google Test):
// GOOD
TEST(ShoppingCartTest, ApplyingCouponReducesOrderTotal) {
// Arrange
User user{"alice@example.com"};
ShoppingCart cart{user};
cart.addItem(Product{"Widget", 50.0}, 2);
Coupon coupon{"SAVE20", DiscountType::Percentage, 20};
// Act
cart.applyCoupon(coupon);
auto order = cart.checkout();
// Assert
EXPECT_DOUBLE_EQ(80.0, order.getTotal());
}
Python equivalent (pytest):
# GOOD
from decimal import Decimal
from shopping import User, ShoppingCart, Product, Coupon, DiscountType

def test_applying_coupon_reduces_order_total():
    # Arrange (Given)
    user = User("alice@example.com")
    cart = ShoppingCart(user)
    cart.add_item(Product("Widget", Decimal("50.00")), quantity=2)
    coupon = Coupon("SAVE20", DiscountType.PERCENTAGE, discount=20)
    # Act (When)
    cart.apply_coupon(coupon)
    order = cart.checkout()
    # Assert (Then)
    assert order.get_total() == Decimal("80.00")
5. Domain-Specific Testing Language
Why this rule exists: When tests grow large, raw API calls become hard to read. The solution is to build a test DSL — a layer of helper functions and assertion utilities that reads like a specification written in business language rather than programming language. This is not a framework; it is a set of functions that accumulate organically as you write more tests for the same domain.
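To make the idea of "a layer of helper functions and assertion utilities" concrete, here is a minimal Python sketch of the pieces such a layer tends to consist of: a chainable builder, a factory function that reads like prose, and a domain-level assertion. The Order class, OrderBuilder, an_order, and assert_has_item_count are all hypothetical stand-ins invented for this sketch, not an API from the chapter.

```python
from dataclasses import dataclass, field
from decimal import Decimal

@dataclass
class Order:  # hypothetical stand-in for the real domain class
    customer_id: int = 0
    items: list = field(default_factory=list)

class OrderBuilder:
    """Test-only builder; each method returns self so calls can be chained."""

    def __init__(self) -> None:
        self._order = Order()

    def for_customer(self, customer_id: int) -> "OrderBuilder":
        self._order.customer_id = customer_id
        return self

    def with_item(self, sku: str, quantity: int, price: Decimal) -> "OrderBuilder":
        self._order.items.append((sku, quantity, price))
        return self

    def build(self) -> Order:
        return self._order

def an_order() -> OrderBuilder:
    """Factory whose name lets the test read like a sentence."""
    return OrderBuilder()

def assert_has_item_count(order: Order, expected: int) -> None:
    # Domain-level assertion: the failure message uses business vocabulary.
    actual = len(order.items)
    assert actual == expected, f"expected {expected} items, got {actual}"

order = (
    an_order()
    .for_customer(5)
    .with_item("SKU-001", quantity=3, price=Decimal("12.99"))
    .build()
)
assert_has_item_count(order, 1)
```

Helpers like these would normally be extracted one at a time as duplication appears in real tests, rather than written in advance.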
// BAD — raw API, hard to read what's being tested
@Test
public void testOrderFulfillment() {
Order order = new Order();
order.setCustomerId(5);
order.addLineItem(new LineItem("SKU-001", 3, new BigDecimal("12.99")));
order.addLineItem(new LineItem("SKU-002", 1, new BigDecimal("49.99")));
order.setShippingAddress(new Address("123 Main St", "New York", "NY", "10001"));
order.setPaymentMethod(new CreditCard("4111111111111111", "12/26", "123"));
boolean result = fulfillmentService.process(order);
assertTrue(result);
assertEquals(OrderStatus.FULFILLED, order.getStatus());
assertTrue(order.getTrackingNumber() != null && !order.getTrackingNumber().isEmpty());
}
// GOOD: test DSL makes the intent obvious
@Test
public void fulfilledOrderReceivesTrackingNumber() {
Order order = anOrder()
.forCustomer(CUSTOMER_ID)
.withItem("SKU-001", quantity(3), priceOf("12.99"))
.withItem("SKU-002", quantity(1), priceOf("49.99"))
.shippingTo(NEW_YORK_ADDRESS)
.paidBy(VALID_CREDIT_CARD)
.build();
fulfillmentService.process(order);
assertOrderIsFulfilled(order);
assertHasTrackingNumber(order);
}
// Test DSL helpers (build organically — don't design upfront)
private OrderBuilder anOrder() { return new OrderBuilder(); }
private int quantity(int n) { return n; }
private BigDecimal priceOf(String s) { return new BigDecimal(s); }
private void assertOrderIsFulfilled(Order order) {
assertEquals(OrderStatus.FULFILLED, order.getStatus());
}
private void assertHasTrackingNumber(Order order) {
assertThat(order.getTrackingNumber()).isNotBlank();
}
Python equivalent:
# GOOD: test DSL with helper factories and assertion utilities
from decimal import Decimal

def test_fulfilled_order_receives_tracking_number():
    order = (
        an_order()
        .for_customer(CUSTOMER_ID)
        .with_item("SKU-001", quantity=3, price=Decimal("12.99"))
        .with_item("SKU-002", quantity=1, price=Decimal("49.99"))
        .shipping_to(NEW_YORK_ADDRESS)
        .paid_by(VALID_CREDIT_CARD)
        .build()
    )
    fulfillment_service.process(order)
    assert_order_is_fulfilled(order)
    assert_has_tracking_number(order)

def an_order() -> "OrderBuilder":
    return OrderBuilder()

def assert_order_is_fulfilled(order: Order) -> None:
    assert order.status == OrderStatus.FULFILLED

def assert_has_tracking_number(order: Order) -> None:
    assert order.tracking_number is not None and len(order.tracking_number) > 0
6. Dual Standard
Why this rule exists: Test code runs in a test environment, not production. It does not need to be as efficient in memory or CPU usage as production code. The single most important quality of test code is readability — not runtime performance. Martin explicitly states that some things are appropriate in test code that would be inappropriate in production code.
// BAD — test is written for efficiency (StringBuilder) at the cost of readability
@Test
public void temperatureSensorReadsCorrectly() throws Exception {
HvacController controller = new HvacController();
TemperatureSensor sensor = new MockSensor(Arrays.asList(
new Reading(true, false, true, false),
new Reading(false, false, false, false),
new Reading(true, true, false, false)
));
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 3; i++) {
controller.tick();
sb.append(sensor.isHeaterOn() ? "H" : "h");
sb.append(sensor.isBlowerOn() ? "B" : "b");
sb.append(sensor.isCoolerOn() ? "C" : "c");
}
assertEquals("HbcHbchBc", sb.toString());
}
// GOOD: string concatenation in tests is fine; the state map is immediately readable
@Test
public void temperatureSensorReadsCorrectly() throws Exception {
HvacController controller = new HvacController();
MockSensor sensor = makeSensorWith(
reading(HEATER_ON, BLOWER_OFF, COOLER_OFF),
reading(HEATER_ON, BLOWER_OFF, COOLER_OFF),
reading(HEATER_OFF, BLOWER_ON, COOLER_OFF)
);
assertEquals("HbcHbchBc", getStateStringAfterTicks(controller, sensor, 3));
}
// "HbC" notation: uppercase = on, lowercase = off; H=heater, B=blower, C=cooler
// Once the convention is known, the expected string reads as a sequence of states
The dual standard means the following are acceptable in tests even though you would avoid them in production:
- String concatenation (fine in tests, wasteful in production loops)
- Large setup methods (fine in tests, a likely SRP violation in production)
- Hardcoded data (fine as test fixtures, poor practice in production configuration)
- Wiring objects by hand without a dependency injection container (fine in tests, often impractical in a large production system)
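The GOOD example above calls a helper, getStateStringAfterTicks, that is not shown. As a rough Python sketch of the kind of helper the dual standard permits, the code below builds the state string with plain concatenation. FakeHvac and state_string are hypothetical names for this sketch; a real controller would expose its state differently.

```python
from dataclasses import dataclass

@dataclass
class FakeHvac:  # hypothetical fake standing in for the controlled hardware
    heater_on: bool = False
    blower_on: bool = False
    cooler_on: bool = False

def state_string(hvac: FakeHvac) -> str:
    """Encode state as three letters: uppercase = on, lowercase = off."""
    # Repeated += is not the most efficient way to build a string,
    # but in a test helper clarity matters more than speed.
    state = ""
    state += "H" if hvac.heater_on else "h"
    state += "B" if hvac.blower_on else "b"
    state += "C" if hvac.cooler_on else "c"
    return state

def test_heater_and_blower_on_when_too_cold():
    hvac = FakeHvac(heater_on=True, blower_on=True)
    assert state_string(hvac) == "HBc"

test_heater_and_blower_on_when_too_cold()
```

The trade-off is deliberate: the helper is trivially readable, and its cost is irrelevant at test scale.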
7. One Assert per Test
Why this rule exists: The more assertions in a test, the harder it is to determine what broke when the test fails. A test with one assertion has one purpose — it is its own documentation. When that test fails, you know exactly which behavior regressed. The extension of this idea is “one concept per test”: even if multiple assertions are needed to verify a single concept, keep each concept in its own test function.
// BAD — three different concepts in one test; when it fails, which one broke?
@Test
public void testPasswordChange() {
UserAccount account = new UserAccount("alice", "old-password");
account.changePassword("old-password", "new-password");
assertTrue(account.authenticate("new-password")); // concept 1: new password works
assertFalse(account.authenticate("old-password")); // concept 2: old password rejected
assertTrue(account.getLastPasswordChange() // concept 3: timestamp updated
.isAfter(Instant.now().minusSeconds(5)));
}
// GOOD: three tests, each with one concept
@Test
public void newPasswordWorksAfterChange() {
UserAccount account = new UserAccount("alice", "old-password");
account.changePassword("old-password", "new-password");
assertTrue(account.authenticate("new-password"));
}
@Test
public void oldPasswordRejectedAfterChange() {
UserAccount account = new UserAccount("alice", "old-password");
account.changePassword("old-password", "new-password");
assertFalse(account.authenticate("old-password"));
}
@Test
public void passwordChangeUpdatesTimestamp() {
UserAccount account = new UserAccount("alice", "old-password");
Instant before = Instant.now();
account.changePassword("old-password", "new-password");
// isAfter(before) can fail when both instants fall on the same clock tick
assertFalse(account.getLastPasswordChange().isBefore(before));
}
Python equivalent:
# GOOD: one concept per test function
from datetime import datetime, timezone

def test_new_password_works_after_change():
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    assert account.authenticate("new-password")

def test_old_password_rejected_after_change():
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    assert not account.authenticate("old-password")

def test_password_change_updates_timestamp():
    account = UserAccount("alice", "old-password")
    before = datetime.now(tz=timezone.utc)
    account.change_password("old-password", "new-password")
    # >= rather than >: both timestamps can fall on the same clock tick
    assert account.last_password_change >= before
Note on pragmatism: Sometimes several assertions verify the same concept; in that case, keep them in one test. When separate one-concept tests would otherwise repeat the same setup, factor the shared setup into a @BeforeEach method or a helper factory method (Martin also describes using the Template Method pattern for this).
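One way to share setup without merging concepts is a helper factory, sketched below in Python. The UserAccount class here is a deliberately tiny hypothetical implementation written only so the sketch runs on its own; account_after_password_change is likewise an invented helper name.

```python
class UserAccount:  # hypothetical minimal account, only for this sketch
    def __init__(self, name: str, password: str) -> None:
        self.name = name
        self._password = password

    def change_password(self, old: str, new: str) -> None:
        if old != self._password:
            raise ValueError("wrong password")
        self._password = new

    def authenticate(self, password: str) -> bool:
        return password == self._password

def account_after_password_change() -> UserAccount:
    """Shared setup: every caller gets its own fresh, already-changed account."""
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    return account

def test_new_password_works_after_change():
    assert account_after_password_change().authenticate("new-password")

def test_old_password_rejected_after_change():
    assert not account_after_password_change().authenticate("old-password")

test_new_password_works_after_change()
test_old_password_rejected_after_change()
```

Under pytest, the same factory could be turned into a fixture by decorating it with @pytest.fixture and naming it as a test parameter; either way each test still receives fresh state.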
8. F.I.R.S.T. Principles of Clean Tests
Why this rule exists: A test suite that cannot be relied upon is worthless. The FIRST principles define the five qualities that make a test suite trustworthy and useful as a development tool.
Fast
Tests must run quickly. If they don’t, developers stop running them frequently. If developers don’t run the tests frequently, the tests don’t catch regressions early. Slow tests create a situation where the feedback loop is broken.
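The usual way to keep unit tests fast is to replace slow I/O with an in-memory stand-in. A minimal Python sketch of that idea follows; FakeEmailService and this cut-down OrderService are hypothetical classes written for illustration, and the ship_order signature is an assumption.

```python
class FakeEmailService:
    """In-memory fake: records messages instead of talking to an SMTP server."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))

class OrderService:  # hypothetical service under test
    def __init__(self, email_service) -> None:
        self._email = email_service

    def ship_order(self, order_id: str, customer_email: str) -> None:
        self._email.send(customer_email, f"Your order {order_id} has shipped")

def test_shipping_sends_notification():
    emails = FakeEmailService()
    OrderService(emails).ship_order("ORD-12345", "customer@example.com")
    assert emails.sent == [
        ("customer@example.com", "Your order ORD-12345 has shipped")
    ]

test_shipping_sends_notification()
```

Because nothing leaves the process, the test needs no network, no sleep, and no mailbox to inspect.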
// BAD: test makes a real SMTP call; takes seconds; fails in CI without network
@Test
public void emailNotificationSentOnOrderShipped() throws Exception {
EmailService emailService = new RealSmtpEmailService("smtp.company.com", 587);
OrderService orderService = new OrderService(emailService);
orderService.shipOrder("ORD-12345");
// How do we even verify this without a real mailbox?
Thread.sleep(2000); // waiting for SMTP
assertTrue(checkMailbox("customer@example.com", "Your order has shipped"));
}
// GOOD: test uses a fast in-memory fake; runs in milliseconds
@Test
public void emailNotificationSentOnOrderShipped() {
FakeEmailService emailService = new FakeEmailService();
OrderService orderService = new OrderService(emailService);
orderService.shipOrder("ORD-12345");
assertTrue(emailService.wasSentTo("customer@example.com"));
assertThat(emailService.getLastSubject()).contains("Your order has shipped");
}
Independent
Tests must not depend on each other. Each test should be able to run in any order, in isolation. When tests depend on shared mutable state from a previous test, a single failure can cascade into many spurious failures, making it impossible to diagnose the real problem.
// BAD — test2 depends on state left by test1; run out of order and test2 fails
static ShoppingCart sharedCart;
@Test
public void test1_addItemToCart() {
sharedCart = new ShoppingCart();
sharedCart.addItem(new Item("Widget", new BigDecimal("9.99")));
assertEquals(1, sharedCart.getItemCount());
}
@Test
public void test2_checkoutCartWithItem() {
// Assumes sharedCart was populated by test1 — FRAGILE
Order order = sharedCart.checkout();
assertNotNull(order.getOrderId());
}
// GOOD: each test creates its own state; order-independent
@Test
public void addItemToCart() {
ShoppingCart cart = new ShoppingCart();
cart.addItem(new Item("Widget", new BigDecimal("9.99")));
assertEquals(1, cart.getItemCount());
}
@Test
public void checkoutCartWithItemCreatesOrder() {
ShoppingCart cart = new ShoppingCart();
cart.addItem(new Item("Widget", new BigDecimal("9.99")));
Order order = cart.checkout();
assertNotNull(order.getOrderId());
}
Repeatable
Tests must produce the same result in every environment: developer machines, CI/CD pipelines, offline environments, different time zones. A test that fails intermittently is a test that cannot be trusted.
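A common source of non-repeatable tests is the system clock. One remedy, sketched here in Python, is to pass the clock in as a dependency so a test can pin it to a fixed instant. OrderDesk, its now parameter, and FIXED_NOW are hypothetical names chosen for this sketch.

```python
from datetime import date, datetime, timezone
from typing import Callable

class OrderDesk:  # hypothetical class that needs to know the current time
    def __init__(self, now: Callable[[], datetime]) -> None:
        self._now = now  # injected clock; production code would pass a real one

    def today(self) -> date:
        return self._now().date()

# A fixed instant makes the test independent of when and where it runs.
FIXED_NOW = datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)

def test_today_uses_injected_clock():
    desk = OrderDesk(now=lambda: FIXED_NOW)
    assert desk.today() == date(2024, 3, 15)

test_today_uses_injected_clock()
```

In production the same class could be constructed with `OrderDesk(now=lambda: datetime.now(timezone.utc))`, so only the test ever sees the frozen time.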
// BAD — depends on system clock; fails when run just before midnight; non-deterministic
@Test
public void sameDayOrderMarkedAsSameDay() {
Order order = new Order(LocalDate.now(), LocalDate.now()); // uses real clock
assertTrue(order.isSameDayOrder());
}
// GOOD: uses injected clock; deterministic regardless of when the test runs
@Test
public void sameDayOrderMarkedAsSameDay() {
Clock fixedClock = Clock.fixed(
Instant.parse("2024-03-15T10:00:00Z"), ZoneOffset.UTC);
Order order = new Order(
LocalDate.now(fixedClock),
LocalDate.now(fixedClock));
assertTrue(order.isSameDayOrder());
}
Self-Validating
Tests must have a boolean outcome: they either pass or fail. A test that requires a human to read a log file and decide whether the output “looks right” is not a test — it is a manual inspection step disguised as a test. Self-validating tests have explicit assertions that fail loudly on regression.
// BAD — no assertion; "passes" even when the behavior is completely wrong
@Test
public void testReportGeneration() throws Exception {
ReportGenerator generator = new ReportGenerator();
String report = generator.generateMonthlyReport(2024, 3);
System.out.println(report); // developer manually checks output — NOT a test
}
// GOOD: explicit assertions; fails automatically when content is wrong
@Test
public void monthlyReportContainsExpectedSections() {
ReportGenerator generator = new ReportGenerator(FIXED_CLOCK);
String report = generator.generateMonthlyReport(2024, 3);
assertThat(report).contains("Monthly Report — March 2024");
assertThat(report).contains("Total Revenue");
assertThat(report).contains("Total Orders");
assertThat(report).doesNotContain("ERROR");
}
Timely
Tests should be written just before the production code they verify. If you write tests after the production code is complete, you may discover that the production code was written in a way that makes it difficult to test — tightly coupled, relying on global state, or requiring complex setup. Writing tests first forces you to design code that is testable.
// BAD — production code written first, now it's hard to test because
// it directly constructs its dependencies (untestable design)
public class InvoiceService {
public void sendInvoice(int orderId) {
// direct construction — can't inject a fake
EmailSender sender = new SmtpEmailSender();
PdfGenerator pdf = new AcrobatPdfGenerator();
// ...
}
}
// GOOD: written test-first, so design naturally uses dependency injection
public class InvoiceService {
private final EmailSender emailSender;
private final PdfGenerator pdfGenerator;
public InvoiceService(EmailSender emailSender, PdfGenerator pdfGenerator) {
this.emailSender = emailSender;
this.pdfGenerator = pdfGenerator;
}
public void sendInvoice(int orderId) {
// uses injected dependencies — easily testable
}
}
@Test
public void sendInvoiceEmailsCustomer() {
FakeEmailSender emailSender = new FakeEmailSender();
FakePdfGenerator pdfGenerator = new FakePdfGenerator();
InvoiceService service = new InvoiceService(emailSender, pdfGenerator);
service.sendInvoice(ORDER_ID);
assertTrue(emailSender.wasSentTo(CUSTOMER_EMAIL));
}
C++ equivalent (Google Test), FIRST principles:
// GOOD — Independent, Repeatable, Self-Validating
class OrderServiceTest : public ::testing::Test {
protected:
void SetUp() override {
// Each test gets fresh state — Independent
emailService_ = std::make_unique<FakeEmailService>();
orderService_ = std::make_unique<OrderService>(emailService_.get());
}
std::unique_ptr<FakeEmailService> emailService_;
std::unique_ptr<OrderService> orderService_;
};
TEST_F(OrderServiceTest, ShippedOrderSendsEmailNotification) {
orderService_->shipOrder("ORD-12345");
// Self-Validating — explicit assertion, no manual inspection
EXPECT_TRUE(emailService_->wasSentTo("customer@example.com"));
}
TEST_F(OrderServiceTest, CancelledOrderDoesNotSendShipmentEmail) {
orderService_->cancelOrder("ORD-12345");
// Independent — not affected by ShippedOrderSendsEmailNotification
EXPECT_FALSE(emailService_->wasSentTo("customer@example.com"));
}
Python equivalent (pytest), FIRST principles:
# GOOD: pytest fixtures enforce Independent + Fast + Repeatable
import pytest
from unittest.mock import MagicMock
from order_service import OrderService

@pytest.fixture
def fake_email_service():
    """Fresh fake for each test (enforces Independent)."""
    return MagicMock()

@pytest.fixture
def order_service(fake_email_service):
    return OrderService(email_service=fake_email_service)

def test_shipped_order_sends_email(order_service, fake_email_service):
    order_service.ship_order("ORD-12345")
    # Self-Validating
    fake_email_service.send.assert_called_once()

def test_cancelled_order_does_not_send_shipment_email(order_service, fake_email_service):
    order_service.cancel_order("ORD-12345")
    # Independent: fixture gives a fresh mock; not affected by the previous test
    fake_email_service.send.assert_not_called()
Comparison / Summary Table
Testing Frameworks Across Languages
| Language | Framework | Key Feature | Assertion Style |
|---|---|---|---|
| Java | JUnit 5 | @Test, @BeforeEach, nested tests, parameterized | assertEquals(expected, actual) |
| Java | Mockito | Mock objects, argument capture | verify(mock).method() |
| Java | AssertJ | Fluent, readable assertions | assertThat(x).isEqualTo(y).isNotNull() |
| C++ | Google Test | TEST(), TEST_F(), fixtures | EXPECT_EQ(expected, actual) |
| C++ | Catch2 | BDD-style macros (SCENARIO/GIVEN/WHEN/THEN), single-header in v2, plain-expression assertions | REQUIRE(result == expected) |
| Python | pytest | Fixtures, parametrize, plugins | assert result == expected |
| Python | unittest | TestCase class, setUp/tearDown | self.assertEqual(expected, actual) |
FIRST Principles Quick Reference
| Principle | Violation Symptom | Fix |
|---|---|---|
| Fast | Tests skipped because they’re too slow | Use fakes/mocks; no real I/O in unit tests |
| Independent | One test failure causes others to fail | No shared mutable state between tests; use @BeforeEach |
| Repeatable | Tests pass locally, fail in CI | Inject clocks, random seeds; no network calls |
| Self-Validating | Test output requires manual inspection | Add explicit assert / assertEquals for every expected behavior |
| Timely | Production code is untestable | Write tests first; design forces testability |
When to Apply / Common Exceptions
Apply these principles when:
- Writing any test that lives in a version-controlled test suite
- Reviewing a pull request — check if tests are clean before approving
- Refactoring existing tests that are failing for unclear reasons
Common exceptions and nuances:
- Integration and end-to-end tests are not unit tests. They are inherently slower (not Fast) and may share state by design. These are different beasts.
- One assert per test is a guideline, not an absolute. If two assertions verify a single concept (e.g., both parts of a range check), keep them together. The rule is really “one concept per test.”
- Dual standard applies only to performance, not to correctness. Test code must still be logically correct and free of bugs. A test that always passes (even when the production code is broken) is worse than no test.
- TDD is ideal, but not always practical when working with legacy code. In that case, write characterization tests first to document existing behavior, then refactor.
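A characterization test, as mentioned in the last point, records what legacy code currently does so that later refactoring can be checked against it. The Python sketch below assumes a hypothetical legacy function, legacy_shipping_fee, invented purely for illustration; the expected values in the table were obtained by running that function, not from any specification.

```python
def legacy_shipping_fee(weight_kg: float, express: bool) -> float:
    """Hypothetical legacy function whose rules nobody fully remembers."""
    fee = 4.0 + 1.5 * weight_kg
    if express:
        fee *= 2
    if weight_kg > 20:
        fee += 10
    return fee

# Characterization data: (arguments, observed result of the current code).
# These pin existing behavior, whether or not that behavior is "right".
CHARACTERIZED = [
    ((1.0, False), 5.5),
    ((1.0, True), 11.0),
    ((30.0, False), 59.0),
    ((30.0, True), 108.0),
]

def test_legacy_shipping_fee_is_unchanged():
    for args, expected in CHARACTERIZED:
        assert legacy_shipping_fee(*args) == expected, f"changed for {args}"

test_legacy_shipping_fee_is_unchanged()
```

Once such a test is in place, the function can be restructured step by step, with any accidental change in behavior showing up as a failure.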
Checklist
When writing or reviewing tests, verify:
- Test name describes what behavior is being verified, not which method is being called
- Each test verifies exactly one concept (one reason to fail)
- Test structure follows BUILD-OPERATE-CHECK (Arrange-Act-Assert)
- No shared mutable state between tests (each test is self-contained)
- Test runs in milliseconds (no real I/O: no HTTP, no database, no filesystem)
- Test produces the same result every time it’s run (no unseeded randomness, no new Date())
- Test has at least one assertion (no “tests” that just log output)
- Test DSL helpers are used when test setup exceeds 5 lines
- Mock/fake objects are used instead of real external dependencies
- FIRST principles are satisfied: Fast, Independent, Repeatable, Self-Validating, Timely
Key Takeaways
- Test code is production code — it must be kept as clean, readable, and well-structured as any other code in the system.
- Tests enable change — without a clean test suite, refactoring is dangerous; with one, it is safe and easy.
- The Three Laws of TDD keep you in a tight loop: failing test → minimal production code → passing test, with refactoring done while the tests are green.
- BUILD-OPERATE-CHECK (AAA) is the universal test structure: set up the state, perform the operation, verify the outcome.
- One concept per test means that when a test fails, its name already tells you which behavior regressed, so diagnosis starts from a narrow target.
- FIRST principles (Fast, Independent, Repeatable, Self-Validating, Timely) define the five qualities every reliable test must have.
- Dual standard: test code can sacrifice efficiency for readability; string concatenation, verbose builders, and large helper methods are fine in tests.
- Test DSLs emerge naturally as you write more tests for the same domain — they are not designed upfront but extracted when tests become repetitive.
- Dirty tests are worse than no tests because they impose maintenance burden while providing false confidence.
- Write tests just before the production code — this forces testable design and prevents the production code from becoming untestable.
Related Resources
- ch08-boundaries — Learning tests for third-party APIs are a form of clean test
- ch10-classes — SRP in classes parallels “one concept per test” in test functions
- ch11-systems — Dependency injection is what makes production code testable (Timely principle)
- ch17-smells-and-heuristics — T1–T9 in the smells catalog are test-specific smells
- ch03-functions — Clean functions are what makes clean tests possible (small, one thing)
Last Updated: 2026-04-14