Chapter 9: Unit Tests

clean-code unit-tests tdd testing

Status: Notes complete
Difficulty: Medium
Time to complete: ~40 min read


Overview

Test code is first-class code. It is not a second-class citizen that you can write sloppily and fix later. Dirty tests are equivalent to, and in some ways worse than, no tests at all: they give you false confidence while imposing a maintenance burden that eventually leads the team to abandon the suite.

The central thesis of this chapter: tests enable change. Tests are what allow you to refactor, extend, and improve production code without fear. If you lose the tests, you lose the ability to change the production code without introducing bugs. The test suite is the safety net that keeps the codebase flexible.

This chapter covers:

  • The Three Laws of TDD
  • Why test code must be kept as clean as production code
  • The BUILD-OPERATE-CHECK (Arrange-Act-Assert) pattern
  • Domain-specific testing languages
  • The dual standard: readable ≠ efficient
  • One assert per test / one concept per test
  • The F.I.R.S.T. principles

The Problem: What Bad Code Looks Like

A test suite with dirty tests is worse than no test suite in one critical way: it creates a maintenance burden. Every time production code changes, the tests must be updated. If the tests are hard to read and reason about, that update becomes a painful and time-consuming chore. Teams that start with dirty tests often end up abandoning them entirely — and once the tests are gone, the production code rots.

Here is what a dirty test looks like:

// BAD — unclear naming, multiple assertions testing unrelated things,
// verbose setup with no clear structure, magic numbers with no explanation
@Test
public void testProcessor() {
    OrderProcessor op = new OrderProcessor();
    op.setDb(new MockDatabase());
    op.setUser("u1");
    op.addItem("i1", 2);
    op.addItem("i2", 1);
    op.setDiscount(0.1);
    boolean r = op.process();
    assertTrue(r);
    assertEquals(3, op.getItemCount());
    assertEquals(0.1, op.getAppliedDiscount(), 0.001);
    assertNotNull(op.getOrderId());
    assertEquals("PENDING", op.getStatus());
    assertEquals(2, op.getItemQuantity("i1"));
}

What’s wrong here:

  • One test validates six different behaviors — when it fails, you don’t know which behavior broke
  • Setup is verbose and hides what’s actually being tested
  • Magic values (“u1”, “i1”, 0.1) have no meaning without context
  • The test name testProcessor tells you nothing about what scenario is being validated

Core Principles

1. The Three Laws of TDD

Why this rule exists: Test-Driven Development is a discipline that keeps production code and test code in sync. The three laws enforce a tight feedback loop — you never write more code than is needed to make a single failing test pass. This prevents over-engineering and ensures every line of production code exists because a test demanded it.

The three laws:

  1. You may not write production code until you have written a failing unit test.
  2. You may not write more of a unit test than is sufficient to fail (compilation failures count as failures).
  3. You may not write more production code than is sufficient to pass the currently failing test.

Java — BankAccount built test-first:

// GOOD — Step 1: Write the failing test first
@Test
public void newAccountHasZeroBalance() {
    BankAccount account = new BankAccount();
    assertEquals(BigDecimal.ZERO, account.getBalance());
}
 
// Step 2: Write minimum production code to pass
public class BankAccount {
    public BigDecimal getBalance() {
        return BigDecimal.ZERO; // only what's needed to pass
    }
}
 
// Step 3: Write next failing test
@Test
public void depositIncreasesBalance() {
    BankAccount account = new BankAccount();
    account.deposit(new BigDecimal("100.00"));
    assertEquals(new BigDecimal("100.00"), account.getBalance());
}
 
// Step 4: Extend production code to pass
public class BankAccount {
    private BigDecimal balance = BigDecimal.ZERO;
 
    public BigDecimal getBalance() { return balance; }
 
    public void deposit(BigDecimal amount) {
        balance = balance.add(amount);
    }
}

C++ equivalent (Google Test):

// GOOD
#include <gtest/gtest.h>
#include "BankAccount.h"
 
TEST(BankAccountTest, NewAccountHasZeroBalance) {
    BankAccount account;
    EXPECT_DOUBLE_EQ(0.0, account.getBalance());
}
 
TEST(BankAccountTest, DepositIncreasesBalance) {
    BankAccount account;
    account.deposit(100.0);
    EXPECT_DOUBLE_EQ(100.0, account.getBalance());
}

// BankAccount.h: minimum to pass both tests
// (double keeps the sketch short; real money code should use a decimal or fixed-point type)
class BankAccount {
public:
    double getBalance() const { return balance_; }
    void deposit(double amount) { balance_ += amount; }
private:
    double balance_ = 0.0;
};

Python equivalent (pytest):

# GOOD
# test_bank_account.py
from bank_account import BankAccount
from decimal import Decimal
 
def test_new_account_has_zero_balance():
    account = BankAccount()
    assert account.get_balance() == Decimal("0.00")
 
def test_deposit_increases_balance():
    account = BankAccount()
    account.deposit(Decimal("100.00"))
    assert account.get_balance() == Decimal("100.00")

# bank_account.py: minimum to pass both tests
from decimal import Decimal
 
class BankAccount:
    def __init__(self) -> None:
        self._balance: Decimal = Decimal("0.00")
 
    def get_balance(self) -> Decimal:
        return self._balance
 
    def deposit(self, amount: Decimal) -> None:
        self._balance += amount

2. Keeping Tests Clean

Why this rule exists: Test code evolves as production code evolves. If tests are dirty, every change to production code makes updating the tests painful. Teams eventually stop updating the tests. Stale tests give false confidence — they pass but no longer verify the behavior you care about. Eventually, the tests are deleted or ignored entirely.

“Having dirty tests is equivalent to, if not worse than, having no tests.”

// BAD — verbose, unclear structure, tests multiple unrelated behaviors,
// raw assertions without business meaning
@Test
public void testCheckout() {
    ShoppingCart cart = new ShoppingCart();
    cart.db = new FakeDatabase();
    cart.userId = 42;
    cart.items = new ArrayList<>();
    Item i = new Item();
    i.id = 1; i.price = 29.99; i.qty = 2;
    cart.items.add(i);
    boolean ok = cart.checkout("PROMO10");
    assertTrue(ok);
    assertEquals(53.98, cart.getTotal(), 0.01);
    assertNotNull(cart.getOrderId());
    assertTrue(cart.getOrderId().startsWith("ORD-"));
    assertEquals(CartStatus.CHECKED_OUT, cart.getStatus());
    assertFalse(cart.getItems().isEmpty());
}

// GOOD: each test has one clear purpose, meaningful names, clear AAA structure
@Test
public void checkoutWithValidPromoCodeSucceeds() {
    // Arrange
    ShoppingCart cart = cartWithTwoWidgets();
 
    // Act
    boolean result = cart.checkout("PROMO10");
 
    // Assert
    assertTrue(result);
}
 
@Test
public void checkoutAppliesDiscountToTotal() {
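    ShoppingCart cart = cartWithTwoWidgets(); // 2 × $29.99 = $59.98
    cart.checkout("PROMO10");                 // 10% off = $53.982, assumed to be rounded to $53.98
    // compareTo ignores scale, so 53.98 and 53.980 count as equal (BigDecimal.equals would not)
    assertEquals(0, new BigDecimal("53.98").compareTo(cart.getTotal()));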
}
 
@Test
public void checkoutAssignsOrderId() {
    ShoppingCart cart = cartWithTwoWidgets();
    cart.checkout("PROMO10");
    assertThat(cart.getOrderId()).matches("ORD-\\d{8}");
}
 
private ShoppingCart cartWithTwoWidgets() {
    ShoppingCart cart = new ShoppingCart(new FakeDatabase(), USER_ID);
    cart.addItem(new Item(WIDGET_ID, new BigDecimal("29.99"), 2));
    return cart;
}

3. Tests Enable Change

Why this rule exists: Tests are not just verification tools — they are the mechanism that gives developers courage. With a comprehensive, passing test suite, you can refactor a class, change an algorithm, or restructure a module with confidence. Without tests, every change is a guess.

“It is unit tests that keep our code flexible, maintainable, and reusable.”

The logic chain:

  1. You have tests → you can verify that a change doesn’t break behavior
  2. You can verify behavior → you have the courage to change code
  3. You have the courage to change code → the code stays clean over time
  4. Code stays clean → the system remains maintainable

Without this chain:

  1. No tests → every change is risky
  2. Risk → developers stop refactoring
  3. No refactoring → code accumulates complexity (rot)
  4. Rot → system becomes unmaintainable

This is why the cost of dirty tests eventually exceeds the cost of no tests: dirty tests offer only false confidence, yet they still impose the full maintenance burden. You pay the cost without getting the benefit.
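
The chain can be illustrated with a minimal Python sketch. The two tests below pin the behavior of a hypothetical order_total function, so its body can be rewritten freely as long as they still pass. The function names, the (price, quantity) tuple shape, and both implementations are assumptions made for illustration only.

```python
from decimal import Decimal


def order_total_v1(lines):
    """Original implementation: explicit accumulation loop."""
    total = Decimal("0")
    for price, quantity in lines:
        total = total + price * quantity
    return total


def order_total_v2(lines):
    """Refactored implementation: same behavior, expressed with sum()."""
    return sum((price * quantity for price, quantity in lines), Decimal("0"))


# Swap this binding to move between implementations; the tests do not change.
order_total = order_total_v2


def test_empty_order_totals_zero():
    assert order_total([]) == Decimal("0")


def test_total_multiplies_price_by_quantity():
    lines = [(Decimal("29.99"), 2), (Decimal("5.00"), 1)]
    assert order_total(lines) == Decimal("64.98")
```

If the refactored version had changed behavior, one of the two tests would fail right away, which is the courage the chain describes.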


4. Clean Tests — The BUILD-OPERATE-CHECK Pattern

Why this rule exists: Clean tests follow a predictable three-part structure. When all tests follow the same structure, the reader can immediately orient themselves: where is the setup, what operation is being tested, what is expected? This is the single most impactful structural rule for test readability.

The pattern has three equivalent names, all describing the same three phases:

  • BUILD-OPERATE-CHECK (Martin’s term in Clean Code)
  • Arrange-Act-Assert (AAA, the name most commonly used across xUnit-style frameworks)
  • Given-When-Then (BDD style, common with Cucumber/Gherkin)

Java — user cart checkout:

// GOOD — explicit AAA sections with comments
@Test
public void applyingCouponReducesOrderTotal() {
    // Arrange (BUILD)
    User user = new User("alice@example.com");
    ShoppingCart cart = new ShoppingCart(user);
    cart.addItem(new Product("Widget", new BigDecimal("50.00")), 2);
    Coupon coupon = new Coupon("SAVE20", DiscountType.PERCENTAGE, 20);
 
    // Act (OPERATE)
    cart.applyCoupon(coupon);
    Order order = cart.checkout();
 
    // Assert (CHECK)
    assertEquals(new BigDecimal("80.00"), order.getTotal());
}

C++ equivalent (Google Test):

// GOOD
TEST(ShoppingCartTest, ApplyingCouponReducesOrderTotal) {
    // Arrange
    User user{"alice@example.com"};
    ShoppingCart cart{user};
    cart.addItem(Product{"Widget", 50.0}, 2);
    Coupon coupon{"SAVE20", DiscountType::Percentage, 20};
 
    // Act
    cart.applyCoupon(coupon);
    auto order = cart.checkout();
 
    // Assert
    EXPECT_DOUBLE_EQ(80.0, order.getTotal());
}

Python equivalent (pytest):

# GOOD
from decimal import Decimal
from shopping import User, ShoppingCart, Product, Coupon, DiscountType
 
def test_applying_coupon_reduces_order_total():
    # Arrange (Given)
    user = User("alice@example.com")
    cart = ShoppingCart(user)
    cart.add_item(Product("Widget", Decimal("50.00")), quantity=2)
    coupon = Coupon("SAVE20", DiscountType.PERCENTAGE, discount=20)
 
    # Act (When)
    cart.apply_coupon(coupon)
    order = cart.checkout()
 
    # Assert (Then)
    assert order.get_total() == Decimal("80.00")

5. Domain-Specific Testing Language

Why this rule exists: When tests grow large, raw API calls become hard to read. The solution is to build a test DSL: a layer of helper functions and assertion utilities that reads like a specification written in business language rather than in a programming language. This is not a framework; it is a set of functions that accumulates organically as you write more tests for the same domain.

// BAD — raw API, hard to read what's being tested
@Test
public void testOrderFulfillment() {
    Order order = new Order();
    order.setCustomerId(5);
    order.addLineItem(new LineItem("SKU-001", 3, new BigDecimal("12.99")));
    order.addLineItem(new LineItem("SKU-002", 1, new BigDecimal("49.99")));
    order.setShippingAddress(new Address("123 Main St", "New York", "NY", "10001"));
    order.setPaymentMethod(new CreditCard("4111111111111111", "12/26", "123"));
    boolean result = fulfillmentService.process(order);
    assertTrue(result);
    assertEquals(OrderStatus.FULFILLED, order.getStatus());
    assertTrue(order.getTrackingNumber() != null && !order.getTrackingNumber().isEmpty());
}

// GOOD: test DSL makes the intent obvious
@Test
public void fulfilledOrderReceivesTrackingNumber() {
    Order order = anOrder()
        .forCustomer(CUSTOMER_ID)
        .withItem("SKU-001", quantity(3), priceOf("12.99"))
        .withItem("SKU-002", quantity(1), priceOf("49.99"))
        .shippingTo(NEW_YORK_ADDRESS)
        .paidBy(VALID_CREDIT_CARD)
        .build();
 
    fulfillmentService.process(order);
 
    assertOrderIsFulfilled(order);
    assertHasTrackingNumber(order);
}
 
// Test DSL helpers (build organically — don't design upfront)
private OrderBuilder anOrder() { return new OrderBuilder(); }
private int quantity(int n) { return n; }
private BigDecimal priceOf(String s) { return new BigDecimal(s); }
 
private void assertOrderIsFulfilled(Order order) {
    assertEquals(OrderStatus.FULFILLED, order.getStatus());
}
 
private void assertHasTrackingNumber(Order order) {
    assertThat(order.getTrackingNumber()).isNotBlank();
}

Python equivalent:

# GOOD — test DSL with helper factories and assertion utilities
def test_fulfilled_order_receives_tracking_number():
    order = (
        an_order()
        .for_customer(CUSTOMER_ID)
        .with_item("SKU-001", quantity=3, price=Decimal("12.99"))
        .with_item("SKU-002", quantity=1, price=Decimal("49.99"))
        .shipping_to(NEW_YORK_ADDRESS)
        .paid_by(VALID_CREDIT_CARD)
        .build()
    )
 
    fulfillment_service.process(order)
 
    assert_order_is_fulfilled(order)
    assert_has_tracking_number(order)
 
def an_order() -> "OrderBuilder":
    return OrderBuilder()
 
def assert_order_is_fulfilled(order: Order) -> None:
    assert order.status == OrderStatus.FULFILLED
 
def assert_has_tracking_number(order: Order) -> None:
    assert order.tracking_number is not None and len(order.tracking_number) > 0
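
The Python example above relies on an OrderBuilder that these notes do not show. A minimal sketch of what such a builder could look like follows. The Order fields and every method signature here are assumptions chosen to match the calls above; they are not an API taken from the book.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass
class Order:
    """Hypothetical order record used only by this sketch."""
    customer_id: Optional[int] = None
    items: List[Tuple[str, int, Decimal]] = field(default_factory=list)
    shipping_address: Optional[str] = None
    payment_method: Optional[str] = None


class OrderBuilder:
    """Fluent test-data builder: every step returns self so calls can chain."""

    def __init__(self) -> None:
        self._order = Order()

    def for_customer(self, customer_id: int) -> "OrderBuilder":
        self._order.customer_id = customer_id
        return self

    def with_item(self, sku: str, quantity: int, price: Decimal) -> "OrderBuilder":
        self._order.items.append((sku, quantity, price))
        return self

    def shipping_to(self, address: str) -> "OrderBuilder":
        self._order.shipping_address = address
        return self

    def paid_by(self, payment_method: str) -> "OrderBuilder":
        self._order.payment_method = payment_method
        return self

    def build(self) -> Order:
        return self._order


def an_order() -> OrderBuilder:
    return OrderBuilder()
```

Returning self from each step is what allows the chained, sentence-like setup; the builder holds no logic beyond recording what the test asked for.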

6. Dual Standard

Why this rule exists: Test code runs in a test environment, not production. It does not need to be as efficient in memory or CPU usage as production code. The single most important quality of test code is readability — not runtime performance. Martin explicitly states that some things are appropriate in test code that would be inappropriate in production code.

// BAD — test is written for efficiency (StringBuilder) at the cost of readability
@Test
public void temperatureSensorReadsCorrectly() throws Exception {
    HvacController controller = new HvacController();
    TemperatureSensor sensor = new MockSensor(Arrays.asList(
        new Reading(true, false, false),
        new Reading(false, false, false),
        new Reading(true, true, false)
    ));
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 3; i++) {
        controller.tick();
        sb.append(sensor.isHeaterOn() ? "H" : "h");
        sb.append(sensor.isBlowerOn() ? "B" : "b");
        sb.append(sensor.isCoolerOn() ? "C" : "c");
    }
    assertEquals("HbchbcHBc", sb.toString());
}

// GOOD: string concatenation in tests is fine; the state string is readable at a glance
@Test
public void temperatureSensorReadsCorrectly() throws Exception {
    HvacController controller = new HvacController();
    MockSensor sensor = makeSensorWith(
        reading(HEATER_ON, BLOWER_OFF, COOLER_OFF),
        reading(HEATER_OFF, BLOWER_OFF, COOLER_OFF),
        reading(HEATER_ON, BLOWER_ON, COOLER_OFF)
    );
 
    assertEquals("HbchbcHBc", getStateStringAfterTicks(controller, sensor, 3));
}
 
// "HbC" notation: uppercase = on, lowercase = off; H=heater, B=blower, C=cooler
// Once you know the convention, the expected string reads as three states: Hbc, hbc, HBc

The dual standard permits things in tests that you would avoid in production, as long as the test stays clean:

  • String concatenation (fine in tests, wasteful in production hot loops)
  • Longer, more explicit setup helpers (fine in tests when they keep each test readable)
  • Hardcoded test data (fine in tests, bad in production config)
  • Constructing dependencies by hand with new (fine in tests; production wiring may use a dependency injection container)
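
The string-concatenation point can be sketched in Python. This is the kind of helper that sits behind a call like getStateStringAfterTicks: it builds the state string with plain concatenation, which is fine at test scale. FakeHvac and its attributes are hypothetical stand-ins, not the API from the book's example.

```python
from dataclasses import dataclass


@dataclass
class FakeHvac:
    """Hypothetical stand-in for the hardware state the controller drives."""
    heater_on: bool = False
    blower_on: bool = False
    cooler_on: bool = False


def state_string(hw: FakeHvac) -> str:
    # Plain concatenation: wasteful in a hot production loop, harmless in a test.
    state = ""
    state += "H" if hw.heater_on else "h"
    state += "B" if hw.blower_on else "b"
    state += "C" if hw.cooler_on else "c"
    return state


def test_heater_and_blower_on_cooler_off():
    hw = FakeHvac(heater_on=True, blower_on=True, cooler_on=False)
    assert state_string(hw) == "HBc"
```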

7. One Assert per Test

Why this rule exists: The more assertions in a test, the harder it is to determine what broke when the test fails. A test with one assertion has one purpose — it is its own documentation. When that test fails, you know exactly which behavior regressed. The extension of this idea is “one concept per test”: even if multiple assertions are needed to verify a single concept, keep each concept in its own test function.

// BAD — three different concepts in one test; when it fails, which one broke?
@Test
public void testPasswordChange() {
    UserAccount account = new UserAccount("alice", "old-password");
 
    account.changePassword("old-password", "new-password");
 
    assertTrue(account.authenticate("new-password"));  // concept 1: new password works
    assertFalse(account.authenticate("old-password")); // concept 2: old password rejected
    assertTrue(account.getLastPasswordChange()         // concept 3: timestamp updated
               .isAfter(Instant.now().minusSeconds(5)));
}

// GOOD: three tests, each with one concept
@Test
public void newPasswordWorksAfterChange() {
    UserAccount account = new UserAccount("alice", "old-password");
    account.changePassword("old-password", "new-password");
    assertTrue(account.authenticate("new-password"));
}
 
@Test
public void oldPasswordRejectedAfterChange() {
    UserAccount account = new UserAccount("alice", "old-password");
    account.changePassword("old-password", "new-password");
    assertFalse(account.authenticate("old-password"));
}
 
@Test
public void passwordChangeUpdatesTimestamp() {
    UserAccount account = new UserAccount("alice", "old-password");
    Instant before = Instant.now();
    account.changePassword("old-password", "new-password");
    // "not before" rather than "after": on a coarse clock both instants can be equal
    assertFalse(account.getLastPasswordChange().isBefore(before));
}

Python equivalent:

# GOOD: one concept per test function
from datetime import datetime, timezone

def test_new_password_works_after_change():
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    assert account.authenticate("new-password")
 
def test_old_password_rejected_after_change():
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    assert not account.authenticate("old-password")
 
def test_password_change_updates_timestamp():
    account = UserAccount("alice", "old-password")
    before = datetime.now(tz=timezone.utc)
    account.change_password("old-password", "new-password")
    assert account.last_password_change >= before  # equal is valid on a coarse clock

Note on pragmatism: Sometimes several assertions verify the same concept, and splitting them into separate tests would duplicate setup code. In that case, either keep them together in one test, or split them and move the shared setup into a @BeforeEach method or a helper factory method. Martin also mentions the Template Method pattern as a way to share the setup.
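
One way to split such tests without duplicating setup, sketched in Python: a small factory function plays the role that @BeforeEach plays in JUnit. The UserAccount class here is a toy stand-in written only so the example runs; its methods mirror the calls used above but are otherwise assumptions.

```python
class UserAccount:
    """Toy stand-in so the sketch is runnable; not a real credential store."""

    def __init__(self, username: str, password: str) -> None:
        self.username = username
        self._password = password

    def change_password(self, old: str, new: str) -> None:
        if old != self._password:
            raise ValueError("old password does not match")
        self._password = new

    def authenticate(self, password: str) -> bool:
        return password == self._password


def account_after_password_change() -> UserAccount:
    """Shared Arrange + Act, written once and reused by each test."""
    account = UserAccount("alice", "old-password")
    account.change_password("old-password", "new-password")
    return account


def test_new_password_works_after_change():
    assert account_after_password_change().authenticate("new-password")


def test_old_password_rejected_after_change():
    assert not account_after_password_change().authenticate("old-password")
```

Each test still builds its own account, so the tests stay independent while the setup lives in one place.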


8. F.I.R.S.T. Principles of Clean Tests

Why this rule exists: A test suite that cannot be relied upon is worthless. The FIRST principles define the five qualities that make a test suite trustworthy and useful as a development tool.

Fast

Tests must run quickly. If they don’t, developers stop running them frequently, and tests that are rarely run cannot catch regressions early. Slow tests break the feedback loop.

// BAD: test talks to a real SMTP server; takes seconds; fails in CI without network
@Test
public void emailNotificationSentOnOrderShipped() throws Exception {
    EmailService emailService = new RealSmtpEmailService("smtp.company.com", 587);
    OrderService orderService = new OrderService(emailService);
 
    orderService.shipOrder("ORD-12345");
 
    // How do we even verify this without a real mailbox?
    Thread.sleep(2000); // waiting for SMTP
    assertTrue(checkMailbox("customer@example.com", "Your order has shipped"));
}

// GOOD: test uses a fast in-memory fake; runs in milliseconds
@Test
public void emailNotificationSentOnOrderShipped() {
    FakeEmailService emailService = new FakeEmailService();
    OrderService orderService = new OrderService(emailService);
 
    orderService.shipOrder("ORD-12345");
 
    assertTrue(emailService.wasSentTo("customer@example.com"));
    assertThat(emailService.getLastSubject()).contains("Your order has shipped");
}
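
For comparison, a hand-written fake of this kind is only a few lines in Python. This is a sketch: FakeEmailService, OrderService, and their methods are hypothetical names mirroring the Java example, and the customer address is hard-coded purely for illustration.

```python
class FakeEmailService:
    """In-memory fake: records messages instead of talking to an SMTP server."""

    def __init__(self) -> None:
        self.sent = []  # list of (recipient, subject) tuples

    def send(self, recipient: str, subject: str) -> None:
        self.sent.append((recipient, subject))

    def was_sent_to(self, recipient: str) -> bool:
        return any(to == recipient for to, _ in self.sent)


class OrderService:
    """Minimal service under test; the customer lookup is assumed away."""

    def __init__(self, email_service) -> None:
        self._email = email_service

    def ship_order(self, order_id: str) -> None:
        self._email.send("customer@example.com",
                         f"Your order has shipped: {order_id}")


def test_email_notification_sent_on_order_shipped():
    email_service = FakeEmailService()
    OrderService(email_service).ship_order("ORD-12345")

    assert email_service.was_sent_to("customer@example.com")
    assert "Your order has shipped" in email_service.sent[-1][1]
```

Because nothing leaves the process, the test needs no sleep, no network, and no mailbox to inspect.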

Independent

Tests must not depend on each other. Each test should be able to run in any order, in isolation. When tests depend on shared mutable state from a previous test, a single failure can cascade into many spurious failures, making it impossible to diagnose the real problem.

// BAD — test2 depends on state left by test1; run out of order and test2 fails
static ShoppingCart sharedCart;
 
@Test
public void test1_addItemToCart() {
    sharedCart = new ShoppingCart();
    sharedCart.addItem(new Item("Widget", new BigDecimal("9.99")));
    assertEquals(1, sharedCart.getItemCount());
}
 
@Test
public void test2_checkoutCartWithItem() {
    // Assumes sharedCart was populated by test1 — FRAGILE
    Order order = sharedCart.checkout();
    assertNotNull(order.getOrderId());
}

// GOOD: each test creates its own state; order-independent
@Test
public void addItemToCart() {
    ShoppingCart cart = new ShoppingCart();
    cart.addItem(new Item("Widget", new BigDecimal("9.99")));
    assertEquals(1, cart.getItemCount());
}
 
@Test
public void checkoutCartWithItemCreatesOrder() {
    ShoppingCart cart = new ShoppingCart();
    cart.addItem(new Item("Widget", new BigDecimal("9.99")));
    Order order = cart.checkout();
    assertNotNull(order.getOrderId());
}

Repeatable

Tests must produce the same result in every environment: developer machines, CI/CD pipelines, offline environments, different time zones. A test that fails intermittently is a test that cannot be trusted.

// BAD: depends on the system clock; can fail if the two LocalDate.now() calls straddle midnight
@Test
public void sameDayOrderMarkedAsSameDay() {
    Order order = new Order(LocalDate.now(), LocalDate.now()); // uses real clock
    assertTrue(order.isSameDayOrder());
}

// GOOD: uses injected clock; deterministic regardless of when the test runs
@Test
public void sameDayOrderMarkedAsSameDay() {
    Clock fixedClock = Clock.fixed(
        Instant.parse("2024-03-15T10:00:00Z"), ZoneOffset.UTC);
    Order order = new Order(
        LocalDate.now(fixedClock),
        LocalDate.now(fixedClock));
    assertTrue(order.isSameDayOrder());
}
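
The same idea can be sketched in Python by passing the clock in as a callable. The Order class below is hypothetical and exists only to show the injection point; a real design might inject the clock elsewhere.

```python
from datetime import datetime, timezone


class Order:
    """Hypothetical order that asks an injected clock for the current time."""

    def __init__(self, clock=lambda: datetime.now(timezone.utc)) -> None:
        self._clock = clock
        self.placed_at = clock()

    def is_same_day(self) -> bool:
        return self._clock().date() == self.placed_at.date()


def fixed_clock():
    # Always the same instant, so the result never depends on when the test runs.
    return datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)


def test_order_checked_on_same_day_is_same_day():
    order = Order(clock=fixed_clock)
    assert order.is_same_day()


def test_order_checked_next_day_is_not_same_day():
    times = iter([
        datetime(2024, 3, 15, 23, 59, tzinfo=timezone.utc),  # when placed
        datetime(2024, 3, 16, 0, 1, tzinfo=timezone.utc),    # when checked
    ])
    order = Order(clock=lambda: next(times))
    assert not order.is_same_day()
```

With the clock under the test's control, the midnight boundary stops being a source of flakiness and becomes a case that can be tested on purpose.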

Self-Validating

Tests must have a boolean outcome: they either pass or fail. A test that requires a human to read a log file and decide whether the output “looks right” is not a test — it is a manual inspection step disguised as a test. Self-validating tests have explicit assertions that fail loudly on regression.

// BAD — no assertion; "passes" even when the behavior is completely wrong
@Test
public void testReportGeneration() throws Exception {
    ReportGenerator generator = new ReportGenerator();
    String report = generator.generateMonthlyReport(2024, 3);
    System.out.println(report); // developer manually checks output — NOT a test
}

// GOOD: explicit assertions; fails automatically when content is wrong
@Test
public void monthlyReportContainsExpectedSections() {
    ReportGenerator generator = new ReportGenerator(FIXED_CLOCK);
    String report = generator.generateMonthlyReport(2024, 3);
 
    assertThat(report).contains("Monthly Report — March 2024");
    assertThat(report).contains("Total Revenue");
    assertThat(report).contains("Total Orders");
    assertThat(report).doesNotContain("ERROR");
}

Timely

Tests should be written just before the production code they verify. If you write tests after the production code is complete, you may discover that the production code was written in a way that makes it difficult to test — tightly coupled, relying on global state, or requiring complex setup. Writing tests first forces you to design code that is testable.

// BAD — production code written first, now it's hard to test because
// it directly constructs its dependencies (untestable design)
public class InvoiceService {
    public void sendInvoice(int orderId) {
        // direct construction — can't inject a fake
        EmailSender sender = new SmtpEmailSender();
        PdfGenerator pdf = new AcrobatPdfGenerator();
        // ...
    }
}

// GOOD: written test-first, so design naturally uses dependency injection
public class InvoiceService {
    private final EmailSender emailSender;
    private final PdfGenerator pdfGenerator;
 
    public InvoiceService(EmailSender emailSender, PdfGenerator pdfGenerator) {
        this.emailSender = emailSender;
        this.pdfGenerator = pdfGenerator;
    }
 
    public void sendInvoice(int orderId) {
        // uses injected dependencies — easily testable
    }
}
 
@Test
public void sendInvoiceEmailsCustomer() {
    FakeEmailSender emailSender = new FakeEmailSender();
    FakePdfGenerator pdfGenerator = new FakePdfGenerator();
    InvoiceService service = new InvoiceService(emailSender, pdfGenerator);
 
    service.sendInvoice(ORDER_ID);
 
    assertTrue(emailSender.wasSentTo(CUSTOMER_EMAIL));
}

C++ equivalent (Google Test) — FIRST principles:

// GOOD: Independent, Repeatable, Self-Validating
#include <memory>
#include <gtest/gtest.h>

class OrderServiceTest : public ::testing::Test {
protected:
    void SetUp() override {
        // Each test gets fresh state — Independent
        emailService_ = std::make_unique<FakeEmailService>();
        orderService_ = std::make_unique<OrderService>(emailService_.get());
    }
 
    std::unique_ptr<FakeEmailService> emailService_;
    std::unique_ptr<OrderService> orderService_;
};
 
TEST_F(OrderServiceTest, ShippedOrderSendsEmailNotification) {
    orderService_->shipOrder("ORD-12345");
    // Self-Validating — explicit assertion, no manual inspection
    EXPECT_TRUE(emailService_->wasSentTo("customer@example.com"));
}
 
TEST_F(OrderServiceTest, CancelledOrderDoesNotSendShipmentEmail) {
    orderService_->cancelOrder("ORD-12345");
    // Independent — not affected by ShippedOrderSendsEmailNotification
    EXPECT_FALSE(emailService_->wasSentTo("customer@example.com"));
}

Python equivalent (pytest) — FIRST principles:

# GOOD — pytest fixtures enforce Independent + Fast + Repeatable
import pytest
from unittest.mock import MagicMock
from order_service import OrderService
 
@pytest.fixture
def fake_email_service():
    """Fresh fake for each test — enforces Independent."""
    return MagicMock()
 
@pytest.fixture
def order_service(fake_email_service):
    return OrderService(email_service=fake_email_service)
 
def test_shipped_order_sends_email(order_service, fake_email_service):
    order_service.ship_order("ORD-12345")
    # Self-Validating
    fake_email_service.send.assert_called_once()
 
def test_cancelled_order_does_not_send_shipment_email(order_service, fake_email_service):
    order_service.cancel_order("ORD-12345")
    # Independent — fixture gives fresh mock; not affected by previous test
    fake_email_service.send.assert_not_called()

Comparison / Summary Table

Testing Frameworks Across Languages

Language | Framework   | Key Features                                          | Assertion Style
---------|-------------|-------------------------------------------------------|----------------------------------------
Java     | JUnit 5     | @Test, @BeforeEach, nested tests, parameterized tests | assertEquals(expected, actual)
Java     | Mockito     | Mock objects, argument capture                        | verify(mock).method()
Java     | AssertJ     | Fluent, readable assertions                           | assertThat(x).isNotNull().isEqualTo(y)
C++      | Google Test | TEST(), TEST_F(), fixtures                            | EXPECT_EQ(expected, actual)
C++      | Catch2      | BDD-style sections, macro-based, header-only in v2    | REQUIRE(result == expected)
Python   | pytest      | Fixtures, parametrize, plugins                        | assert result == expected
Python   | unittest    | TestCase class, setUp/tearDown                        | self.assertEqual(expected, actual)

FIRST Principles Quick Reference

Principle       | Violation Symptom                       | Fix
----------------|-----------------------------------------|--------------------------------------------------------------
Fast            | Tests skipped because they’re too slow  | Use fakes/mocks; no real I/O in unit tests
Independent     | One test failure causes others to fail  | No shared mutable state between tests; use @BeforeEach
Repeatable      | Tests pass locally, fail in CI          | Inject clocks, random seeds; no network calls
Self-Validating | Test output requires manual inspection  | Add explicit assert / assertEquals for every expected behavior
Timely          | Production code is untestable           | Write tests first; design forces testability
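
The "inject random seeds" fix from the Repeatable row can be sketched in Python as follows. pick_winner is an invented function; the point is only that the random source is a parameter, so a test can supply a seeded one.

```python
import random


def pick_winner(entrants, rng=None):
    """Pick one entrant; callers may inject an rng to make the choice repeatable."""
    if rng is None:
        rng = random.Random()  # unseeded: fine in production, not in a test
    return rng.choice(entrants)


def test_same_seed_gives_same_winner():
    entrants = ["alice", "bob", "carol", "dave"]
    first = pick_winner(entrants, rng=random.Random(42))
    second = pick_winner(entrants, rng=random.Random(42))
    assert first == second


def test_winner_is_always_an_entrant():
    entrants = ["alice", "bob", "carol", "dave"]
    assert pick_winner(entrants, rng=random.Random(7)) in entrants
```

The tests deliberately avoid asserting which name a given seed produces, since that would tie them to the details of the generator rather than to the behavior under test.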

When to Apply / Common Exceptions

Apply these principles when:

  • Writing any test that lives in a version-controlled test suite
  • Reviewing a pull request — check if tests are clean before approving
  • Refactoring existing tests that are failing for unclear reasons

Common exceptions and nuances:

  • Integration and end-to-end tests are not unit tests. They are inherently slower (not Fast) and may share state by design. These are different beasts.
  • One assert per test is a guideline, not an absolute. If two assertions verify a single concept (e.g., both parts of a range check), keep them together. The rule is really “one concept per test.”
  • Dual standard applies only to performance, not to correctness. Test code must still be logically correct and free of bugs. A test that always passes (even when the production code is broken) is worse than no test.
  • TDD is ideal, but not always practical when working with legacy code. In that case, write characterization tests first to document existing behavior, then refactor.
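
A characterization test, sketched in Python: it records what the legacy code does today, quirks included, so that a later refactoring can be checked against it. legacy_shipping_fee is an invented function standing in for untested legacy code.

```python
def legacy_shipping_fee(weight_kg, express):
    """Invented legacy routine with an undocumented quirk for heavy parcels."""
    fee = 5 + 2 * weight_kg
    if express:
        fee = fee * 2
    if weight_kg > 20:
        fee = fee - 3  # nobody remembers why; the test below pins it anyway
    return fee


def test_characterize_current_shipping_fees():
    # Expected values were captured by running the existing code, not from a spec.
    observed = {
        (1, False): 7,
        (1, True): 14,
        (25, False): 52,
        (25, True): 107,
    }
    for (weight_kg, express), fee in observed.items():
        assert legacy_shipping_fee(weight_kg, express) == fee
```

The test makes no claim that these fees are correct, only that they are what the system currently returns; that is enough to refactor safely and to make any intended behavior change a visible, deliberate edit to the table.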

Checklist

When writing or reviewing tests, verify:

  • Test name describes what behavior is being verified, not which method is being called
  • Each test verifies exactly one concept (one reason to fail)
  • Test structure follows BUILD-OPERATE-CHECK (Arrange-Act-Assert)
  • No shared mutable state between tests (each test is self-contained)
  • Test runs in milliseconds (no real I/O: no HTTP, no database, no filesystem)
  • Test produces the same result every time it’s run (no unseeded randomness, no new Date())
  • Test has at least one assertion (no “tests” that just log output)
  • Test DSL helpers are used when test setup exceeds 5 lines
  • Mock/fake objects are used instead of real external dependencies
  • FIRST principles are satisfied: Fast, Independent, Repeatable, Self-Validating, Timely

Key Takeaways

  1. Test code is first-class code: it must be kept as clean, readable, and well-structured as the production code it verifies.
  2. Tests enable change — without a clean test suite, refactoring is dangerous; with one, it is safe and easy.
  3. The Three Laws of TDD keep you in a tight loop: failing test → minimal production code → passing test, then refactor while the tests are green.
  4. BUILD-OPERATE-CHECK (AAA) is the universal test structure: set up the state, perform the operation, verify the outcome.
  5. One concept per test means that when a test fails, you know which behavior regressed with little or no debugging.
  6. FIRST principles (Fast, Independent, Repeatable, Self-Validating, Timely) define the five qualities every reliable test must have.
  7. Dual standard: test code can sacrifice efficiency for readability; string concatenation, verbose builders, and large helper methods are fine in tests.
  8. Test DSLs emerge naturally as you write more tests for the same domain — they are not designed upfront but extracted when tests become repetitive.
  9. Dirty tests are as bad as, and arguably worse than, no tests because they impose a maintenance burden while providing false confidence.
  10. Write tests just before the production code — this forces testable design and prevents the production code from becoming untestable.

Related Chapters

  • ch08-boundaries: Learning tests for third-party APIs are a form of clean test
  • ch10-classes: SRP in classes parallels “one concept per test” in test functions
  • ch11-systems: Dependency injection is what makes production code testable (Timely principle)
  • ch17-smells-and-heuristics: T1–T9 in the smells catalog are test-specific smells
  • ch03-functions: Clean functions are what makes clean tests possible (small, one thing)

Last Updated: 2026-04-14