Chapter 6: Objects and Data Structures

Tags: clean-code, objects, data-structures, abstraction, law-of-demeter

Status: Notes complete
Difficulty: Medium
Time to complete: ~40 min read


Overview

Chapter 6 is about a dichotomy at the heart of object-oriented design: objects and data structures are fundamentally different things, and conflating them produces the worst of both worlds.

  • Objects hide their data behind abstractions and expose functions that operate on that data. You interact with an object through its behavior.
  • Data structures expose their data and have no meaningful functions. You interact with a data structure by reading its fields.

This is not a value judgment — both forms are legitimate and useful. The mistake is using one where the other is appropriate, or mixing them into a hybrid that has neither’s advantages.

Understanding this dichotomy shapes decisions about: when to use getters/setters, how to design interfaces, when to use struct vs. class, when DTOs are appropriate, and why the Law of Demeter exists.

Related chapters: ch03-functions (behavior in functions), ch10-classes (SRP and class design), ch09-unit-tests (testing objects vs. data structures).


The Problem: What Bad Code Looks Like

A common mistake is writing a class that looks like it follows OO principles (it has private fields and getters/setters) but actually exposes all of its internal structure through those accessors. This is a data structure masquerading as an object.

// BAD — "object" that is really just a data structure with extra steps
public class Vehicle {
    private double fuelTankCapacityInGallons;
    private double gallonsOfGasoline;
 
    public double getFuelTankCapacityInGallons() {
        return fuelTankCapacityInGallons;
    }
 
    public double getGallonsOfGasoline() {
        return gallonsOfGasoline;
    }
}
 
// Client code forced to know the internal representation:
double percentFuel = v.getGallonsOfGasoline() / v.getFuelTankCapacityInGallons() * 100;

The Vehicle class communicates nothing useful. The client has to perform the computation that belongs inside the class. Compare with:

// GOOD — the interface hides implementation details
public interface Vehicle {
    double getPercentFuelRemaining();
}

The client only needs the abstraction. The implementation can store gallons, liters, cubic centimeters, or anything else — without the client ever knowing.
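To make this concrete, here is a minimal Python sketch of two implementations that store fuel differently while the client code stays identical. The names GallonVehicle, FractionVehicle, and describe_fuel are hypothetical and chosen only for illustration.

```python
from typing import Protocol


class Vehicle(Protocol):
    def get_percent_fuel_remaining(self) -> float: ...


class GallonVehicle:
    """Hypothetical implementation that stores fuel in gallons."""

    def __init__(self, tank_gallons: float, current_gallons: float) -> None:
        self._tank = tank_gallons
        self._current = current_gallons

    def get_percent_fuel_remaining(self) -> float:
        return self._current / self._tank * 100.0


class FractionVehicle:
    """Hypothetical implementation that stores only a fill fraction (0.0 to 1.0)."""

    def __init__(self, fill_fraction: float) -> None:
        self._fill_fraction = fill_fraction

    def get_percent_fuel_remaining(self) -> float:
        return self._fill_fraction * 100.0


def describe_fuel(vehicle: Vehicle) -> str:
    # The client sees only the abstraction, never the storage unit.
    return f"{vehicle.get_percent_fuel_remaining():.0f}% fuel remaining"


print(describe_fuel(GallonVehicle(20.0, 5.0)))
print(describe_fuel(FractionVehicle(0.75)))
```

Swapping one implementation for the other requires no change in describe_fuel, which is the property the interface is meant to protect.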


Core Principles

1. Data Abstraction

WHY: A class is not clean just because its fields are private and accessed through getters/setters. If the getters and setters directly expose the underlying representation field-by-field, you have not abstracted anything — you have just added ceremony. True abstraction means the interface communicates what the object knows, not how it stores it.

The canonical example: two ways to represent a 2D point.

// BAD — concrete point: exposes the representation directly
public class Point {
    public double x;
    public double y;
}
// or even:
public class Point {
    private double x;
    private double y;
    public double getX() { return x; }
    public double getY() { return y; }
    public void setX(double x) { this.x = x; }
    public void setY(double y) { this.y = y; }
}
// We know it's Cartesian. The implementation IS the interface.
 
// GOOD — abstract point: can be Cartesian OR polar internally
public interface Point {
    double getX();     // caller doesn't know if this is stored or computed
    double getY();
    double getR();
    double getTheta();
    void setCartesian(double x, double y);
    void setPolar(double r, double theta);
}

The second interface does more than expose coordinates: it enforces an access policy. Coordinates can be read individually, but they must be set together as one atomic operation. The implementation could store polar coordinates internally and compute Cartesian ones on demand, or vice versa; the caller has no knowledge of this choice.
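As an illustration, the following Python sketch implements the abstract point with polar storage. The class name PolarPoint is an assumption made for this example; the method names mirror the Java interface above.

```python
import math


class PolarPoint:
    """Hypothetical Point that stores polar coordinates internally."""

    def __init__(self) -> None:
        self._r = 0.0
        self._theta = 0.0

    def set_cartesian(self, x: float, y: float) -> None:
        # Both coordinates are set together and converted to polar form.
        self._r = math.hypot(x, y)
        self._theta = math.atan2(y, x)

    def set_polar(self, r: float, theta: float) -> None:
        self._r = r
        self._theta = theta

    def get_x(self) -> float:
        return self._r * math.cos(self._theta)  # computed, not stored

    def get_y(self) -> float:
        return self._r * math.sin(self._theta)  # computed, not stored

    def get_r(self) -> float:
        return self._r

    def get_theta(self) -> float:
        return self._theta


p = PolarPoint()
p.set_cartesian(3.0, 4.0)
print(p.get_r(), round(p.get_x(), 9), round(p.get_y(), 9))
```

A caller using get_x() and get_y() cannot tell whether the values were stored or derived, so the storage choice can change later without touching callers.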

Vehicle example — the fuel abstraction:

// BAD — exposes gallons, forcing client to compute percentage
public interface Vehicle {
    double getFuelTankCapacityInGallons();
    double getGallonsOfGasoline();
}
 
// GOOD — abstracts fuel level as a meaningful business concept
public interface Vehicle {
    double getPercentFuelRemaining();
}

// C++: abstract interface via pure virtual class
class Vehicle {
public:
    virtual ~Vehicle() = default;
    virtual double getPercentFuelRemaining() const = 0;
};
 
// BAD (concrete implementation leaking through interface)
class VehicleBad {
public:
    double getFuelTankCapacityInGallons() const { return tankCapacity_; }
    double getGallonsOfGasoline() const { return currentFuel_; }
private:
    double tankCapacity_;
    double currentFuel_;
};

# Python: abstract interface via Protocol
from typing import Protocol
 
class Vehicle(Protocol):
    # GOOD: meaningful abstraction
    def get_percent_fuel_remaining(self) -> float: ...
 
# BAD: concrete representation leaking through
class VehicleBad:
    def __init__(self, tank_gallons: float, current_gallons: float) -> None:
        self._tank = tank_gallons
        self._current = current_gallons
 
    def get_fuel_tank_capacity_in_gallons(self) -> float:
        return self._tank
 
    def get_gallons_of_gasoline(self) -> float:
        return self._current
 
# A client of VehicleBad is forced to know the internal representation (gallons): a violation
 
# GOOD: the percentage is computed inside the class
class VehicleGood:
    def __init__(self, tank_gallons: float, current_gallons: float) -> None:
        self._tank = tank_gallons
        self._current = current_gallons
 
    def get_percent_fuel_remaining(self) -> float:
        return (self._current / self._tank) * 100.0

2. Data/Object Anti-Symmetry

WHY: Objects and data structures have exactly opposite strengths:

  • Objects (polymorphic behavior): easy to add new types (new subclass), hard to add new functions (must change all classes)
  • Data structures (procedural code): easy to add new functions (new procedure), hard to add new data types (must update all procedures)

Neither is universally superior. The right choice depends on whether your system is more likely to grow by adding new types or new operations.

The Shapes Example:

// Procedural approach — data structures + Geometry class
// GOOD when you frequently add new functions; BAD when you add new shapes
public class Square {
    public Point topLeft;
    public double side;
}
 
public class Rectangle {
    public Point topLeft;
    public double height;
    public double width;
}
 
public class Circle {
    public Point center;
    public double radius;
}
 
public class Geometry {
    public static final double PI = Math.PI;
 
    public double area(Object shape) {
        if (shape instanceof Square s) {
            return s.side * s.side;
        } else if (shape instanceof Rectangle r) {
            return r.height * r.width;
        } else if (shape instanceof Circle c) {
            return PI * c.radius * c.radius;
        }
        throw new NoSuchShapeException();
    }
 
    public double perimeter(Object shape) {
        if (shape instanceof Square s) return 4 * s.side;
        if (shape instanceof Rectangle r) return 2 * (r.height + r.width);
        if (shape instanceof Circle c) return 2 * PI * c.radius;
        throw new NoSuchShapeException();
    }
    // Adding a new function (e.g., diagonal()) — easy, one place
    // Adding a new shape (Triangle) — hard, touch every function in Geometry
}

// OO approach: polymorphic objects
// GOOD when you frequently add new types; BAD when you add new functions
public interface Shape {
    double area();
    double perimeter();
}
 
public class Square implements Shape {
    private final Point topLeft;
    private final double side;
    // ...
    @Override public double area() { return side * side; }
    @Override public double perimeter() { return 4 * side; }
}
 
public class Circle implements Shape {
    private final Point center;
    private final double radius;
    // ...
    @Override public double area() { return Math.PI * radius * radius; }
    @Override public double perimeter() { return 2 * Math.PI * radius; }
}
// Adding Triangle: easy, a new class and no existing code changes
// Adding diagonal() to Shape: hard, every class must be updated

// C++: OO approach with virtual dispatch
#include <cmath>  // for M_PI (a POSIX extension; MSVC needs _USE_MATH_DEFINES defined first)
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
    virtual double perimeter() const = 0;
};
 
class Circle : public Shape {
public:
    explicit Circle(double radius) : radius_(radius) {}
    double area() const override { return M_PI * radius_ * radius_; }
    double perimeter() const override { return 2 * M_PI * radius_; }
private:
    double radius_;
};

# Python: using ABC for OO approach
from abc import ABC, abstractmethod
import math
 
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...
 
    @abstractmethod
    def perimeter(self) -> float: ...
 
class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self._radius = radius
 
    def area(self) -> float:
        return math.pi * self._radius ** 2
 
    def perimeter(self) -> float:
        return 2 * math.pi * self._radius
 
# Procedural approach — data classes + standalone functions
from dataclasses import dataclass
 
@dataclass
class CircleData:
    radius: float
 
@dataclass
class RectangleData:
    width: float
    height: float
 
def area(shape: CircleData | RectangleData) -> float:
    if isinstance(shape, CircleData):
        return math.pi * shape.radius ** 2
    elif isinstance(shape, RectangleData):
        return shape.width * shape.height
    raise TypeError(f"Unknown shape: {type(shape)}")

Decision guide:

  • Adding new shapes/types frequently → use OO (polymorphic objects)
  • Adding new operations/functions frequently → use procedural (data structures)
  • If you are not sure → start procedural; refactor to OO when you add the third type
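The trade-off in the decision guide can be sketched in Python as follows. This is an illustrative example rather than code from the book: on the procedural side a new operation (perimeter) is a single new function, and on the OO side a new type (Triangle) is a single new class.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


# --- Procedural side: plain data plus free functions ---
@dataclass
class CircleData:
    radius: float


@dataclass
class RectangleData:
    width: float
    height: float


def area(shape: object) -> float:
    if isinstance(shape, CircleData):
        return math.pi * shape.radius ** 2
    if isinstance(shape, RectangleData):
        return shape.width * shape.height
    raise TypeError(f"Unknown shape: {type(shape)}")


def perimeter(shape: object) -> float:
    # New operation: one new function; the data classes are untouched.
    if isinstance(shape, CircleData):
        return 2 * math.pi * shape.radius
    if isinstance(shape, RectangleData):
        return 2 * (shape.width + shape.height)
    raise TypeError(f"Unknown shape: {type(shape)}")


# --- OO side: polymorphic objects ---
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


class Triangle(Shape):
    # New type: one new class; Shape and Square are untouched.
    def __init__(self, base: float, height: float) -> None:
        self._base = base
        self._height = height

    def area(self) -> float:
        return 0.5 * self._base * self._height


print(perimeter(RectangleData(width=2.0, height=3.0)))
print(Triangle(base=4.0, height=3.0).area())
```

The reverse changes are the expensive ones: a new data class means revisiting both area and perimeter, and a new abstract method on Shape means revisiting every subclass.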

3. The Law of Demeter

WHY: A module should not know about the innards of the objects it manipulates. Knowledge of internal structure creates tight coupling — a change to an internal object ripples through every caller that navigated to it.

Formal rule: A method f of class C may only call the methods of:

  1. C itself (any method on this)
  2. An object created by f (local object)
  3. An object passed as an argument to f
  4. An object held in an instance variable of C

The rule does NOT allow calling methods on objects returned by calling other methods (unless those objects are instances of the four cases above). In colloquial terms: talk to friends, not to strangers.
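The four permitted cases can be shown in one small Python sketch. All class names here (Logger, Report, ReportService) are hypothetical and exist only to label each case.

```python
class Logger:
    def __init__(self) -> None:
        self.lines: list[str] = []

    def log(self, message: str) -> None:
        self.lines.append(message)


class Report:
    def __init__(self) -> None:
        self._parts: list[str] = []

    def add(self, text: str) -> None:
        self._parts.append(text)

    def render(self) -> str:
        return "\n".join(self._parts)


class ReportService:
    def __init__(self, logger: Logger) -> None:
        self._logger = logger  # held in an instance variable (case 4)

    def _title(self) -> str:
        return "Daily report"

    def build(self, extra: Report) -> str:
        title = self._title()        # case 1: a method of this class
        report = Report()            # case 2: an object created here
        report.add(title)
        report.add(extra.render())   # case 3: an object passed as an argument
        self._logger.log("built")    # case 4: an object in an instance variable
        return report.render()


body = Report()
body.add("body")
print(ReportService(Logger()).build(body))
```

Every call in build targets one of the four friends; nothing is called on an object that was obtained by asking another object for its parts.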

// BAD — getOptions() returns a stranger; getScratchDir() returns another stranger
public void createScratchFile(Context ctxt) {
    String scratchDirPath = ctxt.getOptions().getScratchDir().getAbsolutePath();
    // ctxt is a friend; Options is a stranger; File is a stranger's stranger
}
 
// GOOD — ask the object to do the work itself
public void createScratchFile(Context ctxt) {
    OutputStream scratchFile = ctxt.createScratchFileStream(classifier);
    // ctxt is a friend; we ask it for a stream, not a path
}

The Law of Demeter applies to objects (things with behavior). It does not apply to data structures — it is natural to navigate address.city.zipCode in a DTO.


4. Train Wrecks

WHY: A chain of method calls — each returning the next object to call — is called a train wreck. It violates the Law of Demeter when each returned value is an object (with behavior), because the caller must know the internal structure of each link in the chain.

// BAD — train wreck: caller knows the chain Options → ScratchDir → File
String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

Option 1: Intermediate variables (better, but not always the fix)

// Better — but still exposes that ctxt has Options, Options has a ScratchDir...
Options opts = ctxt.getOptions();
File scratchDir = opts.getScratchDir();
String outputDir = scratchDir.getAbsolutePath();

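This only removes the chain syntactically. The variables opts and scratchDir still hold objects returned by other calls, not friends in the sense of the four cases above, so if they are objects rather than data structures the Law is still violated. Either way, the caller still knows too much about the object graph.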

Option 2: Push the behavior into the object (best)

// GOOD — ctxt is asked to do something useful; internal structure hidden
OutputStream scratchStream = ctxt.createScratchFileStream(classifier);

The caller knows nothing about Options, ScratchDir, or paths — only that ctxt can give it a stream to write to.

// C++ — train wreck vs. behavior in the right place
// BAD
std::string outputDir = ctxt.getOptions().getScratchDir().path();
 
// GOOD
std::ofstream stream = ctxt.createScratchFileStream(classifier);

# Python: train wreck vs. delegating to the object
# BAD
output_dir = ctxt.get_options().get_scratch_dir().get_absolute_path()
 
# GOOD
scratch_stream = ctxt.create_scratch_file_stream(classifier)

When chains ARE acceptable: method chaining on a fluent interface (builder pattern, query builder) is not a Law of Demeter violation if the chain operates on the same object (StringBuilder, QueryBuilder). The builder is the “friend” throughout.

// OK — fluent builder; all calls on the same object
String result = new StringBuilder()
    .append("Hello")
    .append(", ")
    .append(name)
    .toString();
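The same idea in Python, as a minimal sketch: each chained method returns the builder itself, so the caller never reaches into a different object. QueryBuilder here is a hypothetical class written for this example, not a real library API.

```python
class QueryBuilder:
    """Hypothetical fluent builder; every chained call returns self."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._conditions: list[str] = []
        self._limit = None

    def where(self, condition: str) -> "QueryBuilder":
        self._conditions.append(condition)
        return self  # same object, so the chain never leaves the builder

    def limit(self, n: int) -> "QueryBuilder":
        self._limit = n
        return self

    def build(self) -> str:
        sql = f"SELECT * FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql


query = (
    QueryBuilder("orders")
    .where("status = 'OPEN'")
    .where("total > 100")
    .limit(10)
    .build()
)
print(query)
```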

5. Hybrids — The Worst of Both Worlds

WHY: A hybrid is a class that tries to be both an object and a data structure. It has meaningful behavior methods and public variables (or getters/setters that expose structure directly). Hybrids inherit the worst properties of each form:

  • Hard to add new functions (as with objects: every class that shares the interface must gain the new method)
  • Hard to add new data types (as with data structures: outside code that reads the exposed fields must be updated for each new type)

// BAD: hybrid with both meaningful behavior AND exposed internals
public class OrderItem {
    // Exposed as data structure:
    public String productId;
    public int quantity;
    public double unitPrice;
 
    // Also has meaningful behavior:
    public double calculateSubtotal() { return quantity * unitPrice; }
    public boolean isInStock() { return inventory.checkStock(productId); }
    public void applyBulkDiscount() { ... }
}

This is a design decision that has not been made. Is OrderItem a data transfer object or a domain object? It must be one or the other. Mixing them produces code that is harder to maintain than either pure form.

// GOOD — separate concerns
// Option A: Data structure (DTO)
public record OrderItemDto(String productId, int quantity, double unitPrice) {}
 
// Option B: Domain object (behavior only, internal state hidden)
public class OrderItem {
    private final String productId;
    private final int quantity;
    private final double unitPrice;
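    private final InventoryService inventory;

    // Constructor added so the final fields are actually initialized
    public OrderItem(String productId, int quantity, double unitPrice,
                     InventoryService inventory) {
        this.productId = productId;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
        this.inventory = inventory;
    }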
 
    public double calculateSubtotal() { return quantity * unitPrice; }
    public boolean isInStock() { return inventory.checkStock(productId); }
}
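The same split can be sketched in Python. The names below (OrderItemDto, Inventory, InMemoryInventory) are assumptions made for illustration: the DTO is a frozen dataclass with no behavior, and the domain object keeps its state private and exposes only behavior.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderItemDto:
    """Pure data: no behavior, fields are public."""
    product_id: str
    quantity: int
    unit_price: float


class Inventory(Protocol):
    def check_stock(self, product_id: str) -> bool: ...


class InMemoryInventory:
    """Hypothetical stand-in for a real inventory service."""

    def __init__(self, in_stock: set[str]) -> None:
        self._in_stock = in_stock

    def check_stock(self, product_id: str) -> bool:
        return product_id in self._in_stock


class OrderItem:
    """Domain object: state is hidden, only behavior is exposed."""

    def __init__(self, dto: OrderItemDto, inventory: Inventory) -> None:
        self._product_id = dto.product_id
        self._quantity = dto.quantity
        self._unit_price = dto.unit_price
        self._inventory = inventory

    def calculate_subtotal(self) -> float:
        return self._quantity * self._unit_price

    def is_in_stock(self) -> bool:
        return self._inventory.check_stock(self._product_id)


item = OrderItem(OrderItemDto("sku-1", 3, 2.5), InMemoryInventory({"sku-1"}))
print(item.calculate_subtotal(), item.is_in_stock())
```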

6. Hiding Structure

WHY: If ctxt is an object, we should not ask it for its internal components (Options, ScratchDir) and then operate on those components ourselves. We should ask ctxt to do something meaningful for us.

This principle is the “fix” for train wrecks — instead of navigating into an object’s internals, we ask the object to perform the action using those internals.

How to identify the right method name: ask “what is the caller ultimately trying to do?” If the caller navigates to getAbsolutePath() only to create a scratch file, the right abstraction is a method such as createScratchFileStream(classifier). The method name reflects the intent, not the mechanism.

// BAD — asking ctxt for its guts and operating on them yourself
File scratchDir = ctxt.getOptions().getScratchDir();
String outputPath = scratchDir.getAbsolutePath() + "/" + classifier + ".tmp";
FileOutputStream fos = new FileOutputStream(outputPath);
 
// GOOD — asking ctxt to do the work using its own internals
OutputStream fos = ctxt.createScratchFileStream(classifier);

// C++ equivalent
// BAD
std::filesystem::path scratchPath = ctxt.getOptions().getScratchDir() / classifier;
std::ofstream fos(scratchPath.string() + ".tmp");
 
// GOOD
std::ofstream fos = ctxt.createScratchFileStream(classifier);

# Python equivalent
# BAD
scratch_path = ctxt.get_options().get_scratch_dir() / classifier
fos = open(str(scratch_path) + ".tmp", "w")
 
# GOOD
fos = ctxt.create_scratch_file_stream(classifier)
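What might sit behind create_scratch_file_stream? The sketch below is one possible implementation, under the assumption that Context owns an Options object holding a scratch directory; both classes are hypothetical and simplified. The path arithmetic and the ".tmp" suffix move inside Context, so callers never see them.

```python
import tempfile
from pathlib import Path
from typing import TextIO


class Options:
    """Hypothetical options holder; only Context talks to it."""

    def __init__(self, scratch_dir: Path) -> None:
        self.scratch_dir = scratch_dir


class Context:
    def __init__(self, options: Options) -> None:
        self._options = options  # internal detail, never handed to callers

    def create_scratch_file_stream(self, classifier: str) -> TextIO:
        # Directory lookup and file naming stay hidden in here.
        path = self._options.scratch_dir / f"{classifier}.tmp"
        return open(path, "w", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    ctxt = Context(Options(Path(tmp)))
    with ctxt.create_scratch_file_stream("report") as stream:
        stream.write("scratch data")
    print((Path(tmp) / "report.tmp").read_text(encoding="utf-8"))
```

If the scratch location later moves (a different directory, a different suffix, an in-memory buffer), only Context changes.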

7. Data Transfer Objects (DTOs)

WHY: Sometimes the right design is a class with no behavior — only data. This is a Data Transfer Object (DTO). DTOs are appropriate at system boundaries: reading from a database, parsing a network message, communicating between layers. They are pure data containers, and there is no pretense of OO behavior.

// GOOD — Java DTO using record (Java 16+)
public record OrderDto(
    String orderId,
    String customerId,
    List<OrderItemDto> items,
    double totalAmount,
    String status
) {}
 
// Java DTO the old way (pre-records)
public class OrderDto {
    public final String orderId;
    public final String customerId;
    public final List<OrderItemDto> items;
    public final double totalAmount;
    public final String status;
 
    public OrderDto(String orderId, String customerId,
                    List<OrderItemDto> items, double totalAmount, String status) {
        this.orderId = orderId;
        this.customerId = customerId;
        this.items = items;
        this.totalAmount = totalAmount;
        this.status = status;
    }
}

// C++: struct is the natural DTO form
struct OrderDto {
    std::string orderId;
    std::string customerId;
    std::vector<OrderItemDto> items;
    double totalAmount;
    std::string status;
};

# Python: dataclass is the natural DTO form
from dataclasses import dataclass
from typing import List
 
@dataclass
class OrderDto:
    order_id: str
    customer_id: str
    items: List[OrderItemDto]
    total_amount: float
    status: str
 
# For immutable DTOs:
@dataclass(frozen=True)
class OrderDto:
    order_id: str
    customer_id: str
    total_amount: float
    status: str

Common DTO usage pattern: A repository reads a raw SQL row into an OrderDto, which is then translated into an Order domain object by a mapper/factory. The DTO carries data; the domain object carries behavior.

// Typical translation chain
public class OrderRepository {
    public Order findById(String orderId) {
        OrderDto raw = jdbcTemplate.queryForObject(SQL, orderDtoMapper, orderId);
        return OrderMapper.toDomain(raw);  // DTO → domain object
    }
}
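A framework-free Python sketch of the same translation chain follows. The row layout, the function names row_to_dto and to_domain, and the cancellation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderDto:
    """Carries data across the persistence boundary; no behavior."""
    order_id: str
    customer_id: str
    total_amount: float
    status: str


class Order:
    """Domain object: behavior only, state hidden."""

    def __init__(self, order_id: str, customer_id: str,
                 total_amount: float, status: str) -> None:
        self._order_id = order_id
        self._customer_id = customer_id
        self._total_amount = total_amount
        self._status = status

    def can_be_cancelled(self) -> bool:
        # Hypothetical business rule, used only to show where rules live.
        return self._status in {"NEW", "PAID"}


def row_to_dto(row: dict) -> OrderDto:
    # Step 1: raw row (here a plain dict) to DTO.
    return OrderDto(
        order_id=row["order_id"],
        customer_id=row["customer_id"],
        total_amount=float(row["total_amount"]),
        status=row["status"],
    )


def to_domain(dto: OrderDto) -> Order:
    # Step 2: DTO to domain object.
    return Order(dto.order_id, dto.customer_id, dto.total_amount, dto.status)


row = {"order_id": "o-1", "customer_id": "c-9", "total_amount": "42.50", "status": "NEW"}
order = to_domain(row_to_dto(row))
print(order.can_be_cancelled())
```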

8. Active Records

WHY: An Active Record is a special form of DTO that also has navigational methods: save(), find(), delete(). The Active Record pattern is common in frameworks like Ruby on Rails, Django ORM, and some Java ORMs.

The trap: developers often add business rules to Active Records because they are already modeling the domain entity. This produces a hybrid.

// BAD — business rules in an Active Record
public class Customer extends ActiveRecord {
    public String email;
    public String name;
    public CustomerTier tier;
 
    // Navigation (fine for Active Record)
    public void save() { ... }
    public static Customer findById(long id) { ... }
 
    // Business rules DO NOT belong here
    public double calculateLoyaltyDiscount(Order order) { ... }
    public boolean isEligibleForPremiumSupport() { ... }
    public void upgradeToGoldTier() { ... }
}
 
// GOOD — Active Record as data structure; business rules in a separate object
public class Customer extends ActiveRecord {
    public String email;
    public String name;
    public CustomerTier tier;
 
    public void save() { ... }
    public static Customer findById(long id) { ... }
    // No business logic here
}
 
public class CustomerPricingPolicy {
    public double calculateLoyaltyDiscount(Customer customer, Order order) { ... }
    public boolean isEligibleForPremiumSupport(Customer customer) { ... }
    public void upgradeToGoldTier(Customer customer) { customer.tier = CustomerTier.GOLD; customer.save(); }
}

# Python: Django-style Active Record with and without business logic
from django.db import models
# BAD — Django model with business rules
class Customer(models.Model):
    email = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    tier = models.CharField(max_length=50)
 
    # Business logic — does not belong in the model
    def calculate_loyalty_discount(self, order) -> float:
        if self.tier == "GOLD":
            return 0.15
        return 0.05
 
# GOOD — model is just data + navigation; business rules in a service
class Customer(models.Model):
    email = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    tier = models.CharField(max_length=50)
    # No business logic
 
class CustomerPricingService:
    def calculate_loyalty_discount(self, customer: Customer, order) -> float:
        if customer.tier == "GOLD":
            return 0.15
        return 0.05

Comparison / Summary Table

| Aspect | Objects | Data Structures |
|---|---|---|
| Data visibility | Hidden behind abstraction | Exposed (public fields or transparent getters) |
| Functions | Meaningful behavior that operates on hidden state | Few or none |
| Adding new types | Easy: add a new class; no existing code changes | Hard: must update every function that switches on type |
| Adding new functions | Hard: must add to every class in the hierarchy | Easy: add a new function; no existing code changes |
| Best for | Systems that frequently add new types | Systems that frequently add new operations |
| Coupling | Low (caller depends on interface, not structure) | Higher (caller must know fields to use them) |
| Law of Demeter | Applies: do not navigate internal structure | Does not apply: field navigation is expected |
| Example | Shape.area() with polymorphic dispatch | Geometry.area(Shape s) with instanceof |
| Java form | class with private fields + meaningful methods | record or POJO with public/package fields |
| C++ form | class with private data + virtual methods | struct with public data members |
| Python form | class with _ prefixed attributes + methods | @dataclass with public attributes |

When to Apply / Common Exceptions

Use objects (behavior-hiding) when:

  • You are modeling domain concepts with complex invariants (BankAccount, Order, Inventory)
  • You expect to add new implementations of the same interface over time (new payment methods, new shipping carriers)
  • You want to hide complex state transitions from callers

Use data structures (DTOs) when:

  • Crossing system boundaries: database ↔ application, network ↔ application, layer ↔ layer
  • Data is simple and purely representational with no invariants
  • You need serialization/deserialization (JSON, protobuf, SQL rows)
  • You are frequently adding new operations on the same data

Use Active Records when:

  • Working within a framework that provides them (Rails, Django, some Java ORMs)
  • Even then, keep them as pure data structures plus navigation; business logic always lives in a separate domain/service layer

Common exceptions and nuance:

  • Fluent builders and query builders produce chains that look like Demeter violations but are not — the chain operates on the same object
  • Value objects (immutable objects representing a domain value like Money, Email, Address) can expose their components for equality checks and arithmetic while still being objects
  • Simple getters for primitive types on domain objects are acceptable — the Law of Demeter violation is navigating to other objects, not reading a primitive
  • Framework-required patterns: many frameworks (Spring, Hibernate, Django) require getters/setters or public fields on certain classes; pragmatism wins over purity in those cases
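The value-object exception can be illustrated with a small Python sketch. Money below is a hypothetical class, not one from the book: its components are readable for equality checks and arithmetic, yet it still guards an invariant and offers behavior.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Immutable value object; equality compares amount and currency."""
    amount: Decimal
    currency: str

    def __post_init__(self) -> None:
        # Invariant enforced at construction time.
        if self.amount < 0:
            raise ValueError("amount must not be negative")

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)


total = Money(Decimal("1.50"), "USD").add(Money(Decimal("2.25"), "USD"))
print(total.amount, total.currency)
```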

Checklist

  • Does each class clearly commit to being either an object (hidden data, meaningful behavior) or a data structure (exposed data, minimal behavior)?
  • Does the public interface expose what the class knows (abstraction), not how it stores it (implementation)?
  • Are there getters that expose the internal representation field-by-field? If so, is a more meaningful abstraction possible?
  • Does any method navigate more than one level deep into another object’s internals? (Train wreck check)
  • When a train wreck is identified, is there a way to push the behavior into the intermediate object?
  • Are there hybrids — classes with both meaningful behavior methods AND public fields or fully transparent getters?
  • Are DTOs used at system boundaries and nowhere else?
  • Are Active Records kept free of business logic?
  • When adding a new type vs. a new function, did you choose the form that makes the easier change easy?

Key Takeaways

  1. Objects and data structures are opposites — objects hide data behind abstractions and expose behavior; data structures expose data and have no meaningful behavior. Conflating them produces hybrids that are the worst of both.
  2. Getters and setters do not make abstraction — a class that exposes every field through getX()/setX() is a data structure with extra ceremony, not an object.
  3. True abstraction hides representation: getPercentFuelRemaining() is better than getGallonsOfGasoline() + getFuelTankCapacityInGallons() because it hides whether fuel is stored in gallons, liters, or as a percentage.
  4. The anti-symmetry is a trade-off to manage, not a flaw — procedural code makes it easy to add new functions; OO makes it easy to add new types. Choose deliberately based on which direction your system grows.
  5. The Law of Demeter protects against tight coupling — only talk to direct friends; never navigate into a stranger’s internals.
  6. Train wrecks signal misplaced behavior: when you chain a.getB().getC().doX(), it often means a should offer that behavior itself and delegate internally, so the caller never learns about b or c.
  7. Hybrids are always a design decision deferred — if you see a class with both significant behavior and exposed fields, someone did not decide what it is. Decide, and refactor.
  8. DTOs serve a specific purpose — they carry data across boundaries. They are not domain objects and should not grow business logic.
  9. Active Records are data structures, not objects — treat them as such; keep business rules in separate classes.
  10. The Law of Demeter does not apply to data structures — navigating address.city.postalCode on a DTO is fine; navigating customer.getOrder().getPayment().getGateway() on objects is a violation.

Related chapters:

  • ch03-functions: functions as the behavior of objects; how small, focused functions enable clean object interfaces
  • ch05-formatting: where to place fields and methods in the class file
  • ch09-unit-tests: testing objects through their public interface (not their internals)
  • ch10-classes: SRP, cohesion, and class organization; the larger context for object design
  • ch17-smells-and-heuristics: G36 (Avoid Transitive Navigation), G6 (Code at Wrong Level of Abstraction)



Last Updated: 2026-04-14