Chapter 18 Flashcards — Build Systems and Build Philosophy
flashcards seg build-systems bazel artifact-based hermeticity distributed-builds
What is the primary purpose of a build system?
?
A build system takes source code and other inputs and produces deployable artifacts (binaries, libraries, packages, container images). Its core job is to manage the dependency graph between source files and outputs. That means determining the correct order of operations, rebuilding only what is necessary after a change, using pinned, consistent versions of external inputs, and detecting dependency cycles. At scale, the problems a build system solves are fundamentally about correctness and reproducibility, not merely performance.
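The ordering and cycle-detection part of that job can be sketched with Python's standard `graphlib` module. This is a minimal illustration: the target names are hypothetical, and a real build system computes the same kind of order over a far larger graph.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical targets: each maps to the set of targets it depends on.
GRAPH = {
    "app": {"lib_net", "lib_log"},
    "lib_net": {"lib_log"},
    "lib_log": set(),
}


def build_order(graph):
    """Return targets ordered so that every target follows its dependencies.

    Raises graphlib.CycleError if the graph contains a dependency cycle.
    """
    return list(TopologicalSorter(graph).static_order())


print(build_order(GRAPH))  # → ['lib_log', 'lib_net', 'app']

try:
    build_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    # The second element of err.args is the list of nodes forming the cycle.
    print("cycle detected:", err.args[1])
```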
What are the two fundamental problems with shell-script-based builds?
?
- No dependency tracking: every build step runs on every invocation regardless of what changed, so build time grows with the size of the project rather than with the scope of the change.
- No reproducibility: the script depends on whatever tools happen to be installed on the current machine, so builds differ between machines and over time. This makes CI unreliable, because a build that succeeds on one engineer’s machine says nothing about whether it will succeed anywhere else (“it works on my machine”).
What are the key limitations of Make that make it unsuitable at large scale?
?
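- Timestamp-based change detection is fragile. A checkout gives unchanged files new timestamps, which forces spurious rebuilds. Network filesystems and clock skew can also produce inconsistent timestamps, causing missed or spurious rebuilds.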
- Manual dependency declarations: engineers must keep file dependency lists consistent with actual code; this maintenance burden grows with codebase size and is error-prone.
- Global state sensitivity: Makefile rules can read environment variables and invoke arbitrary programs, breaking hermeticity.
- Graph evaluation cost: Make must evaluate the full dependency graph on every invocation, which is itself slow for large projects.
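The first limitation becomes concrete when Make's rule ("rebuild if a prerequisite is newer than its target") is compared with content-hash change detection. This is a sketch only: the modification times are hypothetical numbers rather than real filesystem reads, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_needs_rebuild(prereq_mtime: float, target_mtime: float) -> bool:
    # Make's rule: rebuild when a prerequisite is newer than the target.
    return prereq_mtime > target_mtime


def content_needs_rebuild(current: bytes, recorded_digest: str) -> bool:
    # Content rule: rebuild only when the bytes differ from the last build.
    return digest(current) != recorded_digest


source = b"int main(void) { return 0; }\n"
recorded = digest(source)  # digest saved when the output was last built

# After a fresh checkout the unchanged file has a newer (hypothetical) mtime.
print(make_needs_rebuild(prereq_mtime=200.0, target_mtime=100.0))  # True: spurious rebuild
print(content_needs_rebuild(source, recorded))                      # False: nothing changed
print(content_needs_rebuild(source + b"/* edit */\n", recorded))    # True: real change
```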
What is the fundamental distinction between task-based and artifact-based build systems?
?
Task-based: the unit of work is a task (arbitrary shell command). Engineers define what commands to run and when. The build system executes them but cannot understand their inputs/outputs without running them. Examples: Make, Ant, Gradle.
Artifact-based: the unit of work is an artifact (named output). Engineers declare what to produce and what inputs it depends on; the system determines how to produce it and enforces that only declared inputs are accessible. Examples: Bazel, Pants, Buck.
Why does task opacity make task-based systems fail at scale?
?
Because tasks can execute arbitrary code, the build system cannot know — without executing a task — what files it reads, what files it writes, or whether its output is deterministic. This forces a choice between: (a) serial execution in declared order (correct but slow, no parallelism), (b) trusting unverified engineer declarations (fast but incorrect if declarations drift), or (c) rebuilding everything every time (correct but wasteful). At scale, none of these choices is acceptable.
What four properties does an artifact-based system gain by restricting build rules to a controlled DSL?
?
- Safe parallelism: the system can prove which actions are independent (no shared inputs/outputs) and run them concurrently — safely, not optimistically.
- Reliable caching: outputs are keyed by a content hash of the declared inputs and the action itself. As long as actions are hermetic, a cache hit is guaranteed to be correct.
- Distributed execution: the same properties that enable caching naturally extend to remote build farms.
- Incremental correctness: when a file changes, the system can prove exactly which targets are affected and rebuild only those.
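Incremental correctness reduces to a graph query: starting from the changed files, walk the reverse dependencies. The sketch below runs that query over a hypothetical graph whose labels are made up but written in Bazel-style `//package:target` form.

```python
from collections import deque

# Hypothetical graph: target -> direct dependencies (targets or source files).
DEPS = {
    "//app:server": ["//lib/net:net", "//lib/log:log"],
    "//app:cli": ["//lib/log:log"],
    "//lib/net:net": ["lib/net/socket.cc"],
    "//lib/log:log": ["lib/log/log.cc"],
}


def affected_targets(deps, changed):
    """Return every target that transitively depends on a changed input."""
    # Invert the edges once: input -> targets that list it directly.
    rdeps = {}
    for target, inputs in deps.items():
        for item in inputs:
            rdeps.setdefault(item, set()).add(target)
    # Breadth-first walk over reverse edges from the changed files.
    seen, queue = set(), deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rdeps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_targets(DEPS, ["lib/net/socket.cc"])))
# → ['//app:server', '//lib/net:net']
```

Only `//app:cli` is left untouched here, and the system can prove that it does not need rebuilding.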
What is a hermetic build?
?
A build is hermetic if: (1) all inputs — source files, dependencies, tools — are fully declared; (2) the build action cannot access anything not in its declared inputs (enforced by sandboxing); and (3) the same inputs always produce the same outputs, on any machine, at any time. Hermeticity is the property that makes caching trustworthy and distributed execution correct.
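Property (3) can be checked empirically by running the same action twice on identical inputs and comparing output digests. This is a toy sketch: the actions are hypothetical Python functions standing in for compiler invocations, and a counter stands in for an embedded build timestamp.

```python
import hashlib
import itertools


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_reproducible(action, inputs: dict[str, bytes], runs: int = 2) -> bool:
    """Run `action` several times on identical inputs and compare output digests."""
    return len({digest(action(inputs)) for _ in range(runs)}) == 1


def concat_action(inputs: dict[str, bytes]) -> bytes:
    # Deterministic: output depends only on the declared inputs, in sorted order.
    return b"".join(inputs[path] for path in sorted(inputs))


_stamp = itertools.count()


def stamped_action(inputs: dict[str, bytes]) -> bytes:
    # Non-deterministic: appends a per-run value, standing in for a build timestamp.
    return concat_action(inputs) + str(next(_stamp)).encode()


srcs = {"b.txt": b"world", "a.txt": b"hello "}
print(is_reproducible(concat_action, srcs))   # True
print(is_reproducible(stamped_action, srcs))  # False
```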
What is the “environment leakage” anti-pattern in builds?
?
Build actions that read from $HOME, $PATH, ambient environment variables, or local tool installations create invisible dependencies on the engineer’s machine state. These dependencies are not declared, so they break hermeticity silently. The build appears to work but produces non-reproducible results. This is the root cause of “it works on my machine” failures in CI.
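Leakage is especially dangerous for caching. The cache key is computed only from declared inputs, so two machines that differ in an undeclared environment variable compute the same key even though they would produce different outputs. In this sketch the `CC` variable and compiler names are hypothetical, and the environment is passed as a dict rather than read from `os.environ` so that the example is deterministic.

```python
import hashlib


def key_from_declared(inputs: dict[str, bytes]) -> str:
    # The cache key covers only what the action declared.
    h = hashlib.sha256()
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


def leaky_action(inputs: dict[str, bytes], env: dict[str, str]) -> bytes:
    # Reads CC from the ambient environment, an input that is NOT declared.
    compiler = env.get("CC", "cc")
    return f"{sorted(inputs)} compiled by {compiler}".encode()


srcs = {"main.c": b"int main(void) { return 0; }"}
cache = {}

# A laptop populates the cache...
key = key_from_declared(srcs)
cache[key] = leaky_action(srcs, {"CC": "gcc-9"})

# ...and CI, with a different undeclared environment, computes the same key.
ci_key = key_from_declared(srcs)
ci_output = leaky_action(srcs, {"CC": "gcc-13"})

print(ci_key == key)               # True: cache hit
print(cache[ci_key] == ci_output)  # False: the hit returns the laptop's output
```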
How does Bazel enforce hermeticity?
?
Through sandboxing: each build action runs in an isolated environment that exposes only the files declared as inputs. Depending on the platform, this may be a filesystem namespace, a network namespace, or a container. An attempt to read an undeclared file fails the build instead of silently producing an incorrect result. Tool versions can also be specified hermetically: Bazel’s toolchain declarations can pin the exact tools used, so differences in locally installed tool versions do not affect the output.
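The core idea can be illustrated with a toy Python sketch, assuming a POSIX system. The action runs in a fresh temporary directory that contains only its declared inputs, and it receives an empty environment. This is far weaker than Bazel’s sandbox, because it neither hides absolute host paths nor blocks the network. `run_sandboxed` and the file names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_sandboxed(script: str, declared: dict[str, bytes]) -> subprocess.CompletedProcess:
    """Run a Python `script` whose working directory holds only declared inputs.

    Toy sketch: NOT real isolation; absolute paths outside the directory
    remain readable.
    """
    with tempfile.TemporaryDirectory() as root:
        # Materialize only the declared inputs inside the fresh directory.
        for rel_path, data in declared.items():
            dest = os.path.join(root, rel_path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as f:
                f.write(data)
        return subprocess.run(
            [sys.executable, "-c", script],
            cwd=root,
            env={},  # no ambient $HOME, $PATH, etc.
            capture_output=True,
            text=True,
        )


reader = "print(open('src/a.txt').read() + open('src/b.txt').read())"
ok = run_sandboxed(reader, {"src/a.txt": b"A", "src/b.txt": b"B"})
bad = run_sandboxed(reader, {"src/a.txt": b"A"})  # src/b.txt not declared
print(ok.returncode, ok.stdout.strip())  # 0 AB
print(bad.returncode != 0)               # True: the undeclared read fails loudly
```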
What are the two mechanisms that enable distributed builds, and how do they relate to hermeticity?
?
- Remote caching: every action is identified by a content hash of its inputs, and its outputs are stored in a remote cache keyed by that hash. Any machine (any engineer’s workstation, any CI worker) that runs an action with the same inputs can retrieve the cached output instead of recomputing it.
- Remote execution: action descriptions are sent to a build farm; workers execute them in parallel; results are returned to the client.
Both depend on hermeticity: if actions could read from the local environment, remote execution would produce different outputs than local execution, and cached results from one machine could not be trusted on another.
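The remote-caching lookup can be sketched as a get-or-execute step. In this minimal version an in-memory dict stands in for the remote cache, and the key encoding (a JSON manifest hashed with SHA-256) and `fake_compile` are hypothetical. Bazel's real remote cache is reached over a network protocol (for example, the Remote Execution API over gRPC) with its own encodings.

```python
import hashlib
import json


class InMemoryCache:
    """Stand-in for a remote, content-addressed action cache."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def get(self, key: str):
        return self._store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value


def action_key(command: list[str], inputs: dict[str, bytes]) -> str:
    # Canonical manifest of the command and the content hash of each declared input.
    manifest = {
        "command": command,
        "inputs": {p: hashlib.sha256(d).hexdigest() for p, d in inputs.items()},
    }
    return hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest()


def run_action(cache, command, inputs, execute):
    """Return (output, how): reuse a cached output, or execute and upload."""
    key = action_key(command, inputs)
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache hit"
    output = execute(inputs)
    cache.put(key, output)
    return output, "executed"


executions = []


def fake_compile(inputs: dict[str, bytes]) -> bytes:
    # Hypothetical compiler: records each real execution.
    executions.append(1)
    return b"object:" + inputs["a.c"]


shared = InMemoryCache()
cmd, srcs = ["cc", "-c", "a.c"], {"a.c": b"int x;"}
print(run_action(shared, cmd, srcs, fake_compile)[1])  # executed  (developer machine)
print(run_action(shared, cmd, srcs, fake_compile)[1])  # cache hit (CI worker)
print(len(executions))                                 # 1
```

The second call is safe to serve from the cache only because the key captures everything the action can read, which is exactly the hermeticity requirement stated above.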
When do distributed builds become necessary rather than just beneficial?
?
When build times become long enough that CI cannot provide timely feedback — typically at very large scale (100M+ LOC). At smaller scales, distributed builds are beneficial but not required. The threshold is CI cycle time: if engineers must wait hours for CI results, the feedback loop breaks down. Distributed builds bring CI times back into the range where they provide useful signal (minutes, not hours). The authors note that at Google’s scale, building the full codebase locally would take days — making distributed builds a feasibility requirement, not a performance optimization.
What is the 1:1:1 rule and what philosophy does it encode?
?
Google’s convention that each directory should contain: 1 purpose (single, clearly defined responsibility), 1 build target (one entry in the BUILD file), and 1 package (one importable module). It encodes the philosophy of explicit, fine-grained dependencies: with small, single-purpose targets, the build system can prove exactly what depends on what, parallelize at maximum granularity, and scope rebuilds precisely. Coarse-grained targets force rebuilding everything that shares a boundary even when only one small part changed.
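The cost of coarse targets can be seen in a toy comparison of rebuild scope. The targets are hypothetical, and the sketch makes the simplifying assumption that rebuilding a target recompiles all of its source files.

```python
# Hypothetical targets: target -> its source files, and target -> its deps.
COARSE_SRCS = {"util": {"strings.cc", "time.cc", "net.cc"}, "app_a": {"a.cc"}, "app_b": {"b.cc"}}
COARSE_DEPS = {"app_a": {"util"}, "app_b": {"util"}}

FINE_SRCS = {"strings": {"strings.cc"}, "time": {"time.cc"}, "net": {"net.cc"},
             "app_a": {"a.cc"}, "app_b": {"b.cc"}}
FINE_DEPS = {"app_a": {"strings"}, "app_b": {"net"}}


def rebuild_scope(srcs, deps, changed_file):
    """Return (sorted affected targets, number of source files recompiled).

    Simplifying assumption: rebuilding a target recompiles all of its sources.
    """
    affected = {t for t, files in srcs.items() if changed_file in files}
    frontier = set(affected)
    while frontier:
        # Targets that directly depend on something in the current frontier.
        frontier = {t for t, ds in deps.items() if ds & frontier} - affected
        affected |= frontier
    return sorted(affected), sum(len(srcs[t]) for t in affected)


print(rebuild_scope(COARSE_SRCS, COARSE_DEPS, "net.cc"))  # → (['app_a', 'app_b', 'util'], 5)
print(rebuild_scope(FINE_SRCS, FINE_DEPS, "net.cc"))      # → (['app_b', 'net'], 2)
```

With the coarse `util` target, a change to `net.cc` also rebuilds `app_a`, which never used networking code.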
What are the benefits of fine-grained modules beyond build performance?
?
- Explicit dependency graph: all dependencies must be declared; tooling can answer “what depends on this?” and “what does this depend on?” precisely.
- Unambiguous ownership: a single-purpose module has a natural owner; multi-purpose modules create unclear ownership.
- Enforced API boundaries: callers can only use exported interfaces; internal details cannot be accidentally imported.
- Easier refactoring: small, well-bounded modules can be moved, renamed, or replaced with fewer callers to update.
What is the “visibility creep” anti-pattern in build system design?
?
Incrementally widening a module’s visibility attribute to satisfy each new caller, rather than reconsidering whether the module boundary is correct. Each widening makes the module harder to change because more callers must be accounted for. Modules with broad visibility attract callers; many callers make refactoring expensive. The discipline of minimal visibility — exposing only what must be exposed — keeps modules changeable over time.
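The visibility check itself is mechanically simple; the discipline lies in keeping the allow-lists short. The sketch below uses a simplified, hypothetical model in which `visibility` lists the packages allowed to depend on a target, or contains `"public"`. Bazel's real `visibility` attribute is loosely similar but richer (for example `//visibility:public`, `__pkg__`, and `__subpackages__`).

```python
# Simplified model: "visibility" lists the packages allowed to depend on a
# target, or contains "public". Loosely modelled on Bazel's visibility attribute.
TARGETS = {
    "//lib/net:internal": {"deps": [], "visibility": ["//lib/net"]},
    "//lib/net:net": {"deps": ["//lib/net:internal"], "visibility": ["public"]},
    "//app:server": {"deps": ["//lib/net:net", "//lib/net:internal"], "visibility": []},
}


def package_of(label: str) -> str:
    return label.split(":")[0]  # "//lib/net:net" -> "//lib/net"


def visibility_violations(targets):
    """Return (consumer, dependency) pairs that the dependency does not allow."""
    violations = []
    for consumer, info in targets.items():
        for dep in info["deps"]:
            allowed = targets[dep]["visibility"]
            if (package_of(dep) == package_of(consumer)  # same package is always allowed
                    or "public" in allowed
                    or package_of(consumer) in allowed):
                continue
            violations.append((consumer, dep))
    return violations


print(visibility_violations(TARGETS))  # → [('//app:server', '//lib/net:internal')]
```

In the spirit of the card, the fix for the reported violation is to have `//app:server` go through the public `//lib/net:net` target rather than to add `//app` to the internal target's visibility.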
What is the single-version rule for external dependencies, and why does Google enforce it?
?
Google enforces that only one version of any external dependency may exist across the entire monorepo at once, so all teams use the same version. Benefits: (1) it eliminates diamond dependency conflicts (A depends on libX v1, B depends on libX v2, and C depends on both A and B; this situation cannot arise under the single-version rule); (2) security patches are applied everywhere simultaneously, so no part of the codebase can be left on an unpatched version; (3) the dependency graph is unambiguous. Trade-off: a team that wants to upgrade can be blocked until every other team using the dependency is compatible with the new version.
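Enforcement can be pictured as a repository-wide check that flags any external library requested at more than one version, catching the libX conflict before a target like C can depend on both A and B. This is a minimal sketch: the project names and version strings are hypothetical, and versions are sorted as plain strings.

```python
from collections import defaultdict

# Hypothetical per-project requests for external libraries in one repository.
REQUESTS = {
    "A": {"libX": "1.0"},
    "B": {"libX": "2.0", "libY": "3.1"},
    "C": {"libY": "3.1"},
}


def one_version_violations(requests):
    """Return {library: versions} for every library requested at more than one version."""
    versions = defaultdict(set)
    for reqs in requests.values():
        for lib, version in reqs.items():
            versions[lib].add(version)
    return {lib: sorted(vs) for lib, vs in versions.items() if len(vs) > 1}


print(one_version_violations(REQUESTS))  # → {'libX': ['1.0', '2.0']}
```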
What is the “transitive dependency opacity” anti-pattern?
?
Depending on a library’s transitive dependencies without declaring them explicitly. If your code uses library B but gets B only through A’s dependency on B, you have an undeclared dependency. When A later drops its dependency on B (for its own reasons), your build breaks unexpectedly. Artifact-based systems like Bazel can detect and prevent this through strict dependency checking: if you use B, you must declare B as a direct dependency regardless of whether A also depends on it.
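A strict-deps check compares what a target’s sources actually use with what the target declares. In this minimal sketch the symbol-to-library table is hypothetical and hand-written. Real tools derive that information from the compiler; one example is Bazel’s `--strict_java_deps` checking for Java.

```python
def strict_deps_violations(direct_deps, used_symbols, provided_by):
    """Return libraries whose symbols are used but that are not declared directly.

    direct_deps: labels listed in the target's deps.
    used_symbols: symbols the target's sources actually reference.
    provided_by: symbol -> library label that provides it (hypothetical table).
    """
    used_libraries = {provided_by[sym] for sym in used_symbols}
    return sorted(used_libraries - set(direct_deps))


# The code uses symbols from A and B but declares only A (B arrives via A).
PROVIDED_BY = {"a.parse": "//third_party:A", "b.encode": "//third_party:B"}
print(strict_deps_violations(["//third_party:A"], {"a.parse", "b.encode"}, PROVIDED_BY))
# → ['//third_party:B']
```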
How does the trade-off between task-based and artifact-based systems shift at scale?
?
| Scale | Task-Based | Artifact-Based |
|---|---|---|
| Small | Sufficient; less setup overhead | Overkill; verbose BUILD files |
| Medium | Caching fragility starts to hurt | Caching and incrementality pay off |
| Large | Parallel execution unsafe; caching unreliable | Required for correctness and CI feasibility |
The transition is driven by correctness requirements, not just performance. At large scale, task-based systems produce incorrect results (stale caches, missed rebuilds), and these are more dangerous than simply being slow.
Why does the chapter say “correctness in builds means reproducibility, not just passing tests”?
?
A build can pass all tests while still being incorrect if it produces non-reproducible outputs due to hermeticity violations. For example: a binary that was compiled with a local tool version that CI does not have, or a cached output that was built with different inputs than declared. These “correct-seeming” builds create security risks (cannot reproduce a binary for incident response), debugging nightmares (different behavior on different machines), and CI reliability problems. Reproducibility is the stronger correctness requirement.
Why is build system design classified as an early investment with compounding returns?
?
Because migrating from a legacy task-based build system to an artifact-based one is extremely painful. Every existing build definition must be rewritten as BUILD files, every task must be translated into a declared artifact rule, and hermeticity violations (which were invisible before) must be discovered and fixed. Teams that adopt artifact-based discipline early, while the codebase is small, avoid this migration entirely. The hermeticity and caching benefits compound over time: every module added to a hermetic build is automatically cacheable and distributable without further investment.
What is the relationship between the build system and CI cycle time?
?
CI cycle time is directly determined by build system properties. A hermetic, artifact-based build with remote caching means CI workers share a build cache with developer machines: a CI build after a small change only rebuilds affected targets and downloads the rest from cache — taking seconds or minutes. A task-based build without caching may need to rebuild everything from scratch on every CI run — taking hours. At Google’s scale, the CI cycle time budget (target: minutes) can only be met with distributed, artifact-based builds.
What does the chapter mean by “build actions run in sandboxed environments”?
?
Each Bazel build action executes in an isolated environment, analogous to a minimal container, that exposes the files listed as inputs and hides undeclared source files and other build outputs. Depending on the sandbox configuration, the action can also be blocked from reading host paths such as $HOME and from accessing the network unless that access is explicitly allowed. The sandbox enforces hermeticity mechanically rather than relying on engineer discipline, which turns hermeticity from a convention into a property the system guarantees.
What is the broader philosophy the chapter argues for regarding build investment?
?
That build system decisions made early in a codebase’s life are architectural decisions — hard to reverse, with consequences that compound over the life of the codebase. Investing in hermeticity, artifact-based discipline, and fine-grained module structure early produces a codebase that: scales to millions of lines without losing build correctness; enables CI to remain fast even as the codebase grows; allows automated large-scale changes to be validated quickly; and keeps the dependency graph visible and trustworthy. The authors frame this as the same sustainability argument that runs throughout the book: the correct investment at small scale is the one that remains correct at large scale.
Total Cards: 22
Review Time: ~18 minutes
Priority: MEDIUM
Last Updated: 2026-06-02