Chapter 01 Flashcards — What Is Software Engineering?

flashcards seg software-engineering sustainability hyrams-law

How does the book “Software Engineering at Google” define software engineering?
?
Software engineering is programming integrated over time. It encompasses not just the act of writing code but all the policies, practices, and tools that allow an organization to maintain and evolve code over its expected lifetime. The key distinction from programming is the time dimension: programming asks “does this work now?” while software engineering asks “will we be able to keep this working over years or decades?”

What three axes does the book use to distinguish software engineering from programming?
?

Time — software engineering must account for the long-term evolution of code, not just its immediate behavior.
Scale — engineering decisions must remain feasible as team size, codebase size, and user volume grow by orders of magnitude.
Trade-offs — every engineering decision involves costs that must be identified, quantified, and weighed; there are no free choices.

What is the definition of a “sustainable” codebase?
?
A codebase is sustainable when, for its expected life span, the engineering organization is capable of reacting to whatever valuable change comes along, for technical or business reasons (in dependencies, technology, infrastructure, or product requirements), without that change becoming prohibitively expensive. The definition is about capability: a team may choose not to make a given change, but it must be able to. Unsustainability is dangerous because it often accumulates silently: each deferred change makes the next one harder, until the system demands a costly reckoning.

State Hyrum’s Law exactly.
?
“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.” Named after Hyrum Wright, one of the book’s editors. The law generalizes the observation that users depend on observed behavior, not documented behavior, which makes every observable output effectively part of the public contract.

What are the practical engineering implications of Hyrum’s Law?
?

Changing any observable behavior — even undocumented or explicitly labeled “implementation detail” — will break some user’s code if the API has enough consumers.
API surface area should be minimized: the fewer observable behaviors, the fewer accidental dependencies.
Engineers should actively randomize or hide non-guaranteed behaviors (e.g., hash ordering) to prevent users from depending on them.
Large-scale migrations must account for Hyrum’s Law violations even when the documented contract is preserved.

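Why did Go randomize map iteration order, and why does Python randomize string hash seeds?
?
Go’s randomized map iteration is a deliberate defense against Hyrum’s Law. If iteration order were deterministic, users would inevitably depend on it even though it is never guaranteed, and once enough users did, the order could never change without breaking them. Python’s per-process hash seed (PYTHONHASHSEED, enabled by default since Python 3.3) was introduced mainly to defend against hash-flooding denial-of-service attacks, but it has the same effect: set iteration order for strings can differ between runs, so code cannot quietly rely on it. (Since Python 3.7, dict iteration follows insertion order as a language guarantee, so it is sets, not dicts, whose order varies.) In both cases, randomizing the order forces users to write order-independent code and keeps the implementation free to change: a proactive defense against accidental API contracts.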
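The same defense can be applied to one's own APIs. The sketch below wraps a mapping so that its iteration order is reshuffled on every loop, in the spirit of Go's maps (Go's runtime uses a different mechanism internally). The class name `UnorderedView` and its `seed` parameter are hypothetical, not a standard library API.

```python
import random
from collections.abc import Iterator, Mapping
from typing import Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class UnorderedView(Generic[K, V]):
    """Read-only view over a mapping whose iteration order is deliberately
    randomized so callers cannot come to depend on it (hypothetical API)."""

    def __init__(self, data: Mapping[K, V], seed: Optional[int] = None) -> None:
        self._data = data
        # Dedicated RNG; pass a seed only in tests that need reproducibility.
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, key: K) -> V:
        return self._data[key]

    def __iter__(self) -> Iterator[K]:
        keys = list(self._data)
        # Shuffle on every iteration: the only guarantee is "each key exactly once".
        self._rng.shuffle(keys)
        return iter(keys)


if __name__ == "__main__":
    view = UnorderedView({"a": 1, "b": 2, "c": 3, "d": 4})
    print(list(view))    # order varies from loop to loop
    print(sorted(view))  # callers that need an order must ask for one explicitly
```

Shuffling per iteration, rather than once at construction, means even two loops in the same process may disagree. The cost is an O(n) copy per loop, acceptable in a sketch but possibly not on a hot path.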

What anti-pattern does the SEG book’s hash ordering example illustrate?
?
The Hyrum’s Law Trap: allowing an implementation detail (such as iteration order) to be observable, having downstream users depend on it, and then finding the implementation cannot be improved without breaking those users. The pattern demonstrates that observable behavior is de facto API, regardless of what the documentation says. The defense is to randomize or otherwise obscure non-guaranteed behaviors before users accumulate.
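A small Python illustration of the trap, using hypothetical function names: `render_tags_fragile` leaks set iteration order into its output, so any caller or test that compares the string exactly is depending on an implementation detail; `render_tags` makes the order a deliberate part of the contract by sorting.

```python
def render_tags_fragile(tags: set[str]) -> str:
    # Output order is whatever the set happens to yield. For strings this can
    # change between runs (hash randomization) or Python versions, yet callers
    # will still come to depend on it (Hyrum's Law).
    return ",".join(tags)


def render_tags(tags: set[str]) -> str:
    # Sorting turns the order into an explicit guarantee that the
    # implementation can keep no matter how the set is stored internally.
    return ",".join(sorted(tags))


if __name__ == "__main__":
    tags = {"sre", "testing", "build"}
    print(render_tags(tags))  # build,sre,testing
    # A robust caller of the fragile version compares content, not order:
    assert set(render_tags_fragile(tags).split(",")) == tags
```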

What is “Shifting Left” and why does it matter at Google’s scale?
?
Shifting Left means detecting and fixing defects earlier in the development workflow, moving detection from production back toward code review and local development. The cost of a fix grows with the distance between when a defect is introduced and when it is detected; an illustrative rule of thumb is ~1x if caught in code review, ~5x in CI, ~20x in staging, and ~100x in production. At Google’s scale, with enormous change volume, the investment in early detection (fast tests, static analysis, rigorous code review) pays for itself many times over.

What is the approximate relative cost of fixing a defect at each stage of development?
?

Stage	Relative Cost
Code review	~1x
CI / automated tests	~5x
Staging / QA	~20x
Production	~100x
These are illustrative order-of-magnitude estimates, not precise measurements. The principle is that cost compounds with time and distance from introduction, which makes investment in early detection economically rational even when it feels like overhead.
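A minimal sketch of the arithmetic, assuming the illustrative multipliers from the table and a unit cost of 1 for a defect caught in code review. The stage keys and the `remediation_cost` function are hypothetical names for illustration only.

```python
# Illustrative relative costs from the table (order-of-magnitude estimates).
STAGE_MULTIPLIER = {
    "code_review": 1,
    "ci": 5,
    "staging": 20,
    "production": 100,
}


def remediation_cost(defects_by_stage: dict[str, int], unit_cost: float = 1.0) -> float:
    """Total cost of fixing defects, given the stage at which each was detected."""
    return sum(
        unit_cost * STAGE_MULTIPLIER[stage] * count
        for stage, count in defects_by_stage.items()
    )


if __name__ == "__main__":
    # The same 10 defects under two detection profiles.
    late = {"ci": 2, "staging": 3, "production": 5}
    early = {"code_review": 6, "ci": 3, "staging": 1}
    print(remediation_cost(late))   # 570.0
    print(remediation_cost(early))  # 41.0
```

Under these assumptions, catching the same ten defects earlier cuts the modeled cost by more than an order of magnitude; the ratio, not the absolute numbers, is the point.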

What is a scalable policy and why does it matter for a large engineering organization?
?
A scalable policy is one whose human-time cost does not grow linearly (or superlinearly) with the number of engineers, changes, or systems. At Google’s scale, any policy requiring human attention per change (manual approvals, manual code reviews for style, manual dependency upgrade tracking) becomes infeasible because the volume of changes exceeds human capacity. Scalable policies typically involve automation, tooling, and systematic enforcement rather than human judgment per case.

Give two examples of scalable policies and two examples of non-scalable policies.
?
Scalable: (1) Automated linting and style enforcement on every commit, which costs constant (effectively zero) human time per change once the tooling exists. (2) Automated migration tooling (sed-style or AST-based rewrites) that lets one engineer author a change applied across millions of files. Non-scalable: (1) “Email the security team for approval on every external API call,” which creates a reviewer bottleneck that grows with change volume. (2) “Avoid upgrading dependencies until forced,” which defers cost but makes each eventual upgrade larger and riskier, and still demands manual effort from every affected team whenever an upgrade is finally forced.
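A minimal sketch of AST-based migration tooling, using Python's standard `ast` module: one transformer, written once, rewrites every direct call to a deprecated function. The names `old_api` and `new_api` are hypothetical. This is far simpler than production large-scale-change tooling; for instance, `ast.unparse` (Python 3.9+) drops comments and original formatting, and this sketch ignores attribute calls such as `mod.old_api()`.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite direct calls to a deprecated function into calls to its replacement."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new
        self.changed = 0

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # also handle nested calls such as f(old_api(x))
        # Only plain-name calls are matched; attribute calls are left alone.
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new
            self.changed += 1
        return node


def migrate(source: str, old: str, new: str) -> tuple[str, int]:
    """Return the migrated source and the number of call sites rewritten."""
    transformer = RenameCall(old, new)
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(tree), transformer.changed


if __name__ == "__main__":
    after, n = migrate("result = old_api(load(old_api(1)))\n", "old_api", "new_api")
    print(after)  # result = new_api(load(new_api(1)))
    print(n)      # 2
```

The human cost is in writing and reviewing the transformer once; applying it to one file or a million is the same command.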

What is the “deferred dependency upgrade” anti-pattern and why is it dangerous?
?
Deferred Dependency Upgrade: postponing upgrades to dependencies (compilers, libraries, language runtimes) to avoid short-term disruption. Each deferral increases the version distance between current and target, increasing the probability of breaking changes and the cognitive load required to upgrade. The upgrade that would have cost 1 week if done incrementally may cost 3 months if deferred for two years. The sustainable policy is to upgrade continuously on a regular cadence, keeping distance small.
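Keeping distance small depends on noticing it before it grows. A minimal sketch, assuming a hypothetical inventory of dependencies with their published release lists (`libfoo`, `libbar`, and the `Dependency` record are placeholders), that flags any dependency more than `max_lag` releases behind so it can be upgraded on the regular cadence instead of in one large jump:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    pinned: str                # version currently in use
    releases: tuple[str, ...]  # all published versions, oldest first


def releases_behind(dep: Dependency) -> int:
    """Number of published releases between the pinned version and the latest."""
    return len(dep.releases) - 1 - dep.releases.index(dep.pinned)


def upgrade_report(deps: list[Dependency], max_lag: int = 1) -> list[str]:
    """Names of dependencies whose version distance exceeds the policy limit."""
    return [d.name for d in deps if releases_behind(d) > max_lag]


if __name__ == "__main__":
    deps = [
        Dependency("libfoo", "1.2", ("1.0", "1.1", "1.2", "1.3")),
        Dependency("libbar", "2.0", ("2.0", "2.1", "3.0", "3.1", "4.0")),
    ]
    print(upgrade_report(deps))  # ['libbar']
```

Run on a schedule, a check like this turns "upgrade when forced" into a small, predictable stream of one-step upgrades.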

How does Google use compiler upgrades to illustrate sustainable vs. unsustainable policy?
?
Upgrading a compiler across a billion-line codebase (for example, between major GCC versions) can surface new warnings-as-errors, newly detected undefined behavior, and ABI changes. Done infrequently, each upgrade becomes a massive, high-risk project requiring months of effort. Done continuously, in small and frequent steps, each change is small, the tooling is already in place, and engineers know what to expect. Google’s policy of continuous incremental upgrades is an example of a scalable, sustainable practice: it looks like recurring overhead, but it is dramatically cheaper in aggregate.

What categories of cost must be weighed in a software engineering trade-off decision?
?

Financial costs — infrastructure, licensing, tooling spend.
Resource costs — compute, storage, and other machine resources (e.g., CPU time).
Personnel costs — engineering effort to build and maintain, including onboarding complexity and specialist skill requirements.
Transaction costs — coordination overhead, approvals, migration effort.
Opportunity costs — what cannot be built because this was built instead.
Societal costs — user impact, environmental impact (significant at Google’s scale).
All must be evaluated at the scale at which they apply: an estimate made for 10 engineers does not transfer to a context with 10,000 engineers.

Why is the distributed build system (Blaze/Bazel) justified at Google but overkill at a startup?
?
At a startup, building locally is usually sufficient, and a distributed build farm would likely cost more to maintain than it saves. At Google, with a billion-line monorepo, building everything on individual workstations would be far too slow for continuous integration and rapid iteration. The scale multiplier inverts the trade-off: investment in distributed build infrastructure is not just justified but required for feasibility. Cost/benefit calculations are always scale-dependent; the same decision can be wrong at one scale and the only viable option at another.

What is the difference between “it works” and “it is maintainable”?
?
“It works” means the code currently produces correct output for the given inputs — a point-in-time claim. “It is maintainable” means the code can be safely understood, modified, and extended over its expected lifetime by engineers who may not have written it — a durational claim. Software engineering requires both. A system that works but cannot be maintained is a liability: it will work until the environment changes or requirements evolve, at which point the cost of modification may be prohibitive.

What is the “sunk cost architecture” anti-pattern?
?
Sunk Cost Architecture: continuing to invest in or defend a failing architectural decision because of the time and resources already spent on it, rather than evaluating current and future costs of alternatives. The correct frame is always: “Given where we are now, what is the best path forward?” The prior investment is irretrievable; only future costs and benefits are relevant to the current decision. This anti-pattern is common when teams have significant emotional or political investment in their existing system.

What does it mean for code to have an “expected life span” and why does it affect engineering decisions?
?
Every piece of code has an implicit or explicit expected life span — the duration for which it is expected to be in service. Short-lived code (exploratory scripts, one-off migrations) does not warrant sustainability investment because it will be deleted before dependencies accumulate. Long-lived code (core infrastructure, foundational libraries) warrants significant investment because it must survive environmental changes the original authors cannot predict. Mismatching sustainability investment to life span is a common mistake in both directions: over-engineering disposable code and under-engineering permanent infrastructure.

Why is the “it will never need to change” assumption almost always wrong?
?
Because change is driven by forces outside the code itself: security vulnerabilities force patching, compliance requirements force updates, dependency deprecations force migrations, infrastructure evolution forces adaptation, and team turnover forces re-comprehension. Even code that has stable requirements must absorb these external changes. The question is not whether change will be required but when and at what cost. This is why sustainability must be designed in from the start rather than retrofitted later.

What does “programming integrated over time” mean in practice?
?
It means that the activities of software engineering extend far beyond writing initial code to include: designing APIs for changeability, writing tests that remain valid as implementations evolve, documenting decisions so future engineers can understand context, choosing dependencies that will be maintainable, building tooling to automate migration at scale, and establishing policies that keep the codebase healthy over years. The “over time” dimension is what separates engineering discipline from scripting ability.

How does scale change the trade-off calculation for a policy?
?
Scale changes the denominator over which a fixed cost is amortized and multiplies the cost of per-unit-of-change policies. A manual review step that takes 10 minutes is trivial at 100 changes/month (17 hours/month) but impossible at 100,000 changes/month (17,000 hours/month — roughly 100 full-time engineers). Scale also changes the impact of mistakes: a bug that affects 1,000 users requires different mitigation than one affecting 1 billion. Trade-off calculations must always specify the scale at which they are being made.
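The card's arithmetic as a small function, assuming about 160 working hours per engineer per month (an assumption, not a figure from the card); `review_load` is a hypothetical helper name.

```python
HOURS_PER_FTE_MONTH = 160  # assumption: ~40 h/week x 4 weeks


def review_load(changes_per_month: int, minutes_per_change: float) -> tuple[float, float]:
    """Human hours per month, and full-time-engineer equivalent, of a manual per-change step."""
    hours = changes_per_month * minutes_per_change / 60
    return hours, hours / HOURS_PER_FTE_MONTH


if __name__ == "__main__":
    for volume in (100, 100_000):
        hours, fte = review_load(volume, minutes_per_change=10)
        print(f"{volume} changes/month: {hours:,.0f} h/month, about {fte:.1f} FTE")
```

At 100,000 changes per month the same 10-minute step works out to roughly 16,667 hours, or about 104 engineers doing nothing else, which matches the card's "roughly 100".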

Why do the authors advocate for making decisions revisable?
?
Because at Google’s scale, perfect decisions are not achievable — the information required for perfection is not available at decision time, and the environment changes after decisions are made. The practical goal is to: (1) avoid irreversible mistakes by deliberating more on hard-to-reverse decisions, (2) detect reversible mistakes quickly through monitoring and feedback loops, and (3) correct reversible mistakes without institutional defensiveness. The discipline is not “decide correctly once” but “build systems for rapid, low-cost correction.”

What are three categories of change that every long-lived codebase must absorb?
?

Dependency evolution — compilers, languages, OS libraries, and third-party packages change; security vulnerabilities require patches; deprecated APIs must be migrated.
Infrastructure change — hardware architectures evolve, cloud APIs change, networking stacks are replaced; code must adapt even when requirements do not.
Team and knowledge change — engineers leave; new engineers must understand and modify code they did not write; documentation and clarity become critical for long-term maintainability.

What is the key difference between how “cost” is understood in programming vs. software engineering?
?
In programming, cost is primarily the cost of creation: engineer time to write and test the initial code. In software engineering, cost is the total cost of ownership: creation plus the cost of all future changes, migrations, dependency upgrades, onboarding, debugging, and eventual deprecation, summed over the code’s life span. Decisions that minimize creation cost while inflating ownership cost are bad engineering trade-offs, even though they look efficient in the short term.
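A back-of-the-envelope sketch of the distinction, with hypothetical costs in engineer-weeks and the simplifying assumption that upkeep is a constant amount per year:

```python
def total_cost_of_ownership(creation: float, yearly_upkeep: float, years: float) -> float:
    """Creation cost plus recurring ownership cost over the expected life span."""
    return creation + yearly_upkeep * years


if __name__ == "__main__":
    # Hypothetical options, costs in engineer-weeks, over a 5-year life span.
    quick = total_cost_of_ownership(creation=2, yearly_upkeep=6, years=5)
    careful = total_cost_of_ownership(creation=6, yearly_upkeep=1, years=5)
    print(quick, careful)  # 32 11
```

Under these made-up numbers the quick option only wins when the expected life span is under about 0.8 years, which is the same point the expected-life-span card makes about disposable code.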

Why does the book’s definition of software engineering imply that organizational policies matter as much as technical decisions?
?
Because the challenges of time and scale are primarily organizational: they manifest as policies that determine how changes are reviewed, how dependencies are managed, how migrations are executed, and how knowledge is shared. A technically perfect architecture maintained by a dysfunctional process will degrade; a modest architecture maintained with excellent engineering practices can remain healthy for years. Policies are engineering artifacts with costs, trade-offs, and sustainability requirements, just like code.

Total Cards: 25
Review Time: ~20 minutes
Priority: HIGH
Last Updated: 2026-06-02

Study Notes by Niladri & AI
