Chapter 22 Flashcards — Large-Scale Changes

flashcards seg large-scale-changes lsc rosie refactoring

What is a Large-Scale Change (LSC) and how is it distinguished from a large ordinary change?
?
A Large-Scale Change (LSC) is a change that is logically a single unit but that, due to codebase scale, must be implemented as many individually submitted changes. It is distinguished from a large ordinary change by structural characteristics — not total line count: (1) it crosses team ownership boundaries; (2) it cannot be submitted atomically; (3) the author is often not the domain expert for the code being changed; (4) it conflicts with other in-flight changes at a rate requiring systematic management, not ad-hoc rebasing.

Give four concrete examples of LSCs that Google performs.
?
(1) Compiler upgrade: Moving all C++ code from C++14 to C++17 semantics, including fixing new warnings-as-errors across the codebase.
(2) API migration: Replacing a deprecated authentication token API with a new one across millions of call sites.
(3) Security patch: Updating all callers of a vulnerable cryptographic function to use a patched replacement.
(4) Framework migration: Migrating all code from an old logging framework to a new one.
Other examples include enforcing new compiler warnings uniformly and adding required parameters to widely called functions.

Why is an atomic commit infeasible for most LSCs?
?
(1) Repository scale: version control systems cannot efficiently handle commits touching millions of files. (2) Merge conflicts: a change touching 100,000 files conflicts with virtually every other in-flight change, so resolving conflicts becomes a Sisyphean task. (3) Build verification: verifying the entire codebase takes hours even for a no-op change. (4) Review capacity: standard code review requires reviewers to have domain expertise, but no single team owns all affected files. (5) Ownership: getting simultaneous approval from 500+ teams for a single atomic change is practically impossible.

What is the “No Haunted Graveyards” principle and why is it essential for LSC programs?
?
No Haunted Graveyards means that LSC infrastructure must be capable of touching all code that needs to change — no code should be too risky, too complex, or too poorly understood to safely modify through proper process. If haunted code exists, LSC programs are incomplete: security patches cannot be applied uniformly, deprecated APIs cannot be fully removed, and the codebase becomes partitioned into maintainable and unmaintainable regions. The existence of a haunted graveyard is evidence of broken process (inadequate testing, unclear ownership, insufficient documentation), not merely complex code.

What are the five categories of infrastructure required before LSCs are feasible at scale?
?
(1) Policies and culture: authorization governance, a culture of accepting LSC-initiated changes, and escalation paths for unresponsive owners.
(2) Codebase insight: tools to find all instances of the pattern being changed, understand dependencies, and identify file owners.
(3) Change management tooling: automated change generation, shard management, conflict handling, and progress tracking (Rosie).
(4) Testing: comprehensive test coverage so each shard can be independently verified, plus test infrastructure that scales.
(5) Language and tooling support: AST-based refactoring tools (ClangMR, JavaMR) and a build system that can verify changes at scale.

What are the four steps of Google’s LSC process in order?
?
(1) Authorization: The LSC author writes a migration proposal (scope, motivation, risks) that is reviewed by the LSC Steering Committee; authorization grants the right to change files owned by other teams.
(2) Change creation: Automated tools (ClangMR, JavaMR, sed, scripts) generate the transformations; edge cases are identified; the change is split into shards.
(3) Sharding and submitting: Rosie manages shard distribution to reviewers, tracks approvals, handles conflicts and rebases, and submits approved shards.
(4) Cleanup: The author verifies that all instances are updated, removes old APIs and compatibility shims, and documents the migration.

What is Rosie and what does it do?
?
Rosie is Google’s internal large-scale change management tool. It: (1) automatically shards the LSC into individually reviewable CLs; (2) assigns reviewers based on code ownership metadata; (3) automatically rebases shards that conflict with submitted changes; (4) runs tests for each shard; (5) provides a dashboard showing which shards are approved, failing, conflicting, or awaiting review; (6) supports escalation when a reviewer is unresponsive. Rosie’s core principle: each shard must be independently correct and independently approvable by the code owner, without needing to understand the overall migration.
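
As a rough illustration of the dashboard and follow-up behavior described in this card, the sketch below models shard states and the action each state calls for. The names `ShardState`, `NEXT_ACTION`, `next_action`, and `summarize`, along with the sample directories, are hypothetical and chosen for this example only; they do not reflect Rosie's actual implementation or API.

```python
from collections import Counter
from enum import Enum


class ShardState(Enum):
    """Hypothetical states mirroring the dashboard categories in the card."""
    AWAITING_REVIEW = "awaiting review"
    APPROVED = "approved"
    FAILING = "failing"
    CONFLICTING = "conflicting"


# Assumed mapping from a shard's state to the follow-up a tool like Rosie takes.
NEXT_ACTION = {
    ShardState.AWAITING_REVIEW: "wait, then escalate if the reviewer is unresponsive",
    ShardState.APPROVED: "submit",
    ShardState.FAILING: "report test failure to the LSC author",
    ShardState.CONFLICTING: "rebase and retest",
}


def next_action(state: ShardState) -> str:
    """Look up the follow-up for a single shard."""
    return NEXT_ACTION[state]


def summarize(shards: dict) -> Counter:
    """Count shards per state, the kind of rollup a progress dashboard shows."""
    return Counter(state.value for state in shards.values())


# Illustrative data: shard key (a directory) mapped to its current state.
shards = {
    "ads/": ShardState.APPROVED,
    "maps/": ShardState.CONFLICTING,
    "search/": ShardState.AWAITING_REVIEW,
    "mail/": ShardState.APPROVED,
}
summary = summarize(shards)
```

Because each shard carries its own state, one conflicting or failing shard never blocks the others from being submitted, which is the point of the "independently correct and independently approvable" principle.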

Why do LSCs require a different code review model than ordinary changes?
?
Normal review assumes: the author is the domain expert, the reviewer can independently evaluate correctness, and the scope is small enough for one reviewer to hold in mind. LSCs violate all three: the author applies a mechanical transformation and may not understand the code; the reviewer (code owner) understands the code but not the global migration; the scope is too large for any single reviewer. The LSC review model shifts to: reviewers evaluate “Is this transformation safe in my code?” — not “Is this migration a good idea?” — with the global question resolved at authorization. This requires trust in the LSC process and the steering committee.

Why is the authorization step necessary beyond just getting permission?
?
Authorization provides organizational backing for the social work of the LSC. When a code owner asks “why is someone changing my code?”, the authorization document is the answer. It tells owners: (1) the change has been reviewed and approved by a governance body; (2) they are expected to approve well-formed shards; (3) there is an escalation path if they disagree. Without authorization, LSC authors have no standing to ask other teams to accept changes to their code, and the social friction makes large migrations impossible.

What are ClangMR and JavaMR, and why are they preferable to sed/regex for LSCs?
?
ClangMR (C++) and JavaMR (Java) are AST-based refactoring tools that operate on the Abstract Syntax Tree of the code rather than raw text. They are preferable to sed/regex because: (1) they understand syntax, scoping, and types — they cannot be confused by comments, string literals, or complex syntax; (2) they produce semantically correct transformations — renaming a function updates only its actual call sites, not occurrences of the same string in comments; (3) they can make context-sensitive changes impossible with text substitution. Regex-based transformations are brittle and generate false positives/negatives at scale.
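
The difference between text substitution and AST-based rewriting can be seen in a small sketch. ClangMR and JavaMR target C++ and Java; the example below uses Python's standard `ast` module as a stand-in, so it illustrates the idea only and is not how those tools are implemented. The function name `old_log`, the sample source, and the `RenameFunction` class are assumptions made for the example. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast
import re

# Hypothetical source: the name appears in a definition, a comment,
# a string literal, and one real call site.
SOURCE = '''
def old_log(msg):
    return msg

# old_log is deprecated
banner = "call old_log( for details"
result = old_log("hello")
'''


def rename_with_regex(source: str, old: str, new: str) -> str:
    """Text substitution: rewrites every textual match, wherever it occurs."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


class RenameFunction(ast.NodeTransformer):
    """Rename a function's definition and its direct call sites only."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new
        return node


def rename_with_ast(source: str, old: str, new: str) -> str:
    tree = RenameFunction(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


regex_out = rename_with_regex(SOURCE, "old_log", "new_log")
ast_out = rename_with_ast(SOURCE, "old_log", "new_log")
```

The regex version also rewrites the string literal and the comment, while the AST version touches only the definition and the call. Note that `ast.unparse` discards comments and original formatting, so a real migration tool would need a source-preserving rewriter; the sketch only demonstrates node-level matching.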

What is a “shard” in the context of an LSC, and what properties must it have?
?
A shard is one independently reviewable and submittable unit of an LSC — typically covering a directory, a team’s ownership area, or some other natural boundary within the overall change. Each shard must be: (1) independently correct — the code in the shard must compile and pass tests without the rest of the LSC being submitted; (2) independently reviewable — the code owner can evaluate whether the change to their code is safe without understanding the global migration; (3) independently approvable — one team’s approval does not depend on another team’s approval. These properties are what make parallel, asynchronous progress possible.
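
One plausible way to cut an LSC into shards along ownership boundaries is sketched below. The ownership map, the longest-prefix rule in `owner_for`, the `max_files` cap, and the sample paths are all assumptions for illustration; the source does not specify how Rosie chooses shard boundaries.

```python
from collections import defaultdict
from typing import Dict, List


def owner_for(path: str, owners: Dict[str, str]) -> str:
    """Longest matching directory prefix wins (a hypothetical ownership rule)."""
    matches = [d for d in owners if path.startswith(d + "/")]
    if not matches:
        return "unowned"
    return owners[max(matches, key=len)]


def shard_by_owner(files: List[str], owners: Dict[str, str],
                   max_files: int = 2) -> List[dict]:
    """Group changed files by owning team, then cap each shard's size."""
    by_owner = defaultdict(list)
    for path in sorted(files):
        by_owner[owner_for(path, owners)].append(path)

    shards = []
    for owner in sorted(by_owner):
        paths = by_owner[owner]
        # Split a team's files into chunks small enough to review in one sitting.
        for i in range(0, len(paths), max_files):
            shards.append({"owner": owner, "files": paths[i:i + max_files]})
    return shards


# Illustrative inputs only.
owners = {"ads": "ads-team", "maps": "maps-team"}
files = ["ads/a.cc", "ads/b.cc", "ads/c.cc", "maps/x.cc"]
shards = shard_by_owner(files, owners)
```

Every shard has exactly one owner, so each can be reviewed and approved without waiting on another team. Whether a shard is also independently correct depends on the transformation itself and still has to be confirmed by building and testing that shard alone.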

Why is test coverage a prerequisite for safe LSCs, not just a nice-to-have?
?
Without test coverage, there is no way to verify that an automated code transformation is correct for the specific code being changed. Automated tools like ClangMR produce transformations that are syntactically correct but may still introduce semantic errors in edge cases. Tests provide the verification that the transformed code behaves identically to the original. Code without tests cannot be safely changed at scale — this is why haunted graveyards often persist: they lack the test coverage needed to give the LSC author confidence that their transformation is safe.

What makes cleanup the final required step of an LSC, rather than optional?
?
Cleanup — removing old APIs, deleting compatibility shims, cleaning up migration tooling — is mandatory because an incomplete migration leaves two patterns in the codebase simultaneously: the old and the new. This: (1) increases cognitive load for future readers who must understand both; (2) creates maintenance burden (the old API must still be supported); (3) creates confusion about which pattern is canonical; (4) means the motivation for the migration (removing complexity, eliminating a vulnerability, enforcing a new invariant) is not actually achieved. A migration is not complete until the old infrastructure is removed.

What is the “heterogeneity” barrier to atomic LSC changes?
?
Heterogeneity means different parts of a large codebase have different code styles, local conventions, and constraints. A transformation that is correct in one area may need to be expressed differently in another — different variable naming, different error handling patterns, different test idioms. No single author can hold this context for millions of files across hundreds of teams. This is why shard-based LSCs with local review by code owners work better than atomic changes: each owner applies their local knowledge to verify the transformation is appropriate for their specific code.

How do LSCs relate to the dependency management problems discussed in Chapter 21?
?
LSCs are the mechanism that makes Live at Head dependency management feasible. When a library maintainer at Google makes a breaking change, they use LSC tooling (Rosie, ClangMR/JavaMR) to update all callers across the monorepo as part of the same logical change, submitted as many shards. Without LSC infrastructure, “update all consumers when making a breaking change” is aspirational but not actionable, because the scope is too large. LSCs are what close the gap between the ideal of Live at Head (everyone always on the latest version) and the practical reality of a large, active codebase.

Why does the chapter argue that “any organization with a large, shared codebase will need something like Google’s LSC infrastructure”?
?
Because without LSC infrastructure, large categories of necessary changes simply will not happen: deprecated APIs accumulate, security patches are applied incompletely, and technical debt that requires cross-cutting changes cannot be paid down. The alternative — breaking the codebase into independently versioned components with explicit dependency management — trades the LSC problem for the diamond dependency problem (Chapter 21). Organizations that grow a large shared codebase without building LSC infrastructure discover this when they attempt their first large migration and find it takes years, not weeks.

What is the role of the Large-Scale Change Steering Committee?
?
The LSC Steering Committee authorizes LSC proposals after reviewing: scope (how many files and teams are affected, and what the risks are), motivation (why is this change necessary?), and process (how will the change be made, tested, and cleaned up?). Authorization grants the LSC author organizational standing to submit changes to files owned by other teams, with the expectation that owners will approve appropriately scoped shards. The committee also provides escalation authority when owners are unresponsive or refuse changes, ensuring that the LSC program cannot be blocked by individual team resistance.

What types of teams typically initiate LSCs at Google?
?
(1) Platform and infrastructure teams: Teams owning shared foundations (compilers, build systems, runtime libraries) whose changes cascade to all dependents.
(2) Language teams: Making updates to language idioms, enforcing new patterns, or migrating away from deprecated language features.
(3) Security teams: Applying security patches that require updating call sites across the codebase.
(4) Code health teams: Dedicated teams whose mission is maintaining codebase quality, executing deprecations, and enforcing new standards.
(5) Individual contributors: Any engineer can initiate an LSC after steering committee authorization, though in practice most LSCs come from the teams above.

What is the “sunk cost” failure mode for LSC cleanup?
?
The cleanup sunk-cost failure mode occurs when teams leave compatibility shims or old APIs in place after a migration “just in case” something breaks, telling themselves the risk is too high to remove the old code. This leaves the old complexity alongside the new, creating a worse outcome than either not migrating or completing the migration: (1) the old API still requires maintenance support; (2) new code may accidentally use the old pattern; (3) the invariant the migration was meant to establish is not enforced. Completing cleanup requires accepting that the migration was correct and that the old infrastructure should be fully retired.

What distinguishes Google’s approach to LSCs from a manual “big bang” migration?
?
A manual “big bang” migration attempts to make all changes simultaneously by a single team or in a single sprint, with a freeze on other changes during the migration period. This fails at Google’s scale because: (1) freezing a billion-line active codebase is operationally impossible; (2) a single team cannot develop expertise in thousands of other teams’ code; (3) conflict resolution with thousands of in-flight changes is unmanageable without automation. Google’s LSC approach is incremental and parallel: shards are submitted continuously over days or weeks, Rosie handles conflicts automatically, and no code freeze is required.

How does the LSC model affect the social contract between library authors and their consumers?
?
In the LSC model, library authors accept responsibility for updating all consumers when making breaking changes — they do not release a new API and leave consumers to figure out the migration. This inverts the normal open-source dynamic (where breaking changes are the consumer’s problem) and is only possible because of the monorepo and LSC tooling. Consumers, in turn, accept that their code will be changed by LSC processes they did not initiate, trusting that the authorization process vetted the change. This mutual accountability — authors update consumers, consumers trust the process — is a cultural product of the LSC program, not just a technical one.

Why is codebase search and dependency analysis a prerequisite for creating an LSC?
?
Before writing a single line of transformation code, an LSC author must know: (1) which files contain the pattern being changed — incomplete coverage leaves old patterns in place; (2) who owns each file — reviewer assignment is impossible without ownership metadata; (3) what depends on what — a change to a widely used function must account for all call sites, including transitive ones; (4) what the estimated scope is — authorization proposals must include scope estimates to enable risk assessment. Without comprehensive code search and dependency analysis, LSCs are blind: they produce migrations that miss cases, break unexpected callers, or cannot be reviewed efficiently.
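
A scope estimate of the kind an authorization proposal needs might be produced along these lines. This is a minimal sketch assuming an in-memory map of file path to source text and a top-level-directory ownership map; `count_calls`, `estimate_scope`, and the sample data are hypothetical, and Python's `ast` module stands in for whatever code-search and indexing tools a real codebase would use. It counts direct call sites only and does not follow transitive dependencies.

```python
import ast
from typing import Dict


def count_calls(source: str, func_name: str) -> int:
    """Count direct calls to func_name, ignoring strings and comments."""
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


def estimate_scope(repo: Dict[str, str], owners: Dict[str, str],
                   func_name: str) -> dict:
    """Summarize affected files, call sites, and owning teams."""
    hits = {}
    for path, source in repo.items():
        n = count_calls(source, func_name)
        if n:
            hits[path] = n
    # Assumption: the first path component identifies the owning team.
    teams = {owners.get(path.split("/")[0], "unowned") for path in hits}
    return {
        "files": len(hits),
        "call_sites": sum(hits.values()),
        "teams": sorted(teams),
    }


# Illustrative repository contents.
repo = {
    "ads/a.py": "old_log('x')\nold_log('y')\n",
    "maps/b.py": "print('old_log')\n",  # mention inside a string, not a call
    "maps/c.py": "old_log('z')\n",
}
owners = {"ads": "ads-team", "maps": "maps-team"}
scope = estimate_scope(repo, owners, "old_log")
```

The same per-file hit list can later drive reviewer assignment and, after the migration, confirm that no call sites remain.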

Total Cards: 22
Review Time: ~18 minutes
Priority: HIGH
Last Updated: 2026-06-02

Study Notes by Niladri & AI
