Chapter 07 Flashcards — Measuring Engineering Productivity

flashcards seg productivity metrics gsm quants


What is the triage question that must be answered before beginning any productivity measurement?
?
Will the data actually be used to make a decision? Specifically: what decision will it inform, and what different data values would lead to different decisions? Measurement that does not drive decisions has negative ROI — its cost is never recovered. This triage step prevents expensive data collection that sits unused on dashboards.

What is the GSM framework?
?
A three-layer structure for defining measurements: Goals (what you want to achieve, expressed as outcomes), Signals (what would tell you the goal was achieved, before deciding how to measure), and Metrics (concrete, observable proxies for signals). The hierarchy prevents metric fixation by keeping goals and metrics explicitly separate.
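A minimal sketch of the three layers as a data structure, which makes it easy to spot signals that have no metric yet. The class names, the helper `unmeasured_signals`, and the example goal are hypothetical illustrations, not a Google tool:
```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str  # concrete, observable proxy for a signal


@dataclass
class Signal:
    description: str  # what would tell us the goal was achieved
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    outcome: str  # desired outcome, stated without naming a metric
    signals: list[Signal] = field(default_factory=list)


def unmeasured_signals(goal: Goal) -> list[str]:
    """Return the signals that have no metric attached (measurement gaps)."""
    return [s.description for s in goal.signals if not s.metrics]


# Invented example: one signal has a metric, one does not.
goal = Goal(
    outcome="Engineers get fast feedback on code changes",
    signals=[
        Signal("Reviews are returned promptly",
               [Metric("median_review_turnaround_hours")]),
        Signal("Engineers feel unblocked while waiting"),
    ],
)
print(unmeasured_signals(goal))
```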

What is a Goal in the GSM framework?
?
A statement of what you want to achieve, expressed in terms of user perspective or business outcomes — not in terms of measurements you already have. Good goals are specific enough to distinguish achievement from non-achievement and do not mention specific metrics. Anti-pattern: writing a goal around existing data (“improve our score on metric X”) encodes Goodhart’s Law from the start.

What is a Signal in the GSM framework?
?
A hypothetical indicator — something that, if you could observe it perfectly, would tell you whether a goal was achieved. Signals exist between goals (abstract) and metrics (concrete). Crucially, signals are articulated before deciding how to measure them. Signals can be impossible to measure directly; identifying unmeasurable signals helps you understand what approximations your metrics are making.

What is a Metric in the GSM framework?
?
A concrete, observable, countable proxy for a signal. Metrics are what you actually instrument and track. Good metrics are directional (clear which way is better), sensitive (change when the signal changes), resistant to gaming, and actionable (when wrong, you can diagnose and intervene). Metrics serve signals; signals serve goals — this hierarchy is the core of GSM.

What does QUANTS stand for?
?
QU: Quality of the code; A: Attention from engineers (focus); N: iNtellectual complexity; T: Tempo and velocity; S: Satisfaction. It is Google’s mnemonic for the five dimensions of developer productivity, used to ensure measurement programs do not focus narrowly on a single dimension (usually velocity) at the expense of the others.

What does the QU (Quality) dimension of QUANTS measure?
?
Whether the code produced meets desired standards — bug rates, technical debt accumulation, test coverage, static analysis findings per change. Key insight: quality and velocity often trade off. Any productivity measurement that only measures velocity without measuring quality will produce optimization for speed at the expense of correctness and maintainability.

What does the A (Attention) dimension of QUANTS measure?
?
How much of an engineer’s time and attention is captured by actual work versus interruptions, context-switching, and overhead. Example metrics: percentage of engineers reporting uninterrupted focus time, meeting hours per week, frequency of context-switching. Acknowledges that productivity requires not just capability but the cognitive space to apply it.

What does the N (iNtellectual complexity) dimension of QUANTS measure?
?
The cognitive load imposed by the work — specifically, accidental complexity (imposed by tools, codebase, process) versus essential complexity (inherent in the problem). Example metrics: time to understand codebase before first change, onboarding time to productivity. The hardest QUANTS dimension to measure quantitatively, but among the most important — accidental complexity taxes every engineer for the life of the system.

What does the T (Tempo) dimension of QUANTS measure?
?
How quickly engineers move from idea to deployed, working code. Example metrics: commit frequency, code review turnaround time, time from commit to production deployment, features shipped per quarter. Tempo is the most commonly measured QUANTS dimension, the most prone to over-weighting, and the most gameable: commits can be split, reviews rubber-stamped, and features cut into smaller pieces without any real velocity improvement.

What does the S (Satisfaction) dimension of QUANTS measure?
?
Whether engineers find their work meaningful, feel effective, and enjoy their tools and processes. Example metrics: developer satisfaction surveys, Net Promoter Score for internal tools, voluntary attrition rates. Satisfaction is a leading indicator: when it drops, other QUANTS dimensions typically follow. Dissatisfied engineers leave; frustrated engineers are less productive even on objective measures.

What is Goodhart’s Law, and why is it central to productivity measurement?
?
“When a measure becomes a target, it ceases to be a good measure.” Once engineers are evaluated on a metric, they optimize for the metric itself rather than the underlying goal it was meant to represent — e.g., optimizing commit frequency by splitting changes unnecessarily, or inflating test coverage with meaningless tests. The GSM framework is partly a structural defense: keeping goals and metrics in separate layers makes it harder to conflate the proxy with the thing itself.

What is the actionability test for deciding whether to measure?
?
Ask: if the measurement shows a problem, can we actually change something about it? If the answer is “no” — the result would be interesting but unactionable — the measurement does not justify its cost. Measurement that cannot inform action is journalism, not engineering. The data must have a path to a decision or intervention to earn its collection cost.

Why is measuring only the Tempo dimension of QUANTS dangerous?
?
Because velocity without quality creates hidden costs. Optimizing for speed alone encourages shipping buggy code, accumulating technical debt, and neglecting test coverage, all of which can eventually cost more velocity than the shortcuts gained. The QUANTS framework exists precisely to force measurement programs to cover all five dimensions and prevent narrow optimization.

What is the role of qualitative data (surveys, interviews) in productivity measurement?
?
Qualitative data is not anecdote — it is a complementary measurement channel that captures dimensions instrumentation misses. If quantitative metrics indicate improvement but engineers report feeling less productive, the metric is missing something important. Engineer experience is data. Google uses surveys alongside behavioral metrics specifically because satisfaction, cognitive load, and focus are difficult to instrument directly.

How should metrics be validated before being relied upon for decisions?
?
Empirical validation: (1) verify the metric moves in the right direction when you know a real improvement occurred; (2) introduce a known productivity problem and check that the metric detects it; (3) triangulate — check that multiple independently derived metrics converge. A metric that cannot detect a known problem will not detect unknown ones. Never assume the metric-signal relationship without testing it.

What is triangulation in the context of productivity metrics?
?
Using multiple independently derived metrics to confirm a conclusion. No single metric is trustworthy in isolation; when multiple metrics converge on the same conclusion, confidence increases. When they diverge, at least one metric is failing to capture the underlying signal. Divergence is as informative as convergence — it reveals measurement gaps.
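A small sketch of a triangulation check, assuming each metric has already been reduced to a direction of change (+1 improved, -1 worsened, 0 no change). The function name and the metric names are hypothetical:
```python
def triangulate(directions: dict[str, int]) -> str:
    """Classify whether independently derived metrics agree.

    directions maps metric name -> +1 (improved), -1 (worsened), 0 (no change).
    """
    observed = {d for d in directions.values() if d != 0}
    if len(observed) > 1:
        return "diverge"  # at least one metric is missing the signal
    if not observed:
        return "no change"
    return "converge"


# Invented readings: two metrics improved, one worsened.
print(triangulate({
    "review_turnaround": +1,
    "survey_reported_speed": +1,
    "build_latency": -1,
}))
```
A "diverge" result is a prompt to investigate which metric is failing to capture the signal, not a verdict on the intervention.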

What is the difference between a point-in-time measurement and a trending metric?
?
A point-in-time measurement tells you the current state of a system (e.g., current build time). A trending metric tracks the same measurement over time, revealing whether the system is improving, degrading, or stable. Trend data reveals dynamics; snapshots reveal only the current position. For productivity work, trending metrics are almost always more actionable than one-time measurements.
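One way to turn a series of snapshots into a trend is a least-squares slope over the observation index. This is a sketch with made-up weekly build times; `trend_slope` is a hypothetical helper, and a real analysis would also account for noise and seasonality:
```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (units per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


weekly_build_minutes = [12.0, 12.5, 13.0, 13.5]  # made-up data
# A positive slope means build time is growing week over week.
print(trend_slope(weekly_build_minutes))
```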

What is the vanity metrics anti-pattern?
?
Metrics that go up easily but do not reflect real productivity — chosen because they look good in reports rather than because they represent meaningful improvement. Examples: raw commit count, lines of code added, number of pull requests merged. Vanity metrics produce false confidence and mislead prioritization by making low-value activity look like high-value productivity.

Why must productivity measurement data be broadly shared rather than kept by a central team?
?
Because teams that can see their own data make better local decisions. When data is hoarded by a central productivity team, that team becomes a bottleneck for insight. When distributed, productivity data creates social accountability without requiring management to directly intervene, and enables local experimentation. The EPR team at Google shared metrics broadly as a deliberate policy.

What is the cost-benefit heuristic for deciding whether to measure a productivity problem?
?
If the aggregate cost of the productivity loss being studied (number of engineers affected × time lost per engineer × time horizon) is smaller than the cost of measuring it, skip the measurement. The cost of measurement — instrumentation, data collection, analysis, and acting on results — must be recovered from the value of the decisions it enables. Cheap decisions do not justify expensive measurements.
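The heuristic can be sketched as a single comparison. The function name, the choice of hours as the common unit, and all numbers below are assumptions for illustration:
```python
def worth_measuring(engineers: int, hours_lost_per_week: float,
                    weeks: int, measurement_cost_hours: float) -> bool:
    """True if the aggregate productivity loss exceeds the cost of measuring it."""
    aggregate_loss = engineers * hours_lost_per_week * weeks
    return aggregate_loss > measurement_cost_hours


# Made-up numbers: 200 engineers x 0.5 h/week x 26 weeks = 2600 h of loss.
print(worth_measuring(200, 0.5, 26, measurement_cost_hours=400))  # True
# Only 5 engineers affected: 65 h of loss, less than the 400 h study.
print(worth_measuring(5, 0.5, 26, measurement_cost_hours=400))    # False
```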

What does it mean to “close the loop” in a productivity measurement program?
?
After an intervention, explicitly: (1) state in advance what the metrics should do if the change was beneficial; (2) observe what the metrics actually did; (3) update your understanding based on any discrepancy between prediction and observation. Without closing the loop, you cannot distinguish a successful intervention from a coincidental metric movement or a placebo effect.

What is survivorship bias in productivity measurement, and how does it distort results?
?
Measuring only completed work while ignoring work that was abandoned, blocked, or never started. A team that looks highly productive by velocity metrics may be experiencing high abandonment rates on difficult tasks or a growing backlog of blocked work. Survivorship bias makes productivity look better than it is by hiding all the work that didn’t make it to the finish line.
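A sketch of a completion rate that keeps abandoned and blocked work in the denominator. The state labels ("done", "abandoned", "blocked") and the data are assumptions; a real tracker would have its own statuses:
```python
from collections import Counter


def completion_rate(task_states: list[str]) -> float:
    """Share of all started tasks that finished, counting unfinished ones too."""
    counts = Counter(task_states)
    total = sum(counts.values())
    return counts["done"] / total if total else 0.0


states = ["done"] * 6 + ["abandoned"] * 3 + ["blocked"] * 1  # made-up data
# Counting only finished tasks would hide the 4 that never shipped.
print(completion_rate(states))  # 0.6
```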

When is A/B testing the appropriate method for productivity measurement?
?
When you want to establish causality — not just correlation — between a change and a productivity outcome. Expose one group of engineers to the change (e.g., a new CI tool, a new code review process) and measure outcomes against a control group. A/B testing is the gold standard for productivity claims; observational data can only suggest that a change helped, while A/B testing can demonstrate it within the bounds of statistical confidence.
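As an illustration of "within the bounds of statistical confidence", here is a minimal permutation test on the difference of group means. It is one of several possible analyses, not a method the source prescribes; the function name and sample data are invented, and a real experiment also needs random assignment and an adequate sample size:
```python
import random


def permutation_p_value(control: list[float], treatment: list[float],
                        rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def mean(xs: list[float]) -> float:
        return sum(xs) / len(xs)

    observed = abs(mean(treatment) - mean(control))
    pooled = control + treatment
    n = len(control)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean(pooled[n:]) - mean(pooled[:n])) >= observed:
            hits += 1
    return hits / rounds


# Made-up CI wait times in minutes: old tool (control) vs new tool (treatment).
control = [30.0, 32.0, 31.0, 29.0, 33.0, 30.0]
treatment = [25.0, 24.0, 26.0, 23.0, 25.0, 24.0]
p = permutation_p_value(control, treatment)
print(f"p-value: {p:.4f}")
```
A small p-value says the observed gap would be unlikely if the tool made no difference; it does not by itself say the gap is large enough to matter.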

What is the EPR team at Google, and what was its purpose?
?
The Engineering Productivity Research (EPR) team was a dedicated team at Google specifically tasked with measuring engineering productivity at scale. Their role was to apply rigorous measurement science — including the GSM and QUANTS frameworks — to questions about developer effectiveness, with the goal of providing data that could inform tooling, process, and infrastructure investment decisions across the company.


Total Cards: 25
Review Time: ~20 minutes
Priority: MEDIUM
Last Updated: 2026-06-02