Chapter 2: Measuring Performance

Why Measuring Software Performance is Hard

Software is harder to measure than manufacturing: its inventory is invisible, the breakdown of work is relatively arbitrary, and in Agile, design and delivery happen simultaneously. Previous attempts to measure performance have failed in two ways:

They focus on outputs rather than outcomes
They focus on individual/local rather than team/global measures

Three Flawed Measurement Approaches

Lines of Code

Rewards bloat over elegance
Incentivizes writing more code rather than solving business problems
Minimizing LOC is no better: it encourages unreadable one-liners

Velocity

Relative and team-dependent, so velocities can't be compared across teams
Easily gamed: teams inflate estimates and focus on finishing their own stories at the expense of collaboration
Gaming destroys velocity's usefulness for its intended purpose, capacity planning

Utilization

High utilization is only good up to a point
Queueing theory: as utilization approaches 100%, lead times approach infinity
No slack = no capacity to absorb unplanned work or improvement
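
The queueing claim above can be made concrete with a small sketch. It assumes the simplest textbook model, an M/M/1 queue (one server, random arrivals, random service times), where mean time in the system is W = S / (1 - ρ) for mean service time S and utilization ρ. The choice of model and the function name `mean_lead_time` are assumptions made for illustration; the notes do not name a specific model.

```python
def mean_lead_time(service_time: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Assumption: each work item needs 1 day of hands-on effort on average.
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        days = mean_lead_time(1.0, rho)
        print(f"utilization {rho:.0%}: mean lead time {days:.1f} days")
```

In this model, going from 80% to 95% utilization roughly quadruples lead time, which is why a team with no slack struggles to absorb unplanned work.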

The Four DORA Metrics

A valid performance measure must (1) focus on a global outcome rather than a local one, and (2) measure outcomes rather than output.

Metric	Description	Why it matters
Delivery Lead Time	Time from code committed → code running in production	From Lean theory; shorter = faster feedback and course correction
Deployment Frequency	How often code is deployed to production	Proxy for batch size (smaller batches = better)
Mean Time to Restore (MTTR)	How long it takes to restore service after an incident	Failure is inevitable; resilience matters more than MTBF
Change Failure Rate	% of changes that cause degraded service	Key quality metric; the Lean equivalent of “percent complete and accurate”

Note: Batch size is hard to measure directly in software, so deployment frequency stands in for it. More frequent deploys = smaller batches.
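
As a rough illustration, the four measures could be computed from deployment and incident records along these lines. The record types, field names, and the function `four_key_metrics` are hypothetical, and the choice of a median for lead time and a mean for restore time is an assumption, not something the notes prescribe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    caused_failure: bool    # did it degrade service?


@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime


def four_key_metrics(deploys, incidents, period_days):
    """Summarize one team's delivery performance over a reporting period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    restore_times = [i.restored_at - i.started_at for i in incidents]
    return {
        "median_lead_time": median(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        # No incidents means there is no restore time to report.
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
        "change_failure_rate": sum(d.caused_failure for d in deploys) / len(deploys),
    }


if __name__ == "__main__":
    # Made-up sample records for one week, for illustration only.
    t0 = datetime(2024, 1, 1, 9, 0)
    deploys = [
        Deployment(t0, t0 + timedelta(hours=2), False),
        Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4), True),
        Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=1), False),
    ]
    incidents = [Incident(t0 + timedelta(days=1, hours=5),
                          t0 + timedelta(days=1, hours=5, minutes=30))]
    for name, value in four_key_metrics(deploys, incidents, period_days=7).items():
        print(f"{name}: {value}")
```

All four numbers describe the team's delivery process as a whole rather than any individual's output, which is the property the chapter asks of a valid measure.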

Performance Tiers (Cluster Analysis)

The book uses cluster analysis (hierarchical clustering) — a data-driven approach with no built-in concept of “good” or “bad.” Three clusters emerge naturally every year.
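
To show what "no built-in concept of good or bad" means in practice, here is a minimal pure-Python sketch of agglomerative (bottom-up hierarchical) clustering. The average-linkage rule, the two-dimensional (tempo, stability) scores, and the sample values are all assumptions invented for illustration; they are not the study's data or its exact method.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean pairwise distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations() yields index pairs with x < y, so deleting y is safe.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Made-up normalized (tempo, stability) scores for six hypothetical teams.
    teams = [(0.90, 0.90), (0.95, 0.85), (0.50, 0.50),
             (0.55, 0.45), (0.10, 0.20), (0.15, 0.10)]
    for members in agglomerative(teams, n_clusters=3):
        print(members, [teams[i] for i in members])
```

The algorithm only groups teams that resemble each other; labels such as "high" or "low" are attached afterwards by inspecting each group's scores. The number of clusters is passed in here for brevity, whereas a real analysis would choose it by examining the merge hierarchy.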

2016 Performance Benchmarks

Metric	High	Medium	Low
Deploy Frequency	On-demand (multiple/day)	1/week–1/month	1/month–6/months
Lead Time	< 1 hour	1 week–1 month	1 month–6 months
MTTR	< 1 hour	< 1 day	< 1 day*
Change Fail Rate	0–15%	31–45%	16–30%

2017 Performance Benchmarks

Metric	High	Medium	Low
Deploy Frequency	On-demand (multiple/day)	1/week–1/month	1/week–1/month*
Lead Time	< 1 hour	1 week–1 month	1 week–1 month*
MTTR	< 1 hour	< 1 day	1 day–1 week
Change Fail Rate	0–15%	0–15%	31–45%

*Low performers were lower on average but had the same median as medium performers.

The Big Finding: Speed AND Stability Are Correlated

“There is no tradeoff between improving performance and achieving higher levels of stability and quality. Rather, high performers do better at all of these measures.”

This refutes the dogma behind “bimodal IT” — the idea that fast systems must be less stable.

Trend: The Gap Is Growing

High performers maintained or improved their performance from 2016 to 2017
Low performers tried to increase tempo without addressing the underlying obstacles to better performance
In 2017, low performers lost ground on stability, likely as a consequence

Impact on Organizational Performance

High performers were twice as likely as low performers to exceed their goals on:

Profitability, market share, productivity (commercial)
Quantity/quality of goods/services, operating efficiency, customer satisfaction, mission goals (non-commercial)

Implication: Don’t Outsource Strategic Software

“The fact that software delivery performance matters provides a strong argument against outsourcing the development of software that is strategic to your business.”

Software that differentiates your business should be built in-house. Use SaaS for commodity functions (payroll, office productivity).

Warning: Use Metrics Carefully

In pathological cultures, measurement becomes a tool of control and people hide information. “Whenever there is fear, you get the wrong numbers” (Deming).

Metrics only work in a learning culture. Fix the culture first; then apply measurement rigorously.

Study Notes by Niladri & AI