Chapter 15: The Data for the Project
Data Collection Overview
Four years of annual surveys (2014–2017):
- 23,000+ total survey responses
- 2,000+ unique organizations
- Range: small startups (<5 employees) to large enterprises (>10,000 employees)
- Organization types and industries: startups, internet companies, finance, healthcare, government
- Technology contexts: greenfield and legacy, systems of record and engagement
Sampling Method: Snowball Sampling
Snowball sampling: Start with an initial sample (mailing lists, social media) and encourage participants to invite others.
Why this is appropriate:
- The target population (technology professionals engaged with DevOps) is not a defined list that can be randomly sampled
- The goal is not to produce a census of all technologists, but to understand the population of organizations actively engaged in software delivery
- Social networks spread the survey to people who work in software development and delivery
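The snowball procedure described above can be sketched as a breadth-first walk over a contact network. This is only an illustrative simulation, not the researchers' actual recruitment process: the function `snowball_sample`, its parameters (`invites_per_person`, `max_waves`), and the tiny `network` are all hypothetical.

```python
import random
from collections import deque


def snowball_sample(contacts, seeds, invites_per_person=2, max_waves=3, rng=None):
    """Return the set of people reached by a simple snowball process.

    contacts: dict mapping each person to a list of people they know.
    seeds: the initial sample (e.g. mailing-list or social-media recipients).
    """
    rng = rng or random.Random(0)
    reached = set(seeds)
    frontier = deque((person, 0) for person in seeds)
    while frontier:
        person, wave = frontier.popleft()
        if wave >= max_waves:
            continue  # stop recruiting after the final wave
        known = contacts.get(person, [])
        # Each respondent invites a few of their contacts, chosen at random.
        invited = rng.sample(known, min(invites_per_person, len(known)))
        for other in invited:
            if other not in reached:
                reached.add(other)
                frontier.append((other, wave + 1))
    return reached


# Hypothetical contact network, for illustration only.
network = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["eli"],
    "dee": [],
    "eli": ["ana"],
    "fay": ["gus"],  # no path from the seed, so never reached
    "gus": ["fay"],
}
respondents = snowball_sample(network, seeds=["ana"])
print(sorted(respondents))  # ['ana', 'ben', 'cy', 'dee', 'eli']
```

Note that `fay` and `gus` are never invited because nobody in the seed's network knows them. The sample can only contain people connected to the starting point, which is why results from this method cannot be assumed representative of everyone.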
Limitations of Snowball Sampling
- Cannot claim results are perfectly representative of all technology organizations
- Self-selection bias: people already engaged with DevOps ideas are more likely to respond
- The researchers acknowledge this and discuss it openly
Why It’s Still Valid
The goal of the research is to identify relationships and predictive patterns among capabilities and outcomes — not to count the exact proportion of organizations doing X. For this purpose:
- Large sample size (23,000+ responses) provides statistical power
- Replication across four years indicates the findings are stable
- Findings hold across organization sizes, industries, and technology stacks
Survey Evolution Across Years
| Year | Key Additions |
|---|---|
| 2014 | Core delivery metrics, culture, basic technical practices |
| 2015 | Lean management practices, burnout, automation impact |
| 2016 | Security, trunk-based development, test data management, Lean product management, employee Net Promoter Score (eNPS), work identity |
| 2017 | Architecture, transformational leadership, non-profit organizational goals |
Each year built on prior findings: first confirming existing hypotheses, then extending the model.
Cross-Sectional Design Note
The research is cross-sectional (different people surveyed each year), not longitudinal (same people followed over time). This means:
- Year-over-year trends are based on different cohorts
- The data cannot track the same organization's improvement over time
- Cross-sectional design is nonetheless standard in workplace and healthcare research
The consistent findings across different cohorts in different years provide confidence that the patterns reflect real phenomena, not sampling artifacts.
How to Read the Research Findings
When the book reports that “X drives Y”:
- The analysis is inferential predictive (see Ch. 12)
- Hypotheses were stated before data analysis
- Statistical methods: multiple linear regression or partial least squares
- Results are statistically significant (not just trend observations)
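To make "multiple linear regression" concrete, here is a minimal sketch that fits an outcome to two predictors by ordinary least squares, using only the standard library. It covers the coefficient fit only, not significance testing or partial least squares. The helper names (`solve_linear_system`, `fit_multiple_regression`) and the data are hypothetical and are not taken from the book.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, outcome):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 = intercept
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up scores: the outcome is constructed as 1 + 2*x1 + 0.5*x2 exactly,
# so the fit should recover those three numbers.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in predictors]
coeffs = fit_multiple_regression(predictors, outcome)
print([round(c, 6) for c in coeffs])
```

Each coefficient estimates how much the outcome changes per unit of one predictor while the other predictors are held fixed. Real survey data is noisy, so an actual analysis would also report whether each coefficient is statistically distinguishable from zero.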
When the book reports correlations:
- These are Pearson correlations
- They show how closely two variables move together
- They do NOT show causation
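A small sketch of how a Pearson correlation is computed may help. The function `pearson_r` and the numbers below are made up for illustration and are not data from the surveys.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations from each mean.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical scores for five respondents.
practice_score = [1, 2, 3, 4, 5]
outcome_score = [2, 4, 6, 8, 10]   # rises in lockstep with practice_score
reverse_score = [10, 8, 6, 4, 2]   # falls in lockstep with practice_score

r_up = pearson_r(practice_score, outcome_score)
r_down = pearson_r(practice_score, reverse_score)
print(r_up, r_down)  # 1.0 -1.0
```

The coefficient ranges from -1 to 1 and is symmetric: `pearson_r(a, b)` equals `pearson_r(b, a)`. Nothing in the calculation says which variable, if either, influences the other.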
The distinction matters: the book is careful about what can be claimed from each type of analysis.