Chapter 15: The Data for the Project

Data Collection Overview

Four years of annual surveys (2014–2017):

23,000+ total survey responses
2,000+ unique organizations
Range: small startups (<5 employees) to large enterprises (>10,000 employees)
Organization types and industries: startups, internet companies, finance, healthcare, government
Technology contexts: greenfield and legacy, systems of record and engagement

Sampling Method: Snowball Sampling

Snowball sampling: Start with an initial sample (mailing lists, social media) and encourage participants to invite others.
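The recruitment process described above can be sketched as a wave-by-wave walk over a referral network. This is a minimal illustration, not the researchers' actual procedure: the function name `snowball_sample`, the `referrals` mapping, and all participant names are hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_waves):
    """Return everyone reached within max_waves rounds of invitations.

    referrals: dict mapping a person to the people they invite (hypothetical data).
    seeds: the initial sample (e.g. people reached via mailing lists).
    """
    sampled = set(seeds)
    frontier = deque((seed, 0) for seed in sampled)
    while frontier:
        person, wave = frontier.popleft()
        if wave == max_waves:
            continue  # invitations stop after the final wave
        for invitee in referrals.get(person, []):
            if invitee not in sampled:
                sampled.add(invitee)
                frontier.append((invitee, wave + 1))
    return sampled


# Hypothetical referral network: "gus" and "hal" are not connected to the seed.
referrals = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "eli": ["fay"],
    "gus": ["hal"],
}
sample = snowball_sample(referrals, seeds=["ana"], max_waves=2)
print(sorted(sample))  # ['ana', 'ben', 'cho', 'dev', 'eli']
```

In this toy network, "gus" and "hal" are never reached because nobody in the seed's network invites them, which is the mechanism behind the representativeness limitation of snowball sampling.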

Why this is appropriate:

The target population (technology professionals engaged with DevOps) is not a defined list that can be randomly sampled
The goal is not to produce a census of all technologists, but to understand the population of organizations actively engaged in software delivery
Social networks spread the survey to people who work in software development and delivery

Limitations of Snowball Sampling

Cannot claim results are perfectly representative of all technology organizations
Self-selection bias: people already engaged with DevOps ideas are more likely to respond and to pass the survey on
The researchers acknowledge this and discuss it openly

Why It’s Still Valid

The goal of the research is to identify relationships and predictive patterns among capabilities and outcomes — not to count the exact proportion of organizations doing X. For this purpose:

Large sample size (23,000+ responses) provides statistical power
Replication across four annual surveys supports the stability of the findings
Findings hold across organization sizes, industries, and technology stacks

Survey Evolution Across Years

Year	Key Additions
2014	Core delivery metrics, culture, basic technical practices
2015	Lean management practices, burnout, automation impact
2016	Security, trunk-based development, test data management, Lean product management, employee Net Promoter Score (eNPS), work identity
2017	Architecture, transformational leadership, non-profit organizational goals

Each year built on prior findings: first confirming existing hypotheses, then extending the model.

Cross-Sectional Design Note

The research is cross-sectional (each year's survey is a separate snapshot, and respondents are not tracked from one year to the next), not longitudinal (the same people followed over time). This means:

Year-over-year trends are based on different cohorts
Cannot track the same organization’s improvement
But cross-sectional design is standard in workplace and healthcare research

The consistent findings across different cohorts in different years provide confidence that the patterns reflect real phenomena, not sampling artifacts.

How to Read the Research Findings

When the book reports that “X drives Y”:

The analysis is inferential predictive (see Ch. 12)
Hypotheses were stated before data analysis
Statistical methods: multiple linear regression or partial least squares
Results are statistically significant (not just trend observations)
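To make the regression step concrete, here is a minimal sketch of fitting a multiple linear regression by ordinary least squares in plain Python. It only estimates coefficients and does not run the significance tests mentioned above. The function name `fit_linear_regression` and the data are made up for illustration and are not taken from the book's survey.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: one tuple of predictor values per observation.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(row) for row in rows]  # prepend 1.0 for the intercept
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_linear_regression(rows, targets)
print([round(c, 6) for c in coefficients])  # approximately [1.0, 2.0, 3.0]
```

With real survey data the fit would not be exact, and a statistics package would normally be used to obtain standard errors and p-values alongside the coefficients.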

When the book reports correlations:

These are Pearson correlations
They show how closely two variables move together
They do NOT show causation
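As a small illustration of what a Pearson correlation measures, the sketch below computes the coefficient from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the scores are hypothetical and do not come from the survey data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores: the second variable is exactly double the first.
automation = [1, 2, 3, 4, 5]
performance = [2, 4, 6, 8, 10]
r = pearson_r(automation, performance)
print(r)  # 1.0 (perfect positive linear relationship)
```

A value near 1 or -1 means the two variables move together closely, and a value near 0 means little linear relationship. In either case the coefficient says nothing about which variable, if any, influences the other.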

The distinction matters: the book carefully distinguishes what can be claimed from each type of analysis.

Study Notes by Niladri & AI
