Chapter 15: The Data for the Project

Data Collection Overview

Four years of annual surveys (2014–2017):

  • 23,000+ total survey responses
  • 2,000+ unique organizations
  • Range: small startups (<5 employees) to large enterprises (>10,000 employees)
  • Industries: startups, internet companies, finance, healthcare, government
  • Technology contexts: greenfield and legacy, systems of record and engagement

Sampling Method: Snowball Sampling

Snowball sampling: Start with an initial sample (mailing lists, social media) and encourage participants to invite others.
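
The referral process can be sketched as a small simulation. This is an illustrative model only, not the researchers' actual recruitment procedure: the function name snowball_sample, the network dictionary, and the invite_prob and max_waves parameters are all assumptions made for the example.

```python
import random


def snowball_sample(network, seeds, invite_prob=0.5, max_waves=3, rng=None):
    """Return the set of people reached by wave-based referral.

    network     : dict mapping each person to the contacts they could invite
                  (hypothetical data, invented for this sketch)
    seeds       : the initial sample, e.g. mailing-list or social-media readers
    invite_prob : chance that any given contact is invited and responds
    max_waves   : number of referral rounds to simulate
    """
    rng = rng or random.Random(0)  # fixed seed keeps runs repeatable
    reached = set(seeds)
    current_wave = list(seeds)
    for _ in range(max_waves):
        next_wave = []
        for person in current_wave:
            for contact in network.get(person, []):
                if contact not in reached and rng.random() < invite_prob:
                    reached.add(contact)
                    next_wave.append(contact)
        if not next_wave:  # nobody new was recruited, so the chain ends
            break
        current_wave = next_wave
    return reached


# Toy network: "x" and "y" have no link to the seed "a".
toy_network = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "x": ["y"]}
respondents = snowball_sample(toy_network, ["a"], invite_prob=1.0, max_waves=3)
print(sorted(respondents))
```

In this toy network, "x" and "y" are never reached no matter how many waves run, because no referral path connects them to the seed. The sample can only cover people connected to the starting points.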

Why this is appropriate:

  1. The target population (technology professionals engaged with DevOps) is not a defined list that can be randomly sampled
  2. The goal is not to produce a census of all technologists, but to understand the population of organizations actively engaged in software delivery
  3. Social networks spread the survey to people who work in software development and delivery

Limitations of Snowball Sampling

  • Cannot claim results are perfectly representative of all technology organizations
  • Self-selection bias: people already engaged with DevOps ideas are more likely to respond
  • The researchers acknowledge this and discuss it openly

Why It’s Still Valid

The goal of the research is to identify relationships and predictive patterns among capabilities and outcomes — not to count the exact proportion of organizations doing X. For this purpose:

  • Large sample size (23,000+ responses) provides statistical power
  • Replication across four years indicates the findings are stable
  • Findings hold across organization sizes, industries, and technology stacks

Survey Evolution Across Years

Year    Key Additions
2014    Core delivery metrics, culture, basic technical practices
2015    Lean management practices, burnout, automation impact
2016    Security, trunk-based dev, test data mgmt, Lean product mgmt, eNPS, work identity
2017    Architecture, transformational leadership, non-profit organizational goals

Each year built on prior findings: first confirming existing hypotheses, then extending the model.

Cross-Sectional Design Note

The research is cross-sectional (different people surveyed each year), not longitudinal (same people followed over time). This means:

  • Year-over-year trends are based on different cohorts
  • Cannot track the same organization’s improvement
  • But cross-sectional design is standard in workplace and healthcare research

The consistent findings across different cohorts in different years provide confidence that the patterns reflect real phenomena, not sampling artifacts.

How to Read the Research Findings

When the book reports that “X drives Y”:

  • The analysis is inferential predictive (see Ch. 12)
  • Hypotheses were stated before data analysis
  • Statistical methods: multiple linear regression or partial least squares
  • Results are statistically significant (not just trend observations)
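
As a rough illustration of the first method named above, the sketch below fits a multiple linear regression by ordinary least squares. It is not the authors' analysis pipeline: the helper fit_ols, the predictor names, and the data values are all invented for the example, and the code estimates coefficients only, without the significance tests a real analysis would add.

```python
def fit_ols(rows, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows : list of predictor tuples, one per respondent
    y    : list of outcome values, same length as rows
    Returns [b0, b1, ..., bk], intercept first.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    k = len(X[0])
    # Augmented normal equations: [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic scores (made up): two hypothetical practice ratings per respondent
# and an outcome constructed as 1 + 2*x1 + 3*x2.
practices = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [1 + 2 * a + 3 * b for a, b in practices]
intercept, b_first, b_second = fit_ols(practices, outcome)
print(round(intercept, 6), round(b_first, 6), round(b_second, 6))
```

Each coefficient estimates how the outcome changes with one predictor while the others are held fixed, which is why regression can support a directional claim that a simple correlation cannot.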

When the book reports correlations:

  • These are Pearson correlations
  • They show how closely two variables move together
  • They do NOT show causation
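
A minimal sketch of the Pearson coefficient, written out from its standard definition (covariance divided by the product of the standard deviations). The function name pearson_r and the sample values are assumptions for illustration, not data from the research.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/n factors cancel, so plain sums of deviations are enough
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up values that rise together, giving r close to +1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))
```

The formula is symmetric: pearson_r(xs, ys) equals pearson_r(ys, xs). Nothing in the calculation says which variable influences the other, which is one way to see why a correlation alone cannot show causation.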

The distinction matters: the book is careful to claim only what each type of analysis can support.