Chapter 12: The Science Behind This Book

Primary vs. Secondary Research

| | Secondary Research | Primary Research |
|---|---|---|
| Data collected by | Someone else | The research team |
| Examples | Case studies, market research reports, book reports | The US Census, State of DevOps Report |
| Speed/cost | Faster, cheaper | Slower, more expensive |
| Control | Limited to existing data | Full control over questions |
| Value | Useful summary; may lack applicability | Can address specific questions; provides novel insights |

This book and the State of DevOps Reports are primary research.

Qualitative vs. Quantitative Research

Qualitative: Data not in numerical form (interviews, blog posts, Twitter posts, ethnographic observations). It is descriptive and allows insights to emerge from the data, but it is expensive to analyze.

Quantitative: Data in numerical form (system data, survey data with numerical responses). The research in this book is quantitative: it uses a Likert-type survey instrument.

Likert-Type Scale

Records each response and assigns it a numeric value on a 7-point scale:

  • “Strongly disagree” = 1
  • Neutral = 4
  • “Strongly agree” = 7

Provides consistent measurement across all subjects and a numerical basis for statistical analysis.
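
The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the survey instrument itself: only the 1, 4, and 7 anchors come from the notes above, while the intermediate labels, the function names, and the sample answers are assumptions made for the example.

```python
from statistics import mean

# 7-point Likert-type scale. Only the 1, 4, and 7 anchors are from the notes;
# the intermediate labels are assumed for illustration.
LIKERT_SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Somewhat disagree": 3,
    "Neutral": 4,
    "Somewhat agree": 5,
    "Agree": 6,
    "Strongly agree": 7,
}


def encode(responses):
    """Convert a list of response labels to their numeric values."""
    return [LIKERT_SCALE[label] for label in responses]


def item_score(responses):
    """Average the numeric values of one respondent's answers."""
    return mean(encode(responses))


# Hypothetical answers from one respondent to three related survey items.
answers = ["Agree", "Strongly agree", "Neutral"]
print(encode(answers))  # [6, 7, 4]
print(item_score(answers))
```

Because every respondent's answers pass through the same mapping, the resulting numbers can be compared across respondents and fed into statistical analysis.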

Six Types of Data Analysis (Dr. Jeffrey Leek’s Framework)

Listed in order of increasing complexity and analytical power:

| Level | Type | Used in this book? |
|---|---|---|
| 1 | Descriptive | Yes: demographic information, summary statistics |
| 2 | Exploratory | Yes: correlations between variables |
| 3 | Inferential predictive | Yes: theory-driven hypothesis testing |
| 4 | Predictive | No: requires historical time-series data |
| 5 | Causal | No: requires randomized studies |
| 6 | Mechanistic | No: rare in business; seen in physical sciences |

Also used: classification (clustering) analysis, which does not fit cleanly into Leek’s framework.

Descriptive Analysis

  • Summarizes and reports data
  • Examples: census reports, vendor reports on tool adoption rates, Forrester DevOps adoption reports
  • Cannot make statements about causation or prediction
  • Only as good as the underlying research design and sampling

Exploratory Analysis

  • Looks for relationships in the data; identifies patterns
  • Includes correlations — how closely two variables move together
  • “Correlation doesn’t imply causation”: two variables can move together because of a third variable or by chance
  • Example: Per capita cheese consumption correlates at 94.71% (r ≈ 0.947) with deaths by bedsheet strangulation, a spurious correlation with no plausible causal link
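
To make the third-variable point concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. All numbers are made up for illustration: `x` and `y` are both generated from a hidden `driver` series, so they correlate strongly with each other even though neither influences the other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data: a hidden third variable drives both x and y.
driver = [1, 2, 3, 4, 5, 6]
noise_x = [0.5, -0.5, 0.3, -0.3, 0.1, -0.1]
noise_y = [-0.2, 0.4, -0.4, 0.2, 0.3, -0.3]
x = [2 * d + e for d, e in zip(driver, noise_x)]
y = [3 * d + e for d, e in zip(driver, noise_y)]

# x and y never reference each other, yet r is close to 1.
print(round(pearson_r(x, y), 3))
```

A high value of r here says only that the two series move together; it says nothing about why.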

Inferential Predictive Analysis

The third level — most common type in business and technology research today. Used when pure experimental design is not possible.

Key features:

  • Theory-driven hypotheses stated before analysis (avoids fishing for data / spurious correlations)
  • Tests whether evidence supports stated hypotheses
  • More evidence → more confidence
  • Methods: multiple linear regression, partial least squares (PLS) regression
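
As a rough illustration of the first method named above, the following sketch fits a multiple linear regression by ordinary least squares in plain Python. It is not the researchers' analysis: the data are invented, the helper names `solve_linear_system` and `fit_ols` are hypothetical, and the sketch only estimates coefficients. A real hypothesis test would also need standard errors and significance levels, which a statistics package would normally supply.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        partial = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - partial) / m[row][row]
    return x


def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] that minimizes squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented data: outcome = 1 + 2 * x1 + 3 * x2, with no noise.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcome = [9, 8, 19, 18, 26]

coefficients = fit_ols(predictors, outcome)
print([round(c, 6) for c in coefficients])
```

In the inferential predictive setting, the hypothesis (for example, that a given predictor has a positive coefficient) is stated before the model is fitted, and the fitted coefficients are then checked against it.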

“Whenever we talk about impacting or driving results in this book, our research design utilized this third type of analysis.”

Classification Analysis (Clustering)

Used to identify performance tiers (high/medium/low) from the four delivery metrics. The researchers used hierarchical clustering rather than k-means because:

  1. No prior theory about number of groups expected
  2. Hierarchical allows investigation of parent-child relationships in clusters
  3. Dataset wasn’t huge (computational power not a concern)

Result: Distinct, statistically significant differences. High performers do better on all four measures, low performers do worse, and medium performers fall in between.
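
The general idea of hierarchical (agglomerative) clustering can be sketched as follows. This is a toy version under stated assumptions, not the researchers' procedure: it uses average linkage and Euclidean distance (the notes do not say which linkage was used), the two-dimensional `scores` are invented, and the function name `agglomerative` is hypothetical.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters (average linkage).

    Returns (clusters, merges): clusters is a list of lists of point
    indices; merges records each merge as (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Invented scores for seven teams on two standardized metrics.
scores = [
    (0.9, 1.0), (1.1, 0.8), (1.0, 1.2),  # one tight group
    (5.0, 5.1), (5.2, 4.9),              # a second group
    (9.0, 9.2), (9.1, 8.8),              # a third group
]

clusters, merges = agglomerative(scores, 3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4], [5, 6]]
```

The `merges` list is what makes the parent-child structure inspectable: each entry shows which two clusters were joined and at what distance, so the tree can be cut at whatever number of groups the data suggests instead of fixing that number in advance.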

The Research Design

The research questions evolved over the four years (2014–2017):

  • 2014: Can software delivery be measured? Does it impact organizations? Does culture matter?
  • 2015: Revalidate findings; add technical practices, Lean management, workforce impact (burnout, anxiety)
  • 2016: Security, trunk-based development, test data management, Lean product management, eNPS, work identity
  • 2017: Architecture’s impact, transformational leadership, non-profit organizational goals

All analyses are cross-sectional studies (data collected at a single point in time) and use the same rigorous methods as healthcare and workplace research.