Chapter 12: The Science Behind This Book

Primary vs. Secondary Research

	Secondary Research	Primary Research
Data collected by	Someone else	The research team
Examples	Case studies, market research reports, book reports	The US Census, State of DevOps Report
Speed/cost	Faster, cheaper	Slower, more expensive
Control	Limited to existing data	Full control over questions
Value	Useful summary; may lack applicability	Can address specific questions; provides novel insights

The research behind this book and the State of DevOps Reports is primary research.

Qualitative vs. Quantitative Research

Qualitative: Data not in numerical form (interviews, blog posts, Twitter posts, ethnographic observations). It is descriptive and lets unexpected insights emerge, but it is expensive to analyze.

Quantitative: Data in numerical form (system data, survey data with numerical responses). The research in this book is quantitative: it uses a Likert-type survey instrument.

Likert-Type Scale

Records responses and assigns number values:

“Strongly disagree” = 1
Neutral = 4
“Strongly agree” = 7

Provides consistent measurement across all subjects and a numerical basis for statistical analysis.
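
A minimal sketch of how such a scale turns text answers into numbers. Only the 1, 4, and 7 anchors come from the notes above; the intermediate labels and the names `LIKERT_7` and `encode_responses` are assumptions made for illustration.

```python
# Hypothetical 7-point Likert mapping.
# Only the anchors 1, 4, and 7 appear in the notes; the other labels are assumed.
LIKERT_7 = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Somewhat disagree": 3,
    "Neutral": 4,
    "Somewhat agree": 5,
    "Agree": 6,
    "Strongly agree": 7,
}


def encode_responses(responses):
    """Convert text responses to their numeric Likert values."""
    return [LIKERT_7[r] for r in responses]


sample = ["Strongly agree", "Neutral", "Somewhat disagree"]
scores = encode_responses(sample)
print(scores)  # [7, 4, 3]
```

Because every respondent is scored on the same mapping, the resulting numbers can be averaged, correlated, or fed into a regression.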

Six Types of Data Analysis (Dr. Jeffrey Leek’s Framework)

Listed in order of increasing complexity and analytical power:

Level	Type	Used in this book?
1	Descriptive	Yes — demographic information, summary statistics
2	Exploratory	Yes — correlations between variables
3	Inferential predictive	Yes — theory-driven hypothesis testing
4	Predictive	No — requires historical time-series data
5	Causal	No — requires randomized studies
6	Mechanistic	No — rare in business; seen in physical sciences

Also used: classification (clustering) analysis, which does not map cleanly onto a single level of Leek’s framework.

Descriptive Analysis

Summarizes and reports data
Examples: census reports, vendor reports on tool adoption rates, Forrester DevOps adoption reports
Cannot make statements about causation or prediction
Only as good as the underlying research design and sampling
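
As a rough illustration, descriptive analysis amounts to summary statistics like the ones below. The scores are made-up Likert values for a single survey item, not data from the book.

```python
import statistics

# Hypothetical Likert scores (1-7) for one survey item; not data from the book.
scores = [5, 6, 4, 7, 5, 3, 6, 5]

# Descriptive analysis only summarizes what was collected.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here says why the scores look the way they do or what they will be next year; it only reports them.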

Exploratory Analysis

Looks for relationships in the data; identifies patterns
Includes correlations — how closely two variables move together
“Correlation doesn’t imply causation”: two variables can move together because of a third variable or by chance
Example: Per capita cheese consumption has a 94.71% correlation with deaths by bedsheet strangulation (a spurious correlation)
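
A small sketch of how a correlation is computed, and how two unrelated series can still score high. The function name `pearson_r` and both data series are hypothetical; the two series simply both rise over time, which is the "third variable" at work.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly figures for two unrelated quantities that both trend upward.
series_a = [10, 12, 13, 15, 18, 20]
series_b = [3, 4, 4, 5, 6, 7]

r = pearson_r(series_a, series_b)
print(f"r = {r:.4f}")  # high, even though neither series drives the other
```

A high `r` only says the series move together; it says nothing about whether one causes the other.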

Inferential Predictive Analysis

The third level, and the most common type in business and technology research today. Used when a pure experimental design is not possible.

Key features:

Theory-driven hypotheses stated before analysis (this avoids fishing for data and chasing spurious correlations)
Tests whether evidence supports stated hypotheses
More evidence → more confidence
Methods: multiple linear regression, partial least squares (PLS) regression
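
To make the first method concrete, here is a minimal sketch of multiple linear regression by ordinary least squares, using only the standard library. It is not the researchers' actual analysis (which also used PLS); the function names, the two predictors, and the data are all hypothetical, and the outcome is constructed as 1 + 2*x1 + 0.5*x2 so the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors may be collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors (x1, x2) and an outcome built as 1 + 2*x1 + 0.5*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [4.0, 5.5, 9.0, 10.5, 14.0, 15.5]

coefficients = fit_ols(rows, y)
print([round(c, 6) for c in coefficients])
```

In the inferential setting described above, the hypothesis (for example, that a given practice is positively associated with an outcome) is written down first, and the fitted coefficients are then checked against it.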

“Whenever we talk about impacting or driving results in this book, our research design utilized this third type of analysis.”

Classification Analysis (Clustering)

Used to identify performance tiers (high/medium/low) from the four delivery metrics. The researchers used hierarchical clustering rather than k-means because:

No prior theory about number of groups expected
Hierarchical allows investigation of parent-child relationships in clusters
Dataset wasn’t huge (computational power not a concern)

Result: Distinct, statistically significant differences between the groups. High performers do better on all four measures, low performers do worse, and medium performers fall in between.
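
A minimal sketch of agglomerative (bottom-up) hierarchical clustering on four metrics per team. The notes do not say which linkage or distance the researchers used, so average linkage with Euclidean distance is an assumption here, as are the function names and the standardized scores for six hypothetical teams.

```python
import math


def average_linkage(c1, c2, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns the final clusters and the ordered list of merges, which
    records the parent-child structure of the hierarchy.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges


# Hypothetical standardized scores on four delivery metrics for six teams.
teams = [
    (1.2, 1.1, 1.0, 1.3),
    (1.0, 1.2, 1.1, 0.9),
    (0.1, 0.0, -0.1, 0.2),
    (-0.1, 0.1, 0.0, -0.2),
    (-1.1, -1.0, -1.2, -0.9),
    (-1.0, -1.2, -0.9, -1.1),
]

clusters, merges = agglomerative(teams, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4, 5]]
```

The cut at three clusters is made here only to keep the example short. The point of the hierarchical approach is that the full merge history in `merges` can be inspected before deciding how many groups the data supports.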

The Research Design

The key research questions evolved over the four years (2014–2017):

2014: Can software delivery be measured? Does it impact organizations? Does culture matter?
2015: Revalidate findings; add technical practices, Lean management, workforce impact (burnout, anxiety)
2016: Security, trunk-based development, test data management, Lean product management, eNPS, work identity
2017: Architecture’s impact, transformational leadership, non-profit organizational goals

All analyses: cross-sectional studies (data collected at a single point in time) using the same rigorous methods as healthcare and workplace research.

Study Notes by Niladri & AI
