Chapter 2 Cheat Sheet - Data Models and Query Languages

One-Line Summaries

Concept	One-Liner
Relational model	Tables + rows + foreign keys; joins handle relationships
Document model	Self-contained JSON/BSON; great for tree-shaped data
Graph model	Vertices + edges; ideal for highly connected data
Schema-on-write	DB enforces structure at write time (relational)
Schema-on-read	Structure interpreted at read time (document)
Declarative query	Specify what, let optimizer decide how (SQL)
Impedance mismatch	Gap between OOP objects and relational tables
Data locality	Storing related data together for faster reads

Quick Model Comparison

Feature	Relational	Document	Graph
Best for	Many-to-many	One-to-many trees	Complex relationships
Join support	Native	Limited/app-level	Traversal
Schema	Strict (on-write)	Flexible (on-read)	Flexible
Query language	SQL	MQL / Aggregation Pipeline	Cypher / SPARQL
Examples	PostgreSQL, MySQL	MongoDB, CouchDB	Neo4j, Amazon Neptune
Scaling	Vertical (traditionally)	Horizontal	Depends

Data Model Decision Tree

What's the shape of your data?
│
├─ Tree-like (one-to-many, no cross-refs)?
│  └─ → Document model (MongoDB, CouchDB)
│
├─ Highly interconnected, traversal matters?
│  └─ → Graph model (Neo4j, Amazon Neptune)
│
├─ Complex joins, many-to-many, reporting?
│  └─ → Relational model (PostgreSQL, MySQL)
│
└─ All of the above?
   └─ → Multi-model DB (PostgreSQL+JSONB, or polyglot persistence)

Normalization vs Denormalization

Normalized (Relational)                 Denormalized (Document)
─────────────────────────               ────────────────────────
Store city as ID → lookup table         Store city name in every record
Pro: No duplication, easy updates       Pro: Fast reads, no joins needed
Con: Joins required for reads           Con: Duplication, hard to update
Use: Data changes frequently            Use: Read-heavy, data rarely changes

Schema Trade-off

Schema-on-write (Relational)            Schema-on-read (Document)
────────────────────────────            ─────────────────────────
ALTER TABLE to change structure         Add new fields freely
DB validates on every insert            App validates at read time
Safer for consistent data               Flexible for heterogeneous data
Slow migrations on large tables         No migration needed

Query Language Comparison

Imperative (how):                       Declarative (what):
────────────────                        ────────────────────
animals = []                            SELECT * FROM animals
for (animal in db.animals):             WHERE family = 'Sharks'
  if animal.family == 'Sharks':
    animals.append(animal)

Con: Specifies exact execution          Pro: Optimizer picks best strategy
Con: Hard to parallelize                Pro: Easy to parallelize
Con: No optimizer opportunity           Pro: Future-proof

Graph Model Building Blocks

Property Graph:
Vertex ──────────────────────────────────── Edge
├─ Unique ID                                ├─ Unique ID
├─ Set of outgoing edges                    ├─ Tail vertex (start)
├─ Set of incoming edges                    ├─ Head vertex (end)
└─ Properties (key-value)                   ├─ Label (describes relation)
                                            └─ Properties (key-value)

Triple-Store:
(subject) ──[predicate]──→ (object)
(Jim)     ──[likes]──────→ (bananas)     ← object is a value
(Jim)     ──[bornIn]─────→ (Idaho)       ← object is a vertex

MapReduce Mental Model

Document: { ocean: "Atlantic", species: "shark", month: "Jan" }

Map function:   emit(this.ocean + "-" + this.month, 1)
                → ("Atlantic-Jan", 1), ("Atlantic-Jan", 1), ...

Reduce function: sum(values)
                → ("Atlantic-Jan", 47)

Result: Counts of catches by ocean and month

Key Trade-offs

Decision	Pro	Con	When to Use
Document model	No joins, flexible schema	Poor for many-to-many	Self-contained tree data
Relational model	Powerful joins, ACID	Impedance mismatch, rigid schema	Complex relationships
Graph model	Fast traversal	Overkill for simple data	Social graphs, recommendations
Schema-on-read	Flexible, easy migration	No enforcement, bugs at read time	Heterogeneous data
Schema-on-write	Consistent, validated	Slow migrations, rigid	Structured, uniform data

Red Flags in Data Model Choices

❌ Using document DB for data with heavy many-to-many joins
❌ Using relational DB when objects are deeply nested trees
❌ Denormalizing data that changes frequently (stale copies everywhere)
❌ Choosing graph DB when data isn’t highly connected
❌ Ignoring the impedance mismatch (letting ORM hide all complexity)

Green Flags in Data Model Choices

✅ Match the data model to the natural shape of your data
✅ Consider access patterns before choosing model
✅ Use joins for many-to-many, documents for tree-shaped data
✅ Keep schema flexible when external systems control data shape
✅ Use multi-model DBs to avoid polyglot proliferation

Modern Additions (2026)

Multi-Model DBs:
├─ PostgreSQL + JSONB (relational + document)
├─ MongoDB Atlas (document + time series + search)
└─ Single DB for multiple data models

Vector Databases (new in 2026):
├─ Store high-dimensional embeddings for ML
├─ Similarity search (cosine, dot product)
└─ Examples: Pinecone, Weaviate, pgvector

Convergence:
├─ MongoDB added joins ($lookup)
├─ PostgreSQL added JSON support
└─ Lines between models increasingly blurred

Interview Response Templates

When Asked to Choose a Database Model

“First, I’d understand the relationships in the data. If it’s tree-shaped with mostly one-to-many, a document DB like MongoDB avoids the impedance mismatch. If it has complex many-to-many joins or reporting needs, relational is safer. For highly connected data where traversal is central, a graph DB makes sense. In 2026 I’d also consider multi-model options like PostgreSQL with JSONB to avoid operating multiple databases.”

When Asked About Schema Design

“Schema-on-write enforces consistency and catches errors early, but requires migrations for changes. Schema-on-read is more flexible for heterogeneous data. The choice depends on how uniform the data is and how often the structure changes. Either way, the schema exists—schema-on-read just moves enforcement to the application layer.”

When Asked About SQL vs NoSQL

“The better framing is: what data model fits my use case? SQL/relational excels at joins and many-to-many relationships. Document DBs excel at self-contained tree-shaped data. The historical NoSQL vs SQL debate is mostly settled—modern systems are converging and the choice is really about data model fit, not just scalability.”

Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13

Study Notes by Niladri & AI

Explorer

ch02-cheatsheet