Chapter 2 Cheat Sheet - Data Models and Query Languages
One-Line Summaries
| Concept | One-Liner |
|---|---|
| Relational model | Tables + rows + foreign keys; joins handle relationships |
| Document model | Self-contained JSON/BSON; great for tree-shaped data |
| Graph model | Vertices + edges; ideal for highly connected data |
| Schema-on-write | DB enforces structure at write time (relational) |
| Schema-on-read | Structure interpreted at read time (document) |
| Declarative query | Specify what, let optimizer decide how (SQL) |
| Impedance mismatch | Gap between OOP objects and relational tables |
| Data locality | Storing related data together for faster reads |
Quick Model Comparison
| Feature | Relational | Document | Graph |
|---|---|---|---|
| Best for | Many-to-many | One-to-many trees | Complex relationships |
| Join support | Native | Limited/app-level | Traversal |
| Schema | Strict (on-write) | Flexible (on-read) | Flexible |
| Query language | SQL | MQL / Aggregation Pipeline | Cypher / SPARQL |
| Examples | PostgreSQL, MySQL | MongoDB, CouchDB | Neo4j, Amazon Neptune |
| Scaling | Vertical (traditionally) | Horizontal | Depends |
Data Model Decision Tree
What's the shape of your data?
│
├─ Tree-like (one-to-many, no cross-refs)?
│ └─ → Document model (MongoDB, CouchDB)
│
├─ Highly interconnected, traversal matters?
│ └─ → Graph model (Neo4j, Amazon Neptune)
│
├─ Complex joins, many-to-many, reporting?
│ └─ → Relational model (PostgreSQL, MySQL)
│
└─ All of the above?
└─ → Multi-model DB (PostgreSQL+JSONB, or polyglot persistence)
Normalization vs Denormalization
Normalized (Relational) Denormalized (Document)
───────────────────────── ────────────────────────
Store city as ID → lookup table Store city name in every record
Pro: No duplication, easy updates Pro: Fast reads, no joins needed
Con: Joins required for reads Con: Duplication, hard to update
Use: Data changes frequently Use: Read-heavy, data rarely changes
Schema Trade-off
Schema-on-write (Relational) Schema-on-read (Document)
──────────────────────────── ─────────────────────────
ALTER TABLE to change structure Add new fields freely
DB validates on every insert App validates at read time
Safer for consistent data Flexible for heterogeneous data
Slow migrations on large tables No migration needed
Query Language Comparison
Imperative (how): Declarative (what):
──────────────── ────────────────────
animals = [] SELECT * FROM animals
for (animal in db.animals): WHERE family = 'Sharks'
if animal.family == 'Sharks':
animals.append(animal)
Con: Specifies exact execution Pro: Optimizer picks best strategy
Con: Hard to parallelize Pro: Easy to parallelize
Con: No optimizer opportunity Pro: Future-proof
Graph Model Building Blocks
Property Graph:
Vertex ──────────────────────────────────── Edge
├─ Unique ID ├─ Unique ID
├─ Set of outgoing edges ├─ Tail vertex (start)
├─ Set of incoming edges ├─ Head vertex (end)
└─ Properties (key-value) ├─ Label (describes relation)
└─ Properties (key-value)
Triple-Store:
(subject) ──[predicate]──→ (object)
(Jim) ──[likes]──────→ (bananas) ← object is a value
(Jim) ──[bornIn]─────→ (Idaho) ← object is a vertex
MapReduce Mental Model
Document: { ocean: "Atlantic", species: "shark", month: "Jan" }
Map function: emit(this.ocean + "-" + this.month, 1)
→ ("Atlantic-Jan", 1), ("Atlantic-Jan", 1), ...
Reduce function: sum(values)
→ ("Atlantic-Jan", 47)
Result: Counts of catches by ocean and month
Key Trade-offs
| Decision | Pro | Con | When to Use |
|---|---|---|---|
| Document model | No joins, flexible schema | Poor for many-to-many | Self-contained tree data |
| Relational model | Powerful joins, ACID | Impedance mismatch, rigid schema | Complex relationships |
| Graph model | Fast traversal | Overkill for simple data | Social graphs, recommendations |
| Schema-on-read | Flexible, easy migration | No enforcement, bugs at read time | Heterogeneous data |
| Schema-on-write | Consistent, validated | Slow migrations, rigid | Structured, uniform data |
Red Flags in Data Model Choices
❌ Using document DB for data with heavy many-to-many joins
❌ Using relational DB when objects are deeply nested trees
❌ Denormalizing data that changes frequently (stale copies everywhere)
❌ Choosing graph DB when data isn’t highly connected
❌ Ignoring the impedance mismatch (letting ORM hide all complexity)
Green Flags in Data Model Choices
✅ Match the data model to the natural shape of your data
✅ Consider access patterns before choosing model
✅ Use joins for many-to-many, documents for tree-shaped data
✅ Keep schema flexible when external systems control data shape
✅ Use multi-model DBs to avoid polyglot proliferation
Modern Additions (2026)
Multi-Model DBs:
├─ PostgreSQL + JSONB (relational + document)
├─ MongoDB Atlas (document + time series + search)
└─ Single DB for multiple data models
Vector Databases (new in 2026):
├─ Store high-dimensional embeddings for ML
├─ Similarity search (cosine, dot product)
└─ Examples: Pinecone, Weaviate, pgvector
Convergence:
├─ MongoDB added joins ($lookup)
├─ PostgreSQL added JSON support
└─ Lines between models increasingly blurred
Interview Response Templates
When Asked to Choose a Database Model
“First, I’d understand the relationships in the data. If it’s tree-shaped with mostly one-to-many, a document DB like MongoDB avoids the impedance mismatch. If it has complex many-to-many joins or reporting needs, relational is safer. For highly connected data where traversal is central, a graph DB makes sense. In 2026 I’d also consider multi-model options like PostgreSQL with JSONB to avoid operating multiple databases.”
When Asked About Schema Design
“Schema-on-write enforces consistency and catches errors early, but requires migrations for changes. Schema-on-read is more flexible for heterogeneous data. The choice depends on how uniform the data is and how often the structure changes. Either way, the schema exists—schema-on-read just moves enforcement to the application layer.”
When Asked About SQL vs NoSQL
“The better framing is: what data model fits my use case? SQL/relational excels at joins and many-to-many relationships. Document DBs excel at self-contained tree-shaped data. The historical NoSQL vs SQL debate is mostly settled—modern systems are converging and the choice is really about data model fit, not just scalability.”
Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13