Chapter 2 Cheat Sheet - Data Models and Query Languages

One-Line Summaries

ConceptOne-Liner
Relational modelTables + rows + foreign keys; joins handle relationships
Document modelSelf-contained JSON/BSON; great for tree-shaped data
Graph modelVertices + edges; ideal for highly connected data
Schema-on-writeDB enforces structure at write time (relational)
Schema-on-readStructure interpreted at read time (document)
Declarative querySpecify what, let optimizer decide how (SQL)
Impedance mismatchGap between OOP objects and relational tables
Data localityStoring related data together for faster reads

Quick Model Comparison

FeatureRelationalDocumentGraph
Best forMany-to-manyOne-to-many treesComplex relationships
Join supportNativeLimited/app-levelTraversal
SchemaStrict (on-write)Flexible (on-read)Flexible
Query languageSQLMQL / Aggregation PipelineCypher / SPARQL
ExamplesPostgreSQL, MySQLMongoDB, CouchDBNeo4j, Amazon Neptune
ScalingVertical (traditionally)HorizontalDepends

Data Model Decision Tree

What's the shape of your data?
│
├─ Tree-like (one-to-many, no cross-refs)?
│  └─ → Document model (MongoDB, CouchDB)
│
├─ Highly interconnected, traversal matters?
│  └─ → Graph model (Neo4j, Amazon Neptune)
│
├─ Complex joins, many-to-many, reporting?
│  └─ → Relational model (PostgreSQL, MySQL)
│
└─ All of the above?
   └─ → Multi-model DB (PostgreSQL+JSONB, or polyglot persistence)

Normalization vs Denormalization

Normalized (Relational)                 Denormalized (Document)
─────────────────────────               ────────────────────────
Store city as ID → lookup table         Store city name in every record
Pro: No duplication, easy updates       Pro: Fast reads, no joins needed
Con: Joins required for reads           Con: Duplication, hard to update
Use: Data changes frequently            Use: Read-heavy, data rarely changes

Schema Trade-off

Schema-on-write (Relational)            Schema-on-read (Document)
────────────────────────────            ─────────────────────────
ALTER TABLE to change structure         Add new fields freely
DB validates on every insert            App validates at read time
Safer for consistent data               Flexible for heterogeneous data
Slow migrations on large tables         No migration needed

Query Language Comparison

Imperative (how):                       Declarative (what):
────────────────                        ────────────────────
animals = []                            SELECT * FROM animals
for (animal in db.animals):             WHERE family = 'Sharks'
  if animal.family == 'Sharks':
    animals.append(animal)

Con: Specifies exact execution          Pro: Optimizer picks best strategy
Con: Hard to parallelize                Pro: Easy to parallelize
Con: No optimizer opportunity           Pro: Future-proof

Graph Model Building Blocks

Property Graph:
Vertex ──────────────────────────────────── Edge
├─ Unique ID                                ├─ Unique ID
├─ Set of outgoing edges                    ├─ Tail vertex (start)
├─ Set of incoming edges                    ├─ Head vertex (end)
└─ Properties (key-value)                   ├─ Label (describes relation)
                                            └─ Properties (key-value)

Triple-Store:
(subject) ──[predicate]──→ (object)
(Jim)     ──[likes]──────→ (bananas)     ← object is a value
(Jim)     ──[bornIn]─────→ (Idaho)       ← object is a vertex

MapReduce Mental Model

Document: { ocean: "Atlantic", species: "shark", month: "Jan" }

Map function:   emit(this.ocean + "-" + this.month, 1)
                → ("Atlantic-Jan", 1), ("Atlantic-Jan", 1), ...

Reduce function: sum(values)
                → ("Atlantic-Jan", 47)

Result: Counts of catches by ocean and month

Key Trade-offs

DecisionProConWhen to Use
Document modelNo joins, flexible schemaPoor for many-to-manySelf-contained tree data
Relational modelPowerful joins, ACIDImpedance mismatch, rigid schemaComplex relationships
Graph modelFast traversalOverkill for simple dataSocial graphs, recommendations
Schema-on-readFlexible, easy migrationNo enforcement, bugs at read timeHeterogeneous data
Schema-on-writeConsistent, validatedSlow migrations, rigidStructured, uniform data

Red Flags in Data Model Choices

❌ Using document DB for data with heavy many-to-many joins
❌ Using relational DB when objects are deeply nested trees
❌ Denormalizing data that changes frequently (stale copies everywhere)
❌ Choosing graph DB when data isn’t highly connected
❌ Ignoring the impedance mismatch (letting ORM hide all complexity)

Green Flags in Data Model Choices

✅ Match the data model to the natural shape of your data
✅ Consider access patterns before choosing model
✅ Use joins for many-to-many, documents for tree-shaped data
✅ Keep schema flexible when external systems control data shape
✅ Use multi-model DBs to avoid polyglot proliferation

Modern Additions (2026)

Multi-Model DBs:
├─ PostgreSQL + JSONB (relational + document)
├─ MongoDB Atlas (document + time series + search)
└─ Single DB for multiple data models

Vector Databases (new in 2026):
├─ Store high-dimensional embeddings for ML
├─ Similarity search (cosine, dot product)
└─ Examples: Pinecone, Weaviate, pgvector

Convergence:
├─ MongoDB added joins ($lookup)
├─ PostgreSQL added JSON support
└─ Lines between models increasingly blurred

Interview Response Templates

When Asked to Choose a Database Model

“First, I’d understand the relationships in the data. If it’s tree-shaped with mostly one-to-many, a document DB like MongoDB avoids the impedance mismatch. If it has complex many-to-many joins or reporting needs, relational is safer. For highly connected data where traversal is central, a graph DB makes sense. In 2026 I’d also consider multi-model options like PostgreSQL with JSONB to avoid operating multiple databases.”

When Asked About Schema Design

“Schema-on-write enforces consistency and catches errors early, but requires migrations for changes. Schema-on-read is more flexible for heterogeneous data. The choice depends on how uniform the data is and how often the structure changes. Either way, the schema exists—schema-on-read just moves enforcement to the application layer.”

When Asked About SQL vs NoSQL

“The better framing is: what data model fits my use case? SQL/relational excels at joins and many-to-many relationships. Document DBs excel at self-contained tree-shaped data. The historical NoSQL vs SQL debate is mostly settled—modern systems are converging and the choice is really about data model fit, not just scalability.”


Quick Revision Time: 5 minutes
Interview Prep: 15 minutes
Last Updated: 2026-04-13