Chapter 3 Cheat Sheet — Data Models and Query Languages

One-Line Summaries

Concept	One-Liner
Relational model	Tables + rows + foreign keys; joins handle all relationships
Document model	Self-contained JSON/BSON; ideal for tree-shaped, nested data
Graph model	Vertices + edges; traversal is a first-class operation
Property graph	Vertices and edges both carry key-value properties and labels
Triple store	All facts as (subject, predicate, object) triples; RDF standard
Star schema	Central fact table + surrounding dimension tables for OLAP
Snowflake schema	Star schema with normalized dimensions (sub-dimensions)
GraphQL	Client-defined API queries; solves over-fetching and under-fetching
Event Sourcing	Append-only event log; current state derived by replaying events
CQRS	Separate write model (commands/events) from read model (projections)
Schema-on-write	DB enforces structure at write time; `ALTER TABLE` to change
Schema-on-read	Structure implicit; interpreted by application at read time
Impedance mismatch	Gap between OOP objects and relational tables
DataFrame	In-memory columnar tabular structure; pandas/Spark abstraction

Quick Model Comparison

Feature	Relational	Document	Graph	Event Sourcing
Data shape	Uniform rows	Nested/heterogeneous	Interconnected	Sequential events
Best for	Many-to-many, aggregates	One-to-many trees	Traversal queries	Audit, temporal
Join support	Native SQL JOIN	Limited (app-level)	Traversal native	N/A (append-only)
Schema	Strict, on-write	Flexible, on-read	Flexible	Immutable events
Write pattern	Update in place	Replace document	Add nodes/edges	Append only
Time travel	Difficult	Difficult	Difficult	Native (replay)
Examples	PostgreSQL, MySQL	MongoDB, Firestore	Neo4j, Neptune	Kafka, EventStoreDB

Data Model Decision Tree

What's the primary concern?
│
├─ Data is tree-shaped (one-to-many, few cross-references)?
│  └─ → Document model (MongoDB, Firestore, CouchDB)
│
├─ Complex many-to-many joins or strong aggregate queries?
│  └─ → Relational model (PostgreSQL, MySQL, CockroachDB)
│
├─ Relationships ARE the data; traversal is primary query?
│  └─ → Graph model (Neo4j, Amazon Neptune, Dgraph)
│
├─ Need full audit trail + temporal queries + multiple views?
│  └─ → Event Sourcing + CQRS (Kafka, Axon, EventStoreDB)
│
├─ Analytical aggregates over billions of rows?
│  └─ → Column-oriented DW with star schema (Snowflake, BigQuery)
│
└─ Flexible client-driven queries over existing backend?
   └─ → GraphQL API layer (Hasura, Apollo, PostGraphile)

Normalization vs Denormalization

Normalized (Relational)                    Denormalized (Document / OLAP)
──────────────────────                     ──────────────────────────────
city stored as ID → cities table           city name stored in every row
Pro: No duplication, easy updates          Pro: Fast reads, no joins
Pro: IDs stable even if value changes      Pro: Simple queries
Con: JOIN required for reads               Con: Duplication, costly updates
Use: OLTP, frequently changing data        Use: OLAP, read-heavy, rarely changes

Schema Trade-offs

Schema-on-write (Relational)               Schema-on-read (Document)
────────────────────────────               ─────────────────────────
DB validates every INSERT/UPDATE           Application validates at read time
ALTER TABLE to change structure            Add new fields freely, no migration
Slow migrations on large tables            Instant "migration" (just add fields)
Safer for uniform, structured data         Better for heterogeneous, evolving data
Catch errors early                         Bugs surface at read time

Cypher vs SQL: Same Query, Different Languages

Goal: Find people born in France who live in UK cities

SQL (verbose, requires self-joins):
  SELECT p.name, c.name
  FROM persons p
  JOIN locations born ON p.born_in = born.id
  JOIN locations city ON p.lives_in = city.id
  JOIN locations country ON city.country_id = country.id
  WHERE born.name = 'France' AND country.name = 'UK';

Cypher (concise, reads like the domain):
  MATCH (p:Person)-[:BORN_IN]->(:Country {name:'France'}),
        (p)-[:LIVES_IN]->(:City)-[:IN]->(:Country {name:'UK'})
  RETURN p.name

Property Graph Building Blocks

Vertex                                 Edge
──────────────────                     ──────────────────────────────
• Unique ID                            • Unique ID
• Set of outgoing edges                • Tail vertex (origin)
• Set of incoming edges                • Head vertex (destination)
• Key-value properties                 • Label (describes relationship)
                                       • Key-value properties

Example:
(Alice:Person {age:30}) -[:WORKS_AT {since:2020}]-> (Acme:Company {sector:'Tech'})

Triple Store (RDF) Structure

(subject)              (predicate)         (object)
─────────              ───────────         ────────
<jim>          ──      schema:name     ──  "Jim"          ← literal value
<jim>          ──      schema:birthPlace── <idaho>        ← vertex reference
<idaho>        ──      rdf:type        ──  schema:State   ← type assertion

Query (SPARQL):
SELECT ?person WHERE {
  ?person schema:birthPlace ?place .
  ?place rdf:type schema:EuropeanCity .
}

Event Sourcing vs CRUD

CRUD (Traditional)                     Event Sourcing
──────────────────                     ──────────────
State: { balance: 1000 }               Events:
                                         AccountOpened  { balance: 0   }
UPDATE account SET balance = 1000        MoneyDeposited { amount: 1500 }
                                         MoneyWithdrawn { amount: 500  }
                                       Current state = replay events

What was balance on Jan 12?  ✗         What was balance on Jan 12?  ✓
Why did balance change?      ✗         Why did balance change?      ✓
Time-travel queries          ✗         Time-travel queries          ✓
Audit trail                  Extra     Audit trail                  Free

CQRS Architecture

           WRITE SIDE                         READ SIDE
┌───────────────────────────┐       ┌────────────────────────────┐
│  Command: PlaceOrder      │       │  Query: GetOrderHistory    │
│  Validate → emit event    │──────▶│  Read from projection DB   │
│                           │ event │  (optimized for reads)     │
│  Event Store (Kafka)      │stream │                            │
│  append-only log          │──────▶│  Projection: OrderSummary  │
└───────────────────────────┘       │  (updated by consumer)     │
                                    └────────────────────────────┘
Key insight: Write model = correctness; Read model = performance

Star Schema Layout

                      dim_date
                     (date, year, quarter)
                          │
dim_customer ─────── fact_sales ─────── dim_product
(name, region)  │   (sale_id,         │  (name, category)
                │    customer_id,     │
                │    product_id,      │
                │    store_id,        │
                │    date_id,         │
                │    quantity,        │
                │    revenue)         │
                │                    │
            dim_store              dim_promotion
         (location, manager)     (discount, type)

GraphQL vs REST vs gRPC

Dimension	REST	GraphQL	gRPC
Over-fetching	Common	Eliminated	No
Multiple round trips	Yes	Single query	No (streaming)
Schema required	Optional	Yes (SDL)	Yes (.proto)
HTTP caching	Easy	Difficult	No
Real-time	WebSocket (extra)	Subscriptions built-in	Streaming built-in
Browser native	Yes	Yes	No (grpc-web needed)
Best for	Public CRUD APIs	Data-rich UIs, mobile	Internal services

Key Trade-offs Reference

Model/Pattern	Use When	Avoid When
Document	Tree-shaped data, schema evolution	Heavy many-to-many joins
Relational	Complex joins, reporting, consistency	Deep nesting, schema churn
Graph	Traversal is primary query	Few relationships, simple CRUD
Event Sourcing	Audit required, temporal queries	Simple CRUD, small teams
CQRS	Read/write scaling differ significantly	Read == write access patterns
Star schema	OLAP aggregations, BI tools	OLTP transactional workloads
GraphQL	Flexible client data needs	Internal services (use gRPC instead)

Red Flags

Using document DB for data with heavy many-to-many relationships
Using event sourcing for a simple CRUD application without audit needs
Storing mutable state in an event store (events must be immutable)
Using GraphQL when REST is simpler and the API is stable
Building a star schema for OLTP transactional workloads
Denormalizing data that changes frequently (stale duplicates everywhere)

Modern Additions (2026)

Multi-Model Convergence:
├─ PostgreSQL: relational + JSONB (doc) + pgvector + PostGIS
├─ MongoDB Atlas: document + time series + vector search
└─ Single DB handles multiple models

GraphQL in Production:
├─ Hasura / PostGraphile: auto-generate from relational schema
├─ Apollo Federation: stitch multiple services into unified graph
└─ Dominant pattern for BFF (Backend for Frontend)

Event Sourcing Platforms:
├─ Kafka (event store + streaming backbone at scale)
├─ EventStoreDB (purpose-built for event sourcing)
├─ Axon Framework (Java framework for CQRS + ES)
└─ Temporal.io / Restate (durable execution on ES principles)

Analytics Stack:
├─ dbt: SQL transformations for star/snowflake schemas
├─ Snowflake / BigQuery / Databricks: cloud DW
└─ Apache Arrow: universal columnar in-memory format

Quick Revision Time: 7 minutes
Interview Prep: 20 minutes
Last Updated: 2026-05-29

Study Notes by Niladri & AI

Explorer

ch03-cheatsheet

Chapter 3 Cheat Sheet — Data Models and Query Languages

One-Line Summaries

Quick Model Comparison

Data Model Decision Tree

Normalization vs Denormalization

Schema Trade-offs

Cypher vs SQL: Same Query, Different Languages

Property Graph Building Blocks

Triple Store (RDF) Structure

Event Sourcing vs CRUD

CQRS Architecture

Star Schema Layout

GraphQL vs REST vs gRPC

Key Trade-offs Reference

Red Flags

Modern Additions (2026)

Graph View

Table of Contents