Chapter 3 Cheat Sheet — Data Models and Query Languages

ddia-2e data-models cheatsheet


One-Line Summaries

ConceptOne-Liner
Relational modelTables + rows + foreign keys; joins handle all relationships
Document modelSelf-contained JSON/BSON; ideal for tree-shaped, nested data
Graph modelVertices + edges; traversal is a first-class operation
Property graphVertices and edges both carry key-value properties and labels
Triple storeAll facts as (subject, predicate, object) triples; RDF standard
Star schemaCentral fact table + surrounding dimension tables for OLAP
Snowflake schemaStar schema with normalized dimensions (sub-dimensions)
GraphQLClient-defined API queries; solves over-fetching and under-fetching
Event SourcingAppend-only event log; current state derived by replaying events
CQRSSeparate write model (commands/events) from read model (projections)
Schema-on-writeDB enforces structure at write time; ALTER TABLE to change
Schema-on-readStructure implicit; interpreted by application at read time
Impedance mismatchGap between OOP objects and relational tables
DataFrameIn-memory columnar tabular structure; pandas/Spark abstraction

Quick Model Comparison

FeatureRelationalDocumentGraphEvent Sourcing
Data shapeUniform rowsNested/heterogeneousInterconnectedSequential events
Best forMany-to-many, aggregatesOne-to-many treesTraversal queriesAudit, temporal
Join supportNative SQL JOINLimited (app-level)Traversal nativeN/A (append-only)
SchemaStrict, on-writeFlexible, on-readFlexibleImmutable events
Write patternUpdate in placeReplace documentAdd nodes/edgesAppend only
Time travelDifficultDifficultDifficultNative (replay)
ExamplesPostgreSQL, MySQLMongoDB, FirestoreNeo4j, NeptuneKafka, EventStoreDB

Data Model Decision Tree

What's the primary concern?
│
├─ Data is tree-shaped (one-to-many, few cross-references)?
│  └─ → Document model (MongoDB, Firestore, CouchDB)
│
├─ Complex many-to-many joins or strong aggregate queries?
│  └─ → Relational model (PostgreSQL, MySQL, CockroachDB)
│
├─ Relationships ARE the data; traversal is primary query?
│  └─ → Graph model (Neo4j, Amazon Neptune, Dgraph)
│
├─ Need full audit trail + temporal queries + multiple views?
│  └─ → Event Sourcing + CQRS (Kafka, Axon, EventStoreDB)
│
├─ Analytical aggregates over billions of rows?
│  └─ → Column-oriented DW with star schema (Snowflake, BigQuery)
│
└─ Flexible client-driven queries over existing backend?
   └─ → GraphQL API layer (Hasura, Apollo, PostGraphile)

Normalization vs Denormalization

Normalized (Relational)                    Denormalized (Document / OLAP)
──────────────────────                     ──────────────────────────────
city stored as ID → cities table           city name stored in every row
Pro: No duplication, easy updates          Pro: Fast reads, no joins
Pro: IDs stable even if value changes      Pro: Simple queries
Con: JOIN required for reads               Con: Duplication, costly updates
Use: OLTP, frequently changing data        Use: OLAP, read-heavy, rarely changes

Schema Trade-offs

Schema-on-write (Relational)               Schema-on-read (Document)
────────────────────────────               ─────────────────────────
DB validates every INSERT/UPDATE           Application validates at read time
ALTER TABLE to change structure            Add new fields freely, no migration
Slow migrations on large tables            Instant "migration" (just add fields)
Safer for uniform, structured data         Better for heterogeneous, evolving data
Catch errors early                         Bugs surface at read time

Cypher vs SQL: Same Query, Different Languages

Goal: Find people born in France who live in UK cities

SQL (verbose, requires self-joins):
  SELECT p.name, c.name
  FROM persons p
  JOIN locations born ON p.born_in = born.id
  JOIN locations city ON p.lives_in = city.id
  JOIN locations country ON city.country_id = country.id
  WHERE born.name = 'France' AND country.name = 'UK';

Cypher (concise, reads like the domain):
  MATCH (p:Person)-[:BORN_IN]->(:Country {name:'France'}),
        (p)-[:LIVES_IN]->(:City)-[:IN]->(:Country {name:'UK'})
  RETURN p.name

Property Graph Building Blocks

Vertex                                 Edge
──────────────────                     ──────────────────────────────
• Unique ID                            • Unique ID
• Set of outgoing edges                • Tail vertex (origin)
• Set of incoming edges                • Head vertex (destination)
• Key-value properties                 • Label (describes relationship)
                                       • Key-value properties

Example:
(Alice:Person {age:30}) -[:WORKS_AT {since:2020}]-> (Acme:Company {sector:'Tech'})

Triple Store (RDF) Structure

(subject)              (predicate)         (object)
─────────              ───────────         ────────
<jim>          ──      schema:name     ──  "Jim"          ← literal value
<jim>          ──      schema:birthPlace── <idaho>        ← vertex reference
<idaho>        ──      rdf:type        ──  schema:State   ← type assertion

Query (SPARQL):
SELECT ?person WHERE {
  ?person schema:birthPlace ?place .
  ?place rdf:type schema:EuropeanCity .
}

Event Sourcing vs CRUD

CRUD (Traditional)                     Event Sourcing
──────────────────                     ──────────────
State: { balance: 1000 }               Events:
                                         AccountOpened  { balance: 0   }
UPDATE account SET balance = 1000        MoneyDeposited { amount: 1500 }
                                         MoneyWithdrawn { amount: 500  }
                                       Current state = replay events

What was balance on Jan 12?  ✗         What was balance on Jan 12?  ✓
Why did balance change?      ✗         Why did balance change?      ✓
Time-travel queries          ✗         Time-travel queries          ✓
Audit trail                  Extra     Audit trail                  Free

CQRS Architecture

           WRITE SIDE                         READ SIDE
┌───────────────────────────┐       ┌────────────────────────────┐
│  Command: PlaceOrder      │       │  Query: GetOrderHistory    │
│  Validate → emit event    │──────▶│  Read from projection DB   │
│                           │ event │  (optimized for reads)     │
│  Event Store (Kafka)      │stream │                            │
│  append-only log          │──────▶│  Projection: OrderSummary  │
└───────────────────────────┘       │  (updated by consumer)     │
                                    └────────────────────────────┘
Key insight: Write model = correctness; Read model = performance

Star Schema Layout

                      dim_date
                     (date, year, quarter)
                          │
dim_customer ─────── fact_sales ─────── dim_product
(name, region)  │   (sale_id,         │  (name, category)
                │    customer_id,     │
                │    product_id,      │
                │    store_id,        │
                │    date_id,         │
                │    quantity,        │
                │    revenue)         │
                │                    │
            dim_store              dim_promotion
         (location, manager)     (discount, type)

GraphQL vs REST vs gRPC

DimensionRESTGraphQLgRPC
Over-fetchingCommonEliminatedNo
Multiple round tripsYesSingle queryNo (streaming)
Schema requiredOptionalYes (SDL)Yes (.proto)
HTTP cachingEasyDifficultNo
Real-timeWebSocket (extra)Subscriptions built-inStreaming built-in
Browser nativeYesYesNo (grpc-web needed)
Best forPublic CRUD APIsData-rich UIs, mobileInternal services

Key Trade-offs Reference

Model/PatternUse WhenAvoid When
DocumentTree-shaped data, schema evolutionHeavy many-to-many joins
RelationalComplex joins, reporting, consistencyDeep nesting, schema churn
GraphTraversal is primary queryFew relationships, simple CRUD
Event SourcingAudit required, temporal queriesSimple CRUD, small teams
CQRSRead/write scaling differ significantlyRead == write access patterns
Star schemaOLAP aggregations, BI toolsOLTP transactional workloads
GraphQLFlexible client data needsInternal services (use gRPC instead)

Red Flags

  • Using document DB for data with heavy many-to-many relationships
  • Using event sourcing for a simple CRUD application without audit needs
  • Storing mutable state in an event store (events must be immutable)
  • Using GraphQL when REST is simpler and the API is stable
  • Building a star schema for OLTP transactional workloads
  • Denormalizing data that changes frequently (stale duplicates everywhere)

Modern Additions (2026)

Multi-Model Convergence:
├─ PostgreSQL: relational + JSONB (doc) + pgvector + PostGIS
├─ MongoDB Atlas: document + time series + vector search
└─ Single DB handles multiple models

GraphQL in Production:
├─ Hasura / PostGraphile: auto-generate from relational schema
├─ Apollo Federation: stitch multiple services into unified graph
└─ Dominant pattern for BFF (Backend for Frontend)

Event Sourcing Platforms:
├─ Kafka (event store + streaming backbone at scale)
├─ EventStoreDB (purpose-built for event sourcing)
├─ Axon Framework (Java framework for CQRS + ES)
└─ Temporal.io / Restate (durable execution on ES principles)

Analytics Stack:
├─ dbt: SQL transformations for star/snowflake schemas
├─ Snowflake / BigQuery / Databricks: cloud DW
└─ Apache Arrow: universal columnar in-memory format

Quick Revision Time: 7 minutes
Interview Prep: 20 minutes
Last Updated: 2026-05-29