Write Amplification: 1 logical write → N physical writes
LSM-Tree: compaction rewrites data multiple times across levels
B-Tree: typically 1-2x (WAL + in-place page write)
High write amp → SSD wear, more I/O bandwidth consumed
Read Amplification: 1 logical read → N physical reads
LSM-Tree: check memtable + Bloom filter + N SSTable levels
B-Tree: traverse log(N) pages in one path
High read amp → slower reads, more I/O
Space Amplification: actual disk used / logical data size
LSM-Tree: up to 10x during compaction (old + new copies exist simultaneously)
B-Tree: ~2x (fragmentation, free pages)
High space amp → more storage cost
Column-Oriented Storage: The Key Insight
Row-oriented (OLTP layout):
[id | name | region | revenue | category | date | status | ...] ← 100 columns
[id | name | region | revenue | category | date | status | ...]
...
Query: SELECT SUM(revenue) WHERE region = 'EU'
Must read ALL 100 columns to get just 2
Column-oriented (OLAP layout):
revenue_file: [125.0, 890.0, 45.0, 1200.0, ...] ← only this file read
region_file: ['US', 'EU', 'EU', 'US', ...] ← and this one
Query reads 2% of data. Before compression.
After compression (dictionary + RLE on region):
region_file compressed: { 'EU': [bits 0,1,0,0,1,1,...] }
Operate directly on compressed bitmaps — no decompression needed!
Compression Techniques in Column Stores
Dictionary Encoding:
Before: ['pending','shipped','pending','delivered','shipped']
After: {0:'pending', 1:'shipped', 2:'delivered'}, [0,1,0,2,1]
Savings: ~5x for string columns with low cardinality
Run-Length Encoding (RLE) on sorted data:
Before: ['EU','EU','EU','EU','US','US','US']
After: [('EU',4), ('US',3)]
Savings: extreme for sorted low-cardinality columns
Delta Encoding (for timestamps):
Before: [1704067200, 1704067260, 1704067320, 1704067380]
After: [1704067200, +60, +60, +60]
Savings: ~4x compression on dense time series
OLTP vs OLAP Quick Comparison
Dimension
OLTP
OLAP
Users
End-users (application)
Analysts, data scientists
Query type
Short, many transactions
Long scans, aggregations
Dataset
GB, normalized (3NF)
TB-PB, star/snowflake schema
Write pattern
Individual low-latency writes
Bulk ETL/ELT loads
Index type
B-Tree (random access)
Column + bitmap + sort key
Examples
PostgreSQL, MySQL, DynamoDB
Snowflake, BigQuery, Redshift
Cloud Data Warehouse Decision Tree
Need serverless (no cluster management)?
└─ Yes → Google BigQuery (pay-per-query)
Need best separation of compute/storage + zero-copy cloning?
└─ Yes → Snowflake
Already heavily invested in AWS?
└─ Yes → Amazon Redshift (RA3 nodes)
Need unified batch + streaming + ML in one platform?
└─ Yes → Databricks Lakehouse (Delta Lake)
Need embedded analytics (no server, runs in-process)?
└─ Yes → DuckDB
Need extreme real-time insert throughput for analytics?
└─ Yes → ClickHouse
Need open table format (no vendor lock-in)?
└─ Yes → Iceberg + Trino/Athena/Spark
Vectorized Execution vs Volcano Model
Volcano model (tuple-at-a-time):
for each row in table:
if filter(row): ← function call per row
result = project(row) ← function call per row
aggregate(result) ← function call per row
→ 3 × 1,000,000,000 function calls for 1B rows
Vectorized model (batch-at-a-time):
for each batch of 1024 rows:
filtered_batch = filter(batch) ← 1 call, SIMD processes 8-32 values
projected_batch = project(filtered_batch)
aggregate(projected_batch)
→ 3 × (1,000,000,000 / 1024) = 2.9M function calls
→ Each call also 8-32x faster (SIMD)
→ Net: ~100-300x fewer instructions
Used by: DuckDB, Apache Arrow DataFusion, ClickHouse, Velox (Meta)
Vector Embeddings: Semantic Search
Traditional keyword search:
query: "database performance"
match: documents containing exact words "database" AND "performance"
Semantic vector search:
query → embed → [0.12, -0.45, 0.89, ..., 0.03] (1536 dims)
compare cosine similarity to all document embeddings
match: "storage engine optimization" (semantically close, no keyword match)
"DB query speed tuning" (semantically close)
RAG Pattern:
User question → embed → find top-5 similar chunks in vector DB
→ inject chunks into LLM prompt
→ LLM generates answer grounded in retrieved context
Lakehouse Architecture:
├─ Delta Lake (Databricks) + Apache Iceberg (Netflix/Apple)
├─ ACID transactions on Parquet files in S3/GCS
├─ Time travel, schema evolution, concurrent read/write
└─ Multiple engines (Spark, Trino, Flink, Athena) read same data
DuckDB Revolution:
├─ Columnar analytics without a server (embedded)
├─ Reads Parquet, Arrow, CSV, JSON natively
├─ 10-100x faster than pandas for medium-scale analytics
└─ Powers MotherDuck, Observable, many analytics tools
Vector Search Mainstream:
├─ pgvector + HNSW available in all managed PostgreSQL services
├─ Pinecone, Weaviate, Qdrant for dedicated vector workloads
├─ Hybrid search (vector + BM25) standard for RAG
└─ Every cloud provider now has native vector DB support
RocksDB as Universal Substrate:
├─ Underlying engine for TiKV, TiDB, CockroachDB, Kafka storage
└─ Purpose-tuned for SSD write-heavy workloads