Chapter 3: Design Google Maps

volume2 google-maps routing graph navigation

Status: 🟩 Interview ready
Difficulty: Very Hard
Time to complete: 60 min read + practice


Overview

Google Maps is a real-time mapping and navigation platform that serves 1 billion users daily. It provides turn-by-turn navigation, real-time traffic updates, map tile rendering, ETA estimation, and location search across 5 million km of roads worldwide.

Why this matters:

  • Rare but high-signal interview question at Google, Uber, Lyft, Meta
  • Tests knowledge of graph algorithms, distributed storage, CDN strategy, and real-time data pipelines
  • Combines multiple hard sub-problems: routing, tile serving, traffic aggregation

Problem Statement

Design a system that:

  • Navigates users from point A to point B with accurate turn-by-turn directions
  • Incorporates real-time traffic data into route and ETA calculations
  • Renders map tiles at any zoom level, worldwide
  • Supports 1 billion DAU with low latency

Step 1: Requirements & Scope (5 min)

Functional Requirements

Clarifying questions:

  • Navigation only, or also search/discovery? β†’ Navigation + location search
  • Mobile and web? β†’ Yes, both
  • Real-time traffic? β†’ Yes, critical feature
  • Offline support? β†’ Out of scope for interview
  • Public transit, walking, cycling? β†’ Focus on driving navigation
  • How fresh is traffic data? β†’ Near real-time (seconds to minutes)

Scope:

  • Turn-by-turn navigation (shortest + fastest path)
  • Real-time traffic integration
  • Map rendering at multiple zoom levels
  • ETA estimation
  • Location/address search

Non-Functional Requirements

  • Scale: 1 billion DAU
  • Road data: 5 million km of roads globally
  • Map storage: 50 PB of raw map data; each tile ~100 KB
  • Accuracy: Routes must be correct; ETAs within acceptable margin
  • Low latency: Map tiles render quickly; route calculation < 2 sec
  • High availability: 99.99% uptime (people depend on navigation)
  • Traffic updates: Near real-time (aggregate and apply within ~1 min)

Scale Estimates

DAU:                        1 billion users
Average active navigation:  35 min/session
Location updates (GPS):     Every 15 sec while navigating
Location events/sec:        ~1B users Γ— 10% navigating Γ— (1/15 sec)
                           β‰ˆ 6.7 million GPS events/sec

Map tiles:
  Zoom levels: 0–22 (level 0 = whole world, level 22 = building)
  Tiles at zoom 21: 2^21 Γ— 2^21 = ~4 trillion tiles
  But only ~10% of tiles are land (usable area)
  Storage: 50 PB raw image data

Road graph:
  5M km roads β†’ ~100M nodes (intersections)
  ~200M edges (road segments)

Step 2: High-Level Design (10 min)

Core Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Client (Mobile/Web)                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      β”‚ Route Request               β”‚ Location Updates
                      β–Ό                             β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Navigation Service β”‚       β”‚   Location Service      β”‚
         β”‚  (Route Calculation)β”‚       β”‚   (GPS ingestion)       β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚                             β”‚
                    β”‚ Graph query                 β”‚ Kafka stream
                    β–Ό                             β–Ό
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Road Graph Store  β”‚       β”‚  Traffic Processor      β”‚
         β”‚  (Object Storage / β”‚       β”‚  (Speed aggregation     β”‚
         β”‚   Graph DB)        │◄──────│   per road segment)     β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Map Tile Service  │──── CDN ──── Client
         β”‚  (Tile generation) β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Three Main Services

ServiceResponsibilityKey Data
Location ServiceIngest GPS updates from usersKafka topic per region
Navigation ServiceCompute routes, ETAsRoad graph, traffic weights
Map Rendering ServiceServe pre-rendered or vector tilesTile storage + CDN

Data Stores

Road Graph:         Object storage (S3 / GCS) β€” adjacency lists by region
Map Tiles:          Object storage + CDN (static, rarely updated)
Traffic Data:       Time-series DB or Redis β€” speed per road segment
User Location:      Kafka (stream) β†’ traffic processor (ephemeral)
ETA History:        Data warehouse for ML model training

Step 3: Deep Dive (30 min)

3.1 Map Tile System

World divided into tiles using a quadtree-like grid:

Zoom Level 0:    1 tile  (entire world 256Γ—256 px)
Zoom Level 1:    4 tiles (2Γ—2 grid)
Zoom Level 2:   16 tiles (4Γ—4 grid)
...
Zoom Level z:  4^z tiles

Each tile = 256Γ—256 pixels
Each tile URL: /tiles/{z}/{x}/{y}.png

Tile URL structure:

https://maps.cdn.example.com/tiles/15/16384/10900.png
                                    ^z  ^x     ^y
z = zoom level (15 = neighborhood level)
x = tile column
y = tile row

Zoom level reference:

Level  0: World (1 tile)
Level  5: Continent
Level 10: City
Level 14: Neighborhood
Level 17: Street level
Level 20: Building detail
Level 22: Room level

Tile caching strategy:

Client Request for tile z/x/y
         β”‚
         β–Ό
    CDN Edge Node ──── HIT ──→ Return tile (< 10 ms)
         β”‚ MISS
         β–Ό
    CDN Origin (Object Storage: S3/GCS)
         β”‚
         β–Ό
    Pre-rendered tile file (100 KB)
    Cache-Control: max-age=604800 (7 days)

Why CDN works perfectly for tiles:

  • Tiles are static (road geometry rarely changes)
  • Same tile served to millions of users at same zoom/location
  • Immutable URLs (tile content at z/x/y is always the same version)
  • Cache-hit ratio > 99% for popular areas

Vector Tiles vs Raster Tiles:

AspectRaster Tiles (PNG/JPG)Vector Tiles (PBF/JSON)
RenderingServer-rendered imageClient-rendered geometry
File size~100 KB per tile~10–50 KB per tile
Zoom smoothnessPixelates between levelsSmooth (infinite zoom)
CustomizationFixed styleStyle at runtime
Client CPULowHigher (WebGL/Canvas)
OfflineEasy to cacheEasy to cache + re-style
Used byLegacy systemsGoogle Maps, Mapbox

Google Maps uses vector tiles on modern clients for smooth rendering and smaller bandwidth.


3.2 Road Graph for Routing

Graph representation:

Nodes  = Road intersections (lat/lng coordinates)
Edges  = Road segments between intersections
Weight = Travel time in seconds (NOT distance)
         β†’ Updated dynamically with traffic data

Example graph:

     A ──(5s)── B ──(3s)── C
     β”‚           β”‚
   (8s)        (4s)
     β”‚           β”‚
     D ──(2s)── E ──(6s)── F

Nodes: A, B, C, D, E, F
Edges with weights (travel time):
  A→B: 5s    B→C: 3s    A→D: 8s
  B→E: 4s    D→E: 2s    E→F: 6s

Storage format β€” adjacency list:

{
  "node_A": {
    "lat": 37.7749, "lng": -122.4194,
    "edges": [
      {"to": "node_B", "base_weight_s": 5, "road_type": "primary"},
      {"to": "node_D", "base_weight_s": 8, "road_type": "secondary"}
    ]
  },
  "node_B": {
    "lat": 37.7755, "lng": -122.4180,
    "edges": [
      {"to": "node_C", "base_weight_s": 3, "road_type": "primary"},
      {"to": "node_E", "base_weight_s": 4, "road_type": "primary"}
    ]
  }
}

Stored in object storage (S3/GCS):

  • Partitioned by geographic region (tiles map to graph tiles)
  • Graph tile at zoom ~6 covers one city region
  • Only load relevant regions during route calculation
  • ~200 GB per major metro area’s compressed graph data

3.3 Routing Algorithms

Naive Dijkstra’s β€” Too Slow for Global Scale

Dijkstra's time complexity: O((V + E) log V)

Global road network:
  V = 100 million nodes (intersections)
  E = 200 million edges (road segments)

Computation: ~300M Γ— log(100M) β‰ˆ 300M Γ— 27 β‰ˆ 8 billion operations
Time:        Several seconds to minutes per query β†’ Unacceptable!

Why it’s too slow:

New York β†’ Los Angeles route:
  Dijkstra explores ALL nodes closer than destination
  Must explore millions of nodes across entire continent
  Cannot complete in < 2 second requirement

Algorithm 1: A* (A-Star) β€” Better with Heuristic

Key idea: Guide search with a heuristic (straight-line distance to goal)

f(n) = g(n) + h(n)

g(n) = actual cost from start to node n
h(n) = heuristic estimate from n to goal (straight-line distance)

Priority queue orders by f(n), not just g(n)
β†’ Explores nodes "toward" the destination first

Visual comparison:

Dijkstra:                    A*:
Explores in all directions   Explores mostly toward goal

     S                           S
   β—‹β—‹β—‹β—‹β—‹                       β—‹β—‹β—‹
 β—‹β—‹β—‹β—‹β—‹β—‹β—‹β—‹β—‹                   β—‹β—‹β—‹β—‹β—‹
β—‹β—‹β—‹β—‹β—‹β—‹β—‹Gβ—‹β—‹                  β—‹β—‹β—‹β—‹G
 β—‹β—‹β—‹β—‹β—‹β—‹β—‹β—‹β—‹                   β—‹
   β—‹β—‹β—‹β—‹β—‹

β—‹ = explored nodes           Much fewer explored nodes!

Speedup: 2–10x faster than Dijkstra on road networks
Limitation: Still explores too many nodes for continent-wide routing

Algorithm 2: Contraction Hierarchies (CH) β€” Production Standard

Key idea: Pre-process graph by β€œcontracting” (removing) unimportant nodes and adding shortcut edges. Query uses bidirectional search on the contracted graph.

Pre-processing phase:

Step 1: Rank all nodes by "importance"
  Importance = how many shortcut edges would be needed if this node removed
  Low-importance nodes: Dead-end streets, local roads
  High-importance nodes: Highway junctions, major intersections

Step 2: Contract nodes from least to most important
  When contracting node v:
    For each pair of neighbors (u, w) of v:
      If shortest path u→w goes through v:
        Add shortcut edge u→w with weight dist(u,v) + dist(v,w)

Step 3: Result: Hierarchical graph
  Level 0: Local streets
  Level 1: Primary roads
  Level 2: Highways
  Level 3: Major interstates / motorways

Query phase (bidirectional):

Forward search from source S:  Only upward in hierarchy
Backward search from goal G:   Only upward in hierarchy

       [Motorway nodes - top level]
      /                          \
[Highway nodes]              [Highway nodes]
    /                                    \
[S] β†’ β†’ β†’ β†’ β†’ β†’ ←meeting point← ← ← ← [G]

Each direction only explores ~1000 nodes vs millions!

Speedup: 1000–10,000x faster than naive Dijkstra
Used by: OSRM, Graphhopper, HERE Maps, most production GPS systems

Pre-computation time: Hours to days for global graph (done offline, periodically)

Algorithm 3: Bidirectional Dijkstra

Standard Dijkstra:          Bidirectional Dijkstra:

  S ──────────────► G         S ──────► ◄────── G
  Explores radius r           Each explores radius r/2
                              Total: 2 Γ— Ο€(r/2)Β² << Ο€ rΒ²
  Area: Ο€ rΒ²                  ~4x fewer nodes explored

Simple and practical, often used as first-pass filter with CH.

Algorithm Comparison

AlgorithmTime ComplexityPre-computationSpeedup vs DijkstraUsed For
DijkstraO((V+E) log V)None1xSmall graphs
A*O((V+E) log V)None2–10xMedium graphs
Bidirectional DijkstraO((V+E) log V)None~4xSimple speedup
Contraction HierarchiesO(V^0.5) queryHours (offline)1000–10000xProduction GPS

Recommendation for interview: Mention Dijkstra first, explain why it’s too slow, then propose Contraction Hierarchies as the production solution. Show you know A* as an intermediate.


3.4 Real-Time Traffic Integration

How Google Maps collects traffic data:

Passive probe model (anonymized):

[User navigating] ──GPS update every 15s──► Location Service
                                                    β”‚
                                                    β–Ό
                                              Anonymize
                                           (strip user ID)
                                                    β”‚
                                                    β–Ό
                                              Kafka Topic
                                            (location events)
                                                    β”‚
                                                    β–Ό
                                           Traffic Processor
                                       (aggregate speed per segment)
                                                    β”‚
                                                    β–Ό
                                          Road segment speed table
                                      segment_id β†’ avg_speed_kmh
                                                    β”‚
                                                    β–Ό
                                         Update graph edge weights
                               weight_s = segment_length_m / speed_m_s

Traffic data pipeline:

GPS Events (6.7M/sec)
    β”‚
    β–Ό
Kafka (partitioned by road segment ID)
    β”‚
    β–Ό
Stream Processor (Flink/Spark Streaming)
  - Group events by road segment
  - Calculate median speed over 1-minute window
  - Filter outliers (GPS errors, stationary users)
    β”‚
    β–Ό
Traffic DB (Redis / time-series DB)
  Key: segment_id:timestamp_bucket
  Val: avg_speed_kmh, sample_count, confidence
    β”‚
    β–Ό
Graph weight updater (runs every 1–5 min)
  Reads updated speeds β†’ Recalculates edge weights β†’ Writes back to graph

Speed β†’ Edge weight conversion:

def compute_edge_weight(segment_length_m, observed_speed_kmh, free_flow_speed_kmh):
    # Use observed speed if traffic data available (high confidence)
    # Fall back to free flow speed (posted speed limit) otherwise
    effective_speed = observed_speed_kmh if confidence > threshold else free_flow_speed_kmh
    speed_m_s = effective_speed * 1000 / 3600
    travel_time_s = segment_length_m / speed_m_s
    return travel_time_s

Traffic data freshness:

Data latency: GPS β†’ edge weight update β‰ˆ 1–3 minutes
Acceptable because: Traffic conditions change slowly (minutes, not seconds)
Exception: Accidents/incidents β†’ pushed via separate incident feed

3.5 ETA Calculation

Simple ETA:

ETA = Sum of edge weights along route (in seconds)
    = Sum(segment_length / current_speed) for all segments

Better ETA with ML:

Features:
  - Current traffic speed per segment (real-time)
  - Historical speed at same time/day/segment
  - Day of week, time of day, holidays
  - Weather (rain slows traffic ~10–20%)
  - Incidents (accidents, road closures)
  - Route type (highway vs city streets)

Model:
  - Gradient boosted trees or neural network
  - Trained on historical route + actual travel time pairs
  - Recalculated every few minutes as route progresses

Accuracy target: Β± 10% of actual travel time for 90% of routes

3.6 Map Partitioning for Scalability

Challenge: 5M km of roads, 100M nodes β€” can’t load entire graph in memory

Solution: Partition by geographic tiles:

World map divided into routing tiles (zoom level ~6)
Each routing tile: ~100 km Γ— 100 km area
Stored as: graph_tile_{z}_{x}_{y}.bin in object storage

Route from NYC to LA:
  1. Find all tiles along approximate path (bounding box)
  2. Load only those tiles (~20 tiles for cross-country)
  3. Run CH query on loaded subgraph
  4. If path crosses tile boundary β†’ load neighboring tile

Total tiles for global map: ~10,000 tiles at routing zoom level
Average tile size: ~20 MB compressed
Total: ~200 GB for global routing graph (fits in cluster memory)

Tile boundary connections:

Tile A ──────────│── Tile B
      [node_99] ──── [node_100]
               boundary edge stored in both tiles

3.7 Location Update Service Architecture

Mobile Client (navigating)
  β”‚ GPS update (lat, lng, speed, heading, accuracy)
  β”‚ WebSocket (persistent connection for 2-way comms)
  β–Ό
Load Balancer
  β”‚
  β–Ό
Location Service (stateless, horizontally scalable)
  β”‚ Validates, anonymizes, rate-limits
  β–Ό
Kafka (partitioned by geohash of location)
  β”‚
  β”œβ”€β”€β–Ί Traffic Processor (aggregate speed per segment)
  β”‚
  └──► Incident Detector (sudden stops = possible accident)

Why WebSocket for location updates:

  • Persistent connection avoids TCP handshake overhead per update
  • Server can push re-routing suggestions back to client
  • More efficient than polling (REST) for 15-sec update interval

Design Summary

Complete Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          Client Apps                             β”‚
β”‚              (iOS / Android / Web)                               β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚               β”‚                β”‚                β”‚
  Route Request  Location Update   Tile Request    Address Search
       β”‚               β”‚                β”‚                β”‚
       β–Ό               β–Ό                β–Ό                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Navigation  β”‚ β”‚ Location  β”‚  β”‚  CDN Edge   β”‚  β”‚  Search       β”‚
β”‚  Service     β”‚ β”‚ Service   β”‚  β”‚  (Tiles)    β”‚  β”‚  Service      β”‚
β”‚  (Routing +  β”‚ β”‚ (WebSocketβ”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β”‚  (Geocoding)  β”‚
β”‚   ETA)       β”‚ β”‚  /REST)   β”‚         β”‚ MISS    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
       β”‚               β”‚        β”‚ Object Store β”‚
       β”‚               β–Ό        β”‚ (Tile files) β”‚
       β”‚          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚          β”‚  Kafka   β”‚
       β”‚          β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
       β”‚               β”‚
       β”‚          β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚          β”‚ Traffic       β”‚
       β”‚          β”‚ Processor     β”‚
       β”‚          β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚               β”‚ Weight updates
       β–Ό               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Road Graph Store             β”‚
β”‚  (Object Storage, partitioned tiles) β”‚
β”‚  Edge weight = travel_time_s         β”‚
β”‚  Updated every 1–5 min from traffic  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Decisions Summary

DecisionChoiceReasoning
Map tilesVector tiles + CDNSmaller files, smooth zoom, high cache hit ratio
Tile storageObject storage (S3) + CDNStatic content, global distribution
Routing algorithmContraction Hierarchies1000x faster than Dijkstra, production standard
Graph storageObject storage, partitioned by tileToo large for single DB, load on demand
Traffic collectionPassive GPS probesAnonymized, scales with user base automatically
Traffic pipelineKafka + stream processorHigh throughput, decoupled from routing
Location protocolWebSocketPersistent, low overhead, bidirectional
ETAML model (historical + live traffic)Better accuracy than pure graph weights

Interview Questions & Answers

Q: Why is naive Dijkstra’s algorithm too slow for global maps?
A: With 100 million nodes and 200 million edges globally, Dijkstra’s O((V+E) log V) would require billions of operations per query. A cross-continent route forces exploration of millions of nodes, taking seconds to minutes β€” far exceeding the < 2s requirement. Production systems use Contraction Hierarchies, which pre-compute shortcuts offline to reduce query time to milliseconds.

Q: How do you handle real-time traffic without re-running pre-computation?
A: Traffic updates only change edge weights (travel time per segment), not the graph topology. CH shortcuts are built on top of the base graph structure, but the actual query can incorporate live weights. In practice, CH is periodically rebuilt (e.g., every few hours) with current traffic patterns baked in, while very recent changes are handled by adjusting weights at query time.

Q: How do you store 50 PB of map tile data efficiently?
A: Tiles are static files stored in object storage (like S3). The key insight is CDN caching β€” popular tiles (major cities, common zoom levels) have cache-hit rates above 99%, so the origin only serves a tiny fraction of requests. Tiles are served as vector data (10–50 KB vs 100 KB for raster), reducing storage and bandwidth. Less-accessed areas (rural, remote) are served on cache miss with higher latency, which is acceptable.

Q: How do you collect traffic data without tracking users?
A: Use the passive probe model. Location updates are anonymized immediately upon receipt β€” the user ID is stripped and the device is identified only by a temporary session token. The system aggregates speed data per road segment across many users, so no individual’s route is stored. Google Maps’ privacy model publishes minimum thresholds (e.g., need at least N users on a segment before showing traffic) to prevent re-identification.

Q: What happens when there’s a sudden traffic event (accident)?
A: Two mechanisms: (1) Automatic detection β€” if many GPS probes show sudden deceleration on a segment, the traffic processor flags an incident within 1–3 minutes. (2) Explicit reporting β€” users can manually report incidents in the app. Both feed a separate incident service that immediately pushes high-priority weight updates (set segment weight to very high value) to the routing service, bypassing the normal 1–5 minute batch update.


Key Takeaways

  1. Map tiles are served via CDN with URLs structured as z/x/y β€” tiles are static and cacheable, enabling > 99% CDN hit rate
  2. Vector tiles (client-rendered) beat raster tiles for file size, smooth zoom, and runtime styling β€” the modern standard
  3. Road graph uses nodes (intersections) + edges (segments) with travel time (not distance) as weight, updated dynamically
  4. Contraction Hierarchies is the production routing algorithm β€” offline pre-computation enables millisecond queries on 100M+ node graphs
  5. Real-time traffic is sourced passively from users’ GPS updates β€” 6.7M events/sec flow through Kafka β†’ stream processor β†’ edge weights
  6. Graph partitioning by geographic tile makes the 200 GB global graph manageable β€” load only the tiles along the route
  7. ETA accuracy requires ML models combining live traffic, historical patterns, day/time, and weather β€” pure graph weight summation is not enough


Practice this design! Key areas to master:

  1. Draw the tile URL structure and explain zoom levels
  2. Walk through why Dijkstra fails and how CH solves it
  3. Describe the end-to-end traffic data pipeline
  4. Explain vector vs raster tile trade-offs
  5. Discuss how to handle map updates (batch for roads, real-time for traffic)

Last Updated: 2026-04-13
Status: Very Hard β€” rehearse the routing algorithm progression (Dijkstra β†’ A* β†’ CH)