Chapter 3 Flashcards - Google Maps (Vol 2)

flashcards volume2 google-maps routing graph navigation

What is the tile URL structure for map tiles and what does each component mean?
?
URL format: /tiles/{z}/{x}/{y}.png where z = zoom level (0–22), x = tile column, y = tile row. Zoom 0 = entire world in 1 tile. Each zoom level doubles resolution (4x tiles). Zoom 14 = neighborhood, zoom 17 = street level. At zoom z, there are 4^z total tiles. Key insight: tile content is immutable — same z/x/y always returns same image.

What are zoom levels and what does each level represent for map tiles?
?
Level 0: Whole world (1 tile). Level 5: Continent. Level 10: City. Level 14: Neighborhood. Level 17: Street detail. Level 20: Individual buildings. Level 22: Room-level. Each level doubles linear resolution → 4x more tiles than previous level. ~4 trillion tiles at zoom 21, but only ~10% of Earth is usable land area.

Why are map tiles ideal for CDN caching?
?
Three properties make tiles perfect for CDN: (1) Static — road geometry rarely changes, same tile content for months. (2) Immutable URLs — same z/x/y always returns the same file version. (3) Massive reuse — millions of users request the same popular tiles (city centers). Result: cache-hit rate > 99% for popular areas; CDN serves 99% of traffic, origin (object storage) sees minimal load.

What are the trade-offs between vector tiles and raster tiles?
?
Raster (PNG/JPG): Server renders image, ~100 KB/tile, pixelates between zoom levels, fixed style, low client CPU. Vector (PBF/JSON): Client renders geometry via WebGL, ~10–50 KB/tile, smooth infinite zoom, customizable style at runtime, higher client CPU. Google Maps uses vector tiles — smaller size, smooth zooming, and runtime theming outweigh the CPU cost on modern devices.

How is the road network represented as a graph for routing?
?
Nodes = road intersections (lat/lng coordinates). Edges = road segments between intersections. Edge weight = travel time in seconds (NOT distance). Directed edges (one-way streets). Key: weight is dynamic — updated with real-time traffic. Adjacency list format stored in object storage, partitioned by geographic tile. ~100M nodes, ~200M edges globally.

Why is naive Dijkstra’s algorithm too slow for global map routing?
?
Dijkstra’s time complexity is O((V+E) log V). With V=100M nodes and E=200M edges: ~300M × log(100M) ≈ 8 billion operations per query. A New York → LA route would require exploring millions of nodes across the entire continent. This takes seconds to minutes — unacceptable for a < 2 second requirement. Dijkstra explores uniformly in all directions, wasting time on nodes in the wrong direction.

How does A* (A-star) improve on Dijkstra for routing?
?
A* adds a heuristic h(n) = straight-line distance from node n to the goal. Priority queue orders by f(n) = g(n) + h(n), where g(n) = actual cost so far. This guides the search toward the destination, avoiding exploration of nodes in wrong directions. Speedup: 2–10x over Dijkstra. Limitation: Still explores too many nodes for continent-wide routing — not sufficient alone for production.

What are Contraction Hierarchies (CH) and why are they the production standard?
?
CH pre-processes the graph offline by: (1) Ranking nodes by “importance” (highways > local roads). (2) Contracting less-important nodes by adding shortcut edges (u→w bypassing v). At query time, bidirectional search goes upward in the hierarchy from both source and goal. Result: each direction explores only ~1000 nodes instead of millions. Speedup: 1000–10,000x over Dijkstra. Used by OSRM, HERE Maps, Google Maps.

How does the Contraction Hierarchies query phase work?
?
Forward search from source S moves only upward in the hierarchy (local roads → highways → motorways). Backward search from goal G also moves only upward. Both searches meet somewhere near the top of the hierarchy (major highway nodes). The meeting point gives the optimal path. Pre-computation takes hours (run offline), but each query takes milliseconds. The contracted graph has far fewer edges to explore at each level.

How does bidirectional Dijkstra improve routing performance?
?
Run two simultaneous Dijkstra searches: forward from source S and backward from destination G. Each explores a radius r/2. Area explored = 2 × π(r/2)² = πr²/2, about half of standard Dijkstra’s πr². In practice ~4x fewer nodes explored. Simpler than CH (no pre-computation), good as a building block. Combined with CH for best results.

How does Google Maps collect real-time traffic data?
?
Passive probe model: Users navigating send anonymized GPS updates every 15 seconds. System strips user identity immediately (only temporary session token). Events flow: GPS → Location Service → Kafka → Traffic Processor. Traffic Processor aggregates median speed per road segment per 1-minute window, filtering outliers. Result: segment_id → avg_speed_kmh. No individual routes stored — only aggregate speeds. Need minimum N users on segment before reporting (privacy threshold).

What is the end-to-end traffic data pipeline?
?

  1. Mobile client sends GPS update every 15 sec via WebSocket. 2. Location Service anonymizes and publishes to Kafka (partitioned by geohash). 3. Stream Processor (Flink/Spark) groups by road segment, calculates median speed in 1-min windows, filters outliers. 4. Results written to Traffic DB (Redis/time-series). 5. Graph weight updater runs every 1–5 min: new_weight_s = segment_length_m / speed_m_s. 6. Updated graph used for subsequent route calculations.

How many GPS location events per second must the system handle?
?
Estimate: 1 billion DAU × ~10% actively navigating at peak = 100M active navigators. Each sends 1 update per 15 sec = 100M/15 ≈ 6.7 million GPS events per second. This requires Kafka for ingestion (partitioned by geohash/segment), stream processing for aggregation, and no persistence of raw events — only aggregate traffic speeds need to be stored long-term.

How is the road graph stored and partitioned for scalability?
?
The global graph (~200 GB compressed) is partitioned into geographic routing tiles at zoom level ~6 (each tile ≈ 100×100 km). Each tile stored as a binary adjacency list in object storage (S3/GCS). Route calculation only loads tiles along the approximate path (~20 tiles for cross-country). Tile boundary edges are stored in both adjacent tiles. Total: ~10,000 tiles globally, ~20 MB each. A major metro’s graph fits in ~200 GB — manageable for a cluster.

How is ETA calculated in Google Maps?
?
Basic ETA = sum of edge weights (travel time in seconds) along route. Production ETA uses an ML model with features: current traffic speed per segment (real-time), historical speed at same day/time, day of week, holidays, weather (rain slows traffic 10–20%), active incidents, route type (highway vs city). Model trained on historical route + actual arrival time pairs. Recalculated every few minutes as user progresses. Target: ±10% accuracy for 90% of routes.

Why is WebSocket used for location updates instead of REST polling?
?
WebSocket advantages: (1) Persistent TCP connection — eliminates handshake overhead for every 15-second update. (2) Bidirectional — server can push re-routing suggestions or incident alerts back to client without client polling. (3) Lower overhead than REST (no HTTP headers per update). REST polling at 15-sec intervals would create 6.7M new connections/sec globally — WebSockets keep those connections alive across the session.

What are the scale figures for Google Maps that you should know for interviews?
?
DAU: 1 billion. Roads: 5 million km globally. Road graph: ~100M nodes, ~200M edges. Map storage: 50 PB raw tile data. Average tile size: 100 KB (raster) / 10–50 KB (vector). GPS events: ~6.7 million per second. Traffic update latency: 1–3 minutes (GPS → edge weight). Routing tiles: ~10,000 globally at routing zoom level. Global routing graph compressed: ~200 GB. CDN cache-hit rate: > 99% for popular tiles.

What are the three main services in Google Maps architecture?
?

  1. Location Service: Ingests GPS updates from navigating users via WebSocket, anonymizes, publishes to Kafka. Stateless, horizontally scalable. 2. Navigation Service: Computes routes using CH on the road graph, calculates ETAs, re-routes when traffic changes. 3. Map Rendering Service: Serves pre-generated vector/raster tiles from object storage via CDN. Each service independently scalable. Traffic Processor sits downstream of Location Service as a stream processing layer.

How do you handle map updates (road changes vs traffic changes)?
?
Road geometry changes (new roads, road closures): Monthly batch update pipeline. Ingest OpenStreetMap/partner data → regenerate affected graph tiles → rebuild CH shortcuts for affected regions → push to object storage. CDN tiles invalidated and regenerated. Traffic conditions (speeds, congestion): Near real-time (1–5 min). GPS probes → Kafka → stream processor → edge weight updates. No CH rebuild needed — just weight changes. Incidents (accidents): Immediate update via separate incident feed, bypasses normal batch cycle.

How does map tile generation work? What is the rendering pipeline?
?
Raw geo data (OpenStreetMap, satellite imagery, partner data) → Data pipeline processes and layers information (roads, buildings, terrain, labels) → Rendering engine generates tiles at each zoom level → Tiles stored as files in object storage → CDN distributed globally. Vector tile pipeline: Convert geo data to protobuf format (geometry + styling metadata) → store in object storage. Client-side renderer (WebGL) draws geometry with CSS-like stylesheets. Re-render client side for different themes (night mode, satellite) without new tile downloads.

What is the passive probe model for traffic data collection?
?
Users actively navigating send GPS location updates every 15 seconds. These are used passively as traffic sensors — users don’t explicitly report traffic. The system: (1) Receives lat/lng + speed + heading. (2) Matches GPS point to nearest road segment (map matching). (3) Strips user identity. (4) Aggregates speed across all users on same segment. Privacy: minimum N users required per segment before displaying traffic data, preventing individual tracking. This scales automatically — more users = better traffic data.

How does Google Maps detect accidents or sudden traffic incidents automatically?
?
Incident detection uses anomaly detection on GPS probe data: If many users on the same road segment suddenly decelerate to near-zero speed, the stream processor detects a cluster of stopped GPS probes. This is flagged as a potential incident within 1–3 minutes. Confidence increases with more probes. The incident is pushed as a high-priority weight update (artificially very high travel time for that segment) to the routing service, triggering re-routing for active navigators on that path.

Why use object storage (S3/GCS) for both the road graph and map tiles?
?
Both are large, append-mostly datasets that are read far more than written. Object storage advantages: (1) Virtually unlimited scale (petabytes). (2) Cheap storage cost. (3) High read throughput via CDN integration. (4) No schema — store binary files, protobuf, compressed graphs. (5) Built-in geo-replication for availability. The graph is partitioned into tile files (like tiles), so object storage’s key-value model (filename → file) is a perfect fit. No need for a relational DB or graph DB for this scale.

How does route calculation handle tile boundaries in the partitioned road graph?
?
Boundary edges (road segments crossing tile boundaries) are stored redundantly in both adjacent tiles. When routing: (1) Compute approximate bounding box of source → destination. (2) Load all routing tiles intersecting that bounding box. (3) For cross-country routes, load ~20 tiles. (4) Run CH query on the in-memory union of loaded tiles. (5) If route exits loaded region unexpectedly, load neighboring tiles on demand. Tiles are small enough (~20 MB each) that loading 20 tiles (~400 MB) fits comfortably in server memory.

What data stores are used for each component of Google Maps?
?
Map tiles: Object storage (S3/GCS) + CDN (static files, no DB). Road graph: Object storage (binary tile files, adjacency lists). Traffic data (live): Redis or time-series DB (segment_id → current_speed, TTL = few minutes). Location events: Kafka (ephemeral stream, not persisted long-term). ETA training data: Data warehouse (historical routes + actual travel times for ML). User preferences / saved places: SQL or NoSQL DB. No single DB handles everything — each store is chosen for its access pattern.


Total Cards: 25
Review Time: 20–25 minutes
Priority: HIGH — Very Hard difficulty, tests graph algorithms + distributed systems + CDN
Last Updated: 2026-04-13