Uber Interview Prep
Interview Profile
- Format: 4–5 rounds — 2 coding, 1 system design, 1 behavioral, sometimes an additional architecture round for senior candidates
- Coding emphasis: Graph algorithms (routing, matching), real-time data processing, geospatial reasoning — Uber expects you to talk about why a particular graph algorithm fits, not just implement it
- System design: Geospatial indexing (H3/S2), driver-rider matching algorithms, dynamic pricing (surge), ETA computation pipelines
- Senior bar: Connect every algorithmic choice to Uber’s marketplace at city scale — “I chose Dijkstra over BFS here because edge weights represent time, not hop count, and ETA accuracy matters more than theoretical simplicity”
- What Uber cares about: Real-time systems with sub-second latency, geo-distributed architecture, can you reason about the ETA/matching problem for 50,000 concurrent riders in a city?
Pattern Frequency Breakdown
| Pattern | Estimated Frequency | Notes |
|---|---|---|
| Graphs (Dijkstra / BFS / Grid) | 35% | Routing, ETA, surge zone propagation |
| Heap / Priority Queue | 20% | Real-time price percentiles, driver ranking |
| Greedy / Intervals | 20% | Concurrent ride scheduling, availability windows |
| Sliding Window | 15% | Rolling price metrics, fraud detection |
| HashMap / Design | 10% | Driver pool management, dispatch data structures |
Problems From This Repo
| Problem | Why Uber Asks It |
|---|---|
[[Session-14-AdvancedGraphs|Network Delay Time (LC 743)]] | ETA calculation — single-source shortest path from a driver’s current location to all pickup candidates |
[[Session-14-AdvancedGraphs|Cheapest Flights Within K Stops (LC 787)]] | Constrained shortest path routing — find cheapest/fastest route with at most K intermediate segments; models multi-leg delivery or shared ride with transfer limit |
[[Session-08-Graphs|Pacific Atlantic Water Flow (LC 417)]] | Surge zone propagation — identify cells reachable from two different “pricing pressure” sources; models how surge radiates outward from demand hotspots |
[[Session-07-Heap|Find Median from Data Stream (LC 295)]] | Real-time price percentile — streaming p50 of ride fares for surge multiplier calibration |
[[Session-07-Heap|IPO / Maximum Capital (LC 502)]] | Driver incentive optimization — greedily pick the highest-return incentive programs within a capital/time budget |
[[Session-12-Greedy|Meeting Rooms II (LC 253)]] | Concurrent ride scheduling — minimum number of drivers to satisfy all overlapping ride windows |
[[Session-12-Greedy|Jump Game II (LC 45)]] | Fewest-stop routing — minimum number of “relay zones” to move supply from one end of a city to another |
[[Session-04-HashMap|Insert Delete GetRandom O(1) (LC 380)]] | Driver pool random dispatch — O(1) insert (driver comes online), O(1) delete (driver goes offline), O(1) random (dispatch a random available driver) |
[[Session-14-AdvancedGraphs|Redundant Connection (LC 684)]] | Zone boundary deduplication — detect a redundant road segment in a zone graph that would create a routing cycle |
[[Session-08-Graphs|Word Ladder (LC 127)]] | BFS on implicit state-space graph — models state transitions in routing algorithms where nodes are implicitly defined |
New Problems (Uber-Specific)
1. LC 1584 — Min Cost to Connect All Points (Medium)
Link: LeetCode 1584
Quick Summary: Given points in a 2D plane, find the minimum cost to connect all points where cost of connecting two points is their Manhattan distance.
Industry Context: Driver zone coverage — minimum total road length to connect all driver staging areas into a single reachable network. Geofence construction — minimum perimeter to link all GPS anchor points. MST is a gap in the current repo; this is the entry point for Prim’s and Kruskal’s.
Key Insight: This is a Minimum Spanning Tree (MST) problem on a complete graph with Manhattan distance weights. Two canonical approaches: Prim’s (greedy expansion from a seed node using a min-heap) and Kruskal’s (sort all edges, add non-cycle edges using Union-Find). On a dense graph (n² edges), Prim’s with a heap is O(n² log n) but can be optimized to O(n²) by tracking minDist per node without a heap. Kruskal’s requires sorting O(n²) edges.
Approach 1: Kruskal’s — Sort all edges + Union-Find
// Time: O(n² log n) Space: O(n²) — O(n²) edges to sort
class MinCostConnectPointsKruskal {
// Union-Find with path compression + union by rank
private int[] parent, rank;
private int find(int x) {
if (parent[x] != x) parent[x] = find(parent[x]); // path compression
return parent[x];
}
private boolean union(int x, int y) {
int px = find(x), py = find(y);
if (px == py) return false; // already connected — would form cycle
if (rank[px] < rank[py]) { int t = px; px = py; py = t; }
parent[py] = px;
if (rank[px] == rank[py]) rank[px]++;
return true;
}
public int minCostConnectPoints(int[][] points) {
int n = points.length;
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
// Build all O(n²) edges
List<int[]> edges = new ArrayList<>(); // [cost, i, j]
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++) {
int dist = Math.abs(points[i][0] - points[j][0])
+ Math.abs(points[i][1] - points[j][1]);
edges.add(new int[]{dist, i, j});
}
edges.sort((a, b) -> a[0] - b[0]); // sort by cost ascending
int totalCost = 0, edgesUsed = 0;
for (int[] edge : edges) {
if (union(edge[1], edge[2])) {
totalCost += edge[0];
if (++edgesUsed == n - 1) break; // MST complete
}
}
return totalCost;
}
}- Time: O(n² log n) — dominated by sorting n² edges
- Space: O(n²) — storing all edges
- Tradeoff: Intuitive — Kruskal’s is easy to explain. But O(n²) space is costly for large n. Use Prim’s when the graph is dense.
Approach 2: Optimal — Prim’s with minDist array (no heap needed on dense graph)
// Time: O(n²) Space: O(n) — optimal for dense/complete graphs
class MinCostConnectPoints {
public int minCostConnectPoints(int[][] points) {
int n = points.length;
int[] minDist = new int[n]; // min cost to connect node i to current MST
boolean[] inMST = new boolean[n];
Arrays.fill(minDist, Integer.MAX_VALUE);
minDist[0] = 0; // start from node 0
int totalCost = 0;
for (int iter = 0; iter < n; iter++) {
// Greedily pick the non-MST node with minimum connection cost
int u = -1;
for (int i = 0; i < n; i++)
if (!inMST[i] && (u == -1 || minDist[i] < minDist[u])) u = i;
inMST[u] = true;
totalCost += minDist[u];
// Update minDist for all non-MST nodes based on newly added node u
for (int v = 0; v < n; v++) {
if (!inMST[v]) {
int dist = Math.abs(points[u][0] - points[v][0])
+ Math.abs(points[u][1] - points[v][1]);
minDist[v] = Math.min(minDist[v], dist);
}
}
}
return totalCost;
}
}- Time: O(n²) — n iterations, each scanning n nodes
- Space: O(n) — only
minDistandinMSTarrays - Key Insight: On a complete graph, Prim’s with a heap is O(n² log n) because you’d process O(n²) heap entries. By instead maintaining a
minDist[v]array and doing a linear scan each iteration, you get O(n²) time with O(n) space — optimal for dense graphs. This is the “dense Prim’s” variant; use heap-based Prim’s only on sparse graphs. - Senior talking point: In Uber’s geospatial context, MST gives the “minimum total road coverage” but not necessarily the shortest path between two specific points. Distinguish MST (global minimum connectivity) from SSSP (local point-to-point). H3 hexagonal grid indexing clusters nearby drivers into cells — MST on cell centroids approximates the minimum coverage backbone.
2. LC 778 — Swim in Rising Water (Hard)
Link: LeetCode 778
Quick Summary: Given an n×n grid where grid[i][j] is the elevation, find the minimum time T such that there’s a path from (0,0) to (n-1,n-1) where every cell on the path has elevation ≤ T.
Industry Context: Dynamic road passability — at what time does a path open up as flood waters (or surge zones) expand? Models Uber’s “route becomes available as surge recedes” or road quality metrics where lower quality roads only become acceptable at a certain threshold.
Key Insight: This is a shortest-path problem where the “cost” of a path is the maximum elevation encountered, not the sum. Use Dijkstra where the edge weight between adjacent cells is max(grid[r][c], grid[nr][nc]) and we want to minimize the max weight along the path. Alternatively: binary search on T + BFS to check feasibility.
Approach 1: Binary Search + BFS
// Time: O(n² log n) Space: O(n²)
class SwimInRisingWaterBFS {
private static final int[][] DIRS = {{0,1},{0,-1},{1,0},{-1,0}};
public int swimInWater(int[][] grid) {
int n = grid.length;
int lo = grid[0][0], hi = n * n - 1;
while (lo < hi) {
int mid = (lo + hi) / 2;
if (canReach(grid, n, mid)) hi = mid;
else lo = mid + 1;
}
return lo;
}
private boolean canReach(int[][] grid, int n, int t) {
if (grid[0][0] > t) return false;
boolean[][] visited = new boolean[n][n];
Queue<int[]> queue = new LinkedList<>();
queue.offer(new int[]{0, 0});
visited[0][0] = true;
while (!queue.isEmpty()) {
int[] cur = queue.poll();
if (cur[0] == n - 1 && cur[1] == n - 1) return true;
for (int[] d : DIRS) {
int nr = cur[0] + d[0], nc = cur[1] + d[1];
if (nr >= 0 && nr < n && nc >= 0 && nc < n
&& !visited[nr][nc] && grid[nr][nc] <= t) {
visited[nr][nc] = true;
queue.offer(new int[]{nr, nc});
}
}
}
return false;
}
}- Time: O(n² log n) — log n binary search iterations, each O(n²) BFS
- Space: O(n²)
- Tradeoff: Conceptually clean — binary search on the answer is a first-principles approach. But Dijkstra is tighter and avoids the outer binary search loop.
Approach 2: Optimal — Dijkstra with min-max path weight
// Time: O(n² log n) Space: O(n²)
// Same asymptotic as BS+BFS but with a smaller constant — only one pass
import java.util.*;
class SwimInRisingWater {
private static final int[][] DIRS = {{0,1},{0,-1},{1,0},{-1,0}};
public int swimInWater(int[][] grid) {
int n = grid.length;
// [maxElevationSoFar, row, col]
PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
boolean[][] visited = new boolean[n][n];
pq.offer(new int[]{grid[0][0], 0, 0});
while (!pq.isEmpty()) {
int[] cur = pq.poll();
int t = cur[0], r = cur[1], c = cur[2];
if (visited[r][c]) continue;
visited[r][c] = true;
if (r == n - 1 && c == n - 1) return t; // reached destination
for (int[] d : DIRS) {
int nr = r + d[0], nc = c + d[1];
if (nr >= 0 && nr < n && nc >= 0 && nc < n && !visited[nr][nc]) {
// cost = max elevation seen on path so far
int nextT = Math.max(t, grid[nr][nc]);
pq.offer(new int[]{nextT, nr, nc});
}
}
}
return -1; // unreachable (won't happen for valid input)
}
}- Time: O(n² log n) — each of n² cells processed once; heap operations O(log n²) = O(log n)
- Space: O(n²) — visited array + heap
- Key Insight: Dijkstra works here because the “distance” metric is the bottleneck (max) along the path, which is still monotonically non-decreasing as we expand outward. We’re minimizing the maximum edge weight encountered — a “minimax path” variant. This pattern appears whenever the question is “what’s the minimum worst-case condition on a path?”
- Senior talking point: Compare to LC 1631 (Path With Minimum Effort) which also uses this Dijkstra min-max pattern. In Uber’s context: “minimum time before a route becomes passable” is the same computation as “minimum worst-road-quality route” — the abstraction is identical.
3. LC 218 — The Skyline Problem (Hard)
Link: LeetCode 218
Quick Summary: Given a list of buildings [left, right, height], return the key points of the skyline outline — the points where the visible profile changes height.
Industry Context: Geofence contour computation — given a set of overlapping surge zones (rectangles with surge intensity as height), compute the outer boundary contour for rendering on the Uber driver app. Also: rendering building outlines in mapping tiles (Uber Maps / Google Maps).
Key Insight: Sweep line from left to right, processing “enter” and “exit” events. Maintain a max-heap (or TreeMap<height, count>) of active building heights. A skyline point is emitted whenever the current maximum height changes.
Approach 1: Brute Force — column-by-column sweep
// Time: O(max_x * n) Space: O(max_x)
// Only feasible for small coordinate ranges — included to motivate the event-based approach
class SkylineBrute {
public List<List<Integer>> getSkyline(int[][] buildings) {
if (buildings.length == 0) return new ArrayList<>();
int maxX = 0;
for (int[] b : buildings) maxX = Math.max(maxX, b[1]);
int[] heights = new int[maxX + 1];
for (int[] b : buildings)
for (int x = b[0]; x < b[1]; x++)
heights[x] = Math.max(heights[x], b[2]);
List<List<Integer>> result = new ArrayList<>();
int prevH = 0;
for (int x = 0; x <= maxX; x++) {
if (heights[x] != prevH) {
result.add(Arrays.asList(x, heights[x]));
prevH = heights[x];
}
}
return result;
}
}- Time: O(max_x · n) — impractical for large coordinates
- Space: O(max_x)
- Tradeoff: Shows the core intuition (profile changes where max height changes) but is unusable for real coordinate ranges.
Approach 2: Optimal — Sweep Line + TreeMap (height → count)
// Time: O(n log n) Space: O(n)
import java.util.*;
class Skyline {
public List<List<Integer>> getSkyline(int[][] buildings) {
// Encode events: (x, -height) for building start (negative = start before end)
// (x, +height) for building end
List<int[]> events = new ArrayList<>();
for (int[] b : buildings) {
events.add(new int[]{b[0], -b[2]}); // start: negative height
events.add(new int[]{b[1], b[2]}); // end: positive height
}
// Sort by x; at same x: starts before ends (negative before positive),
// taller starts first (more negative first), lower ends first
events.sort((a, b) -> a[0] != b[0] ? a[0] - b[0] : a[1] - b[1]);
// TreeMap<height, count> — current active building heights
TreeMap<Integer, Integer> active = new TreeMap<>();
active.put(0, 1); // ground level always present (sentinel)
List<List<Integer>> result = new ArrayList<>();
int prevMaxH = 0;
for (int[] event : events) {
int x = event[0], h = event[1];
if (h < 0) {
// Building starts: add height to active set
active.merge(-h, 1, Integer::sum);
} else {
// Building ends: remove height from active set
active.merge(h, -1, Integer::sum);
if (active.get(h) == 0) active.remove(h);
}
int curMaxH = active.lastKey(); // O(log n) — TreeMap max
if (curMaxH != prevMaxH) {
result.add(Arrays.asList(x, curMaxH));
prevMaxH = curMaxH;
}
}
return result;
}
}- Time: O(n log n) — sort O(n log n) + n events each with O(log n) TreeMap op
- Space: O(n) — events list + TreeMap holding at most n distinct heights
- Key Insight: The
TreeMap<height, count>(instead of a simple max-heap) handles building exit events in O(log n) by directly removing the height by key. A plainPriorityQueuecan’t efficiently remove an arbitrary element — it can only remove the top. The count-based multiset (height → count) gracefully handles multiple buildings at the same height. - Senior talking point: The sort tiebreaker (
events.sort(...)using negative for starts) is the entire correctness crux — at the same x coordinate, starts must be processed before ends to avoid falsely emitting a zero-height point between two adjacent buildings of the same height. Explain this tiebreaker explicitly in the interview.
4. LC 759 — Employee Free Time (Hard)
Link: LeetCode 759
Quick Summary: Given a list of employees, each with a sorted list of busy intervals, return the sorted list of finite intervals during which all employees are free.
Industry Context: Driver availability computation — given K drivers’ shift schedules, find the time windows when no driver is available (supply gap detection). Direct extension of Meeting Rooms II (LC 253) (in repo) to a multi-source, multi-list case. Also: finding common free windows for cross-team meeting scheduling at scale.
Key Insight: Two approaches — (1) Flatten all intervals into one list, sort, then find gaps between merged intervals. (2) k-way merge using a min-heap (like Merge K Sorted Lists) to process intervals in order without flattening, more memory-efficient when K is large.
Approach 1: Flatten + Sort + Merge
// Time: O(N log N) Space: O(N) — N = total intervals across all employees
import java.util.*;
class EmployeeFreeTimeFlatten {
public List<int[]> employeeFreeTime(List<List<int[]>> schedule) {
// Flatten all intervals
List<int[]> all = new ArrayList<>();
for (List<int[]> emp : schedule) all.addAll(emp);
all.sort((a, b) -> a[0] - b[0]); // sort by start time
List<int[]> result = new ArrayList<>();
int[] merged = all.get(0);
for (int[] interval : all) {
if (interval[0] <= merged[1]) {
// Overlapping or adjacent — extend merged end
merged[1] = Math.max(merged[1], interval[1]);
} else {
// Gap found — [merged.end, interval.start] is free time
result.add(new int[]{merged[1], interval[0]});
merged = interval;
}
}
return result;
}
}- Time: O(N log N) — N = total intervals; dominated by sort
- Space: O(N) — flattened list
Approach 2: Optimal — k-way Merge with Min-Heap (streaming, O(K log K) memory)
// Time: O(N log K) Space: O(K) — K = number of employees
// Better when K << N (many intervals per employee)
import java.util.*;
class EmployeeFreeTime {
// Entry: [intervalStart, intervalEnd, empIdx, intervalIdx]
public List<int[]> employeeFreeTime(List<List<int[]>> schedule) {
// Min-heap ordered by interval start time
PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
List<int[]> aList = schedule.get(a[0]);
List<int[]> bList = schedule.get(b[0]);
return aList.get(a[1])[0] - bList.get(b[1])[0];
});
// Seed heap with each employee's first interval
for (int i = 0; i < schedule.size(); i++)
if (!schedule.get(i).isEmpty())
pq.offer(new int[]{i, 0});
List<int[]> result = new ArrayList<>();
int prevEnd = -1;
while (!pq.isEmpty()) {
int[] top = pq.poll();
int[] interval = schedule.get(top[0]).get(top[1]);
if (prevEnd != -1 && interval[0] > prevEnd) {
// Gap between prevEnd and current interval start
result.add(new int[]{prevEnd, interval[0]});
}
prevEnd = Math.max(prevEnd, interval[1]);
// Advance this employee's pointer
int nextIdx = top[1] + 1;
if (nextIdx < schedule.get(top[0]).size())
pq.offer(new int[]{top[0], nextIdx});
}
return result;
}
}- Time: O(N log K) — N intervals total, each processed once with O(log K) heap op
- Space: O(K) — heap holds one interval per employee
- Key Insight: The k-way merge avoids loading all intervals into memory at once. When K drivers each have thousands of shifts, the heap-based approach uses O(K) working memory instead of O(N). This is the streaming variant — same pattern as Merge K Sorted Lists (LC 23, in repo) applied to interval merging.
- Senior talking point: LC 253 (Meeting Rooms II) finds the peak overlap count using a single sorted list. This problem finds gaps in the union of K sorted lists using a k-way merge. The connection: LC 253 → finding when resource demand exceeds supply; LC 759 → finding when supply is zero (all drivers busy). Both are interval-sweep problems with different aggregation semantics.
5. LC 1631 — Path With Minimum Effort (Medium)
Link: LeetCode 1631
Quick Summary: Given a 2D grid of heights, find a path from (0,0) to (m-1,n-1) that minimizes the maximum absolute difference in heights between consecutive cells.
Industry Context: Route quality where the “worst road segment” determines driver comfort or vehicle suitability. In Uber’s routing engine, a path through streets with large elevation changes causes slower speeds and higher fuel costs — the bottleneck segment dominates. Also: network path selection where the slowest link determines effective throughput (weakest-link optimization).
Key Insight: Dijkstra where the “cost” to reach a cell is max(cost to reach previous cell, abs(height difference)). We minimize the maximum height difference — a bottleneck path. Alternatively: binary search on the effort threshold + BFS/DFS to check feasibility (same pattern as LC 778).
Approach 1: Binary Search + BFS
// Time: O(m * n * log(maxDiff)) Space: O(m * n)
class PathMinEffortBFS {
private static final int[][] DIRS = {{0,1},{0,-1},{1,0},{-1,0}};
public int minimumEffortPath(int[][] heights) {
int m = heights.length, n = heights[0].length;
int lo = 0, hi = 1_000_000; // max possible diff = 10^6
while (lo < hi) {
int mid = (lo + hi) / 2;
if (canReach(heights, m, n, mid)) hi = mid;
else lo = mid + 1;
}
return lo;
}
private boolean canReach(int[][] heights, int m, int n, int effort) {
boolean[][] visited = new boolean[m][n];
Queue<int[]> queue = new LinkedList<>();
queue.offer(new int[]{0, 0});
visited[0][0] = true;
while (!queue.isEmpty()) {
int[] cur = queue.poll();
int r = cur[0], c = cur[1];
if (r == m - 1 && c == n - 1) return true;
for (int[] d : DIRS) {
int nr = r + d[0], nc = c + d[1];
if (nr >= 0 && nr < m && nc >= 0 && nc < n && !visited[nr][nc]) {
int diff = Math.abs(heights[nr][nc] - heights[r][c]);
if (diff <= effort) {
visited[nr][nc] = true;
queue.offer(new int[]{nr, nc});
}
}
}
}
return false;
}
}- Time: O(m · n · log(maxDiff)) — log iterations of O(m·n) BFS
- Space: O(m · n)
Approach 2: Optimal — Dijkstra min-max (single pass)
// Time: O(m * n * log(m * n)) Space: O(m * n)
import java.util.*;
class PathMinEffort {
private static final int[][] DIRS = {{0,1},{0,-1},{1,0},{-1,0}};
public int minimumEffortPath(int[][] heights) {
int m = heights.length, n = heights[0].length;
int[][] effort = new int[m][n]; // min bottleneck effort to reach (r,c)
for (int[] row : effort) Arrays.fill(row, Integer.MAX_VALUE);
effort[0][0] = 0;
// [effortSoFar, row, col]
PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[0] - b[0]);
pq.offer(new int[]{0, 0, 0});
while (!pq.isEmpty()) {
int[] cur = pq.poll();
int e = cur[0], r = cur[1], c = cur[2];
if (r == m - 1 && c == n - 1) return e;
if (e > effort[r][c]) continue; // stale entry
for (int[] d : DIRS) {
int nr = r + d[0], nc = c + d[1];
if (nr >= 0 && nr < m && nc >= 0 && nc < n) {
int diff = Math.abs(heights[nr][nc] - heights[r][c]);
int newEffort = Math.max(e, diff); // bottleneck = max diff
if (newEffort < effort[nr][nc]) {
effort[nr][nc] = newEffort;
pq.offer(new int[]{newEffort, nr, nc});
}
}
}
}
return 0;
}
}- Time: O(m · n · log(m · n)) — each cell processed once; heap op O(log(m·n))
- Space: O(m · n)
- Key Insight: The Dijkstra relaxation condition changes from
dist[u] + w(u,v) < dist[v](standard sum) tomax(effort[u], |diff|) < effort[v](bottleneck). The monotonicity property still holds — the bottleneck can only stay the same or increase as we move further from the source — so Dijkstra’s greedy expansion is still correct. - Senior talking point: This is the “minimax path” variant. LC 778 (Swim in Rising Water, this sheet) uses the same Dijkstra pattern but with a different edge encoding (
max(cur_time, cell_elevation)instead ofmax(cur_effort, |diff|)). Recognizing this structural equivalence and explaining it under pressure is a strong senior signal. The unifying insight: “Dijkstra generalizes to any monotone path-cost function, not just additive sums.”
System Design Connections
| Pattern | System Design Problem | Uber Mapping |
|---|---|---|
| MST (Prim’s / Kruskal’s) | Zone Coverage Planning | Minimum road network to connect all driver staging areas; H3 cell cluster backbone |
| Dijkstra (SSSP) | ETA Computation | Driver-to-pickup shortest time; maps OSRM routing engine internals |
| Dijkstra min-max (Bottleneck) | Route Quality Selection | Minimize worst road segment on a given route; vehicle type routing (UberXL avoids low bridges) |
| Sweep Line + Max-Heap | Surge Zone Rendering | Surge boundary contour for app overlay; geofence event processing |
| K-way Merge Heap | Driver Availability Aggregation | Merge K drivers’ shift schedules to find supply gaps; streaming availability service |
| Two Heaps (Median) | Real-Time Pricing | Streaming p50 / p95 fare percentile for surge multiplier calibration |
| BFS on Grid | Service Area Propagation | Identify all H3 cells within T minutes of a driver’s position |
2-Week Sprint Plan
| Day | Focus | Problems |
|---|---|---|
| 1 | Graph SSSP warm-up | LC 743 (from repo) — ETA baseline |
| 2 | Dijkstra variants | LC 787 (from repo) — constrained routing |
| 3 | MST — both algorithms | LC 1584 (this sheet) — zone coverage |
| 4 | Dijkstra on grid | LC 778 (this sheet) — passability |
| 5 | Dijkstra min-max | LC 1631 (this sheet) — bottleneck routing |
| 6 | Intervals — multi-list | LC 253 (from repo), LC 759 (this sheet) |
| 7 | Sweep line | LC 218 (this sheet) — surge contours |
| 8 | Real-time median | LC 295 (from repo), review LC 480 (Amazon sheet) |
| 9 | Graph — BFS variants | LC 417 (from repo) — surge propagation |
| 10 | HashMap Design | LC 380 (from repo) — driver pool |
| 11 | Mock coding round 1 | Dijkstra problem timed (40 min) + debrief |
| 12 | System design sketch | Design Uber ETA service (SSSP + caching) |
| 13 | Mock coding round 2 | 1 grid + 1 interval problem (60 min) |
| 14 | System design mock | Design Uber surge pricing service |
Senior-Level Talking Points
On Routing Algorithms
- Always clarify edge weight semantics before choosing an algorithm: “Are we minimizing time, distance, or fare? BFS is only correct for unweighted graphs — as soon as edges have weights, we need Dijkstra.”
- Dijkstra assumes non-negative edge weights. For time-based routing, this is always true. For “profit” or “incentive” graphs, negative edges can appear — mention Bellman-Ford as the fallback.
- For real-time ETA at Uber’s scale: precompute contraction hierarchies (CH) on the road graph offline, then bidirectional Dijkstra at query time. This reduces query time from O(E log V) to near O(1) for common city routes.
On Geospatial Indexing
- H3 hexagonal grid: Uber’s open-source geospatial indexing. Each hexagon has a unique ID; nearby hexagons share a prefix. Driver location → H3 cell ID in O(1); range query = union of neighboring cells. Explain this when asked “how do you efficiently find nearby drivers?”
- S2 library (Google Maps): spherical geometry, recursive quadtree subdivision. Similar prefix-based locality.
- When discussing MST for zone coverage: “In production we’d run MST on H3 cell centroids at resolution 7–9, not on raw GPS coordinates — this bounds n to the number of active cells, not the number of GPS points.”
On Real-Time Systems
- Two-heap median (LC 295 / LC 480): “In production, we’d use a Count-Min Sketch or T-Digest approximation for p99 at Uber’s throughput. The exact two-heap approach is correct for interview purposes and for moderate stream sizes (< 10M events/second per partition).”
- Sweep line for surge zones: “The sweep line processes events one at a time — this maps naturally to a Kafka consumer processing location events in order. The
TreeMap<height, count>becomes a sorted set maintained in Redis or a rolling Flink window.” - K-way merge for driver availability: “This is exactly how distributed systems merge sorted log streams from multiple Kafka partitions — each partition is one sorted list, and the heap gives the globally ordered merge.”
On Connecting to Uber’s Business
- Frame every algorithmic choice in terms of marketplace impact: “Dijkstra gives us the accurate ETA, which directly drives conversion rate — inaccurate ETAs cause rider cancellations.”
- Surge pricing connection: “The two-heap median tracks the p50 fare; surge multiplier = f(demand/supply ratio). The sliding window is the ‘last 5 minutes’ of completed rides — LC 480 is the exact underlying computation.”
- Driver matching: “LC 380 (Insert Delete GetRandom) is the driver pool. The matching engine samples a random available driver within a geohash cell — random dispatch with O(1) operations is the correct abstraction.”