Practice Session 14: Advanced Graph Algorithms — Dijkstra, Union-Find, Topological Sort
Overview
Session 8 covered the fundamentals: BFS/DFS reachability, topological sort for cycle detection, and grid-as-graph encoding. Session 14 is the advanced layer that separates senior engineers from strong mid-level candidates: weighted shortest paths (Dijkstra and Bellman-Ford), Union-Find with full optimisations, constrained shortest paths, and deriving character-order from implicit constraints. The recurring theme is algorithm selection under constraints — negative weights vs non-negative, static graph vs streaming edges, single-source vs all-pairs, unconstrained path vs bounded-hop path. Getting the algorithm selection right and explaining the correctness argument is what interviewers at FAANG+ are testing at this level.
Algorithm Complexity Comparison — Shortest Path Algorithms
| Algorithm | Weights | Time Complexity | Space | When to Use |
|---|---|---|---|---|
| BFS (unweighted) | All equal / unit | O(V + E) | O(V) | Minimum hops; grids; word ladder; unweighted connectivity |
| Dijkstra (binary heap) | Non-negative | O((V + E) log V) | O(V + E) | Weighted graphs without negative edges; routing, maps, network cost |
| Bellman-Ford | Any (incl. negative) | O(V × E) | O(V) | Negative edge weights; detecting negative cycles; constrained hops (K-stop variant) |
| Floyd-Warshall | Any | O(V³) | O(V²) | All-pairs shortest path; dense graphs; V ≤ 500 in practice |
Decision rule: No negative weights → Dijkstra. Negative weights or need all-pairs → Bellman-Ford / Floyd-Warshall. Hop constraint K on top of weights → Bellman-Ford with exactly K+1 rounds, or Dijkstra with state (node, hops).
Union-Find Deep Dive
Union-Find (Disjoint Set Union, DSU) is a data structure for tracking which elements belong to the same set, supporting two operations: find(x) (which set is x in?) and union(x, y) (merge the sets containing x and y).
Naive Implementation — O(n) worst case
Without any optimisations, a tree can degenerate into a linked list. Finding the root of a chain of n nodes requires traversing all n parent pointers: O(n) per find, O(n) per union.
int find(int[] parent, int x) {
while (parent[x] != x) x = parent[x]; // O(n) worst case
return x;
}
void union(int[] parent, int x, int y) {
parent[find(parent, x)] = find(parent, y); // always attaches one root under the other
}Optimisation 1: Path Compression — find becomes nearly O(1)
While traversing from x to its root, make every node on that path point directly to the root. Future finds on any node in that path cost O(1). This is a lazy flattening: we don’t restructure the tree eagerly, we do it on the fly as we access nodes.
int find(int[] parent, int x) {
if (parent[x] != x)
parent[x] = find(parent, parent[x]); // every node on path now points to root
return parent[x];
}After calling find(5) in a chain 5→4→3→2→1, nodes 5, 4, 3, 2 all point directly to 1. The next find(5) costs O(1).
Optimisation 2: Union by Rank — tree height stays O(log n)
Always attach the root with the smaller “rank” (upper bound on tree height) under the root with the larger rank. If ranks are equal, attach either and increment the winner’s rank. This prevents the chain-degeneration scenario by ensuring trees stay balanced.
void union(int[] parent, int[] rank, int x, int y) {
int rx = find(parent, x), ry = find(parent, y);
if (rx == ry) return;
if (rank[rx] < rank[ry]) { parent[rx] = ry; }
else if (rank[rx] > rank[ry]) { parent[ry] = rx; }
else { parent[ry] = rx; rank[rx]++; }
}Combined: O(α(n)) ≈ O(1) amortized
With both optimisations, the amortized cost per operation is O(α(n)) where α is the inverse Ackermann function. The Ackermann function A(m, n) grows astronomically fast — faster than any tower of exponentials. Its inverse α(n) answers: “for what input size does A produce n?” Because A grows so fast, α(n) grows almost imperceptibly slowly. For any n that could arise in a physical computation — including a number larger than the number of atoms in the observable universe — α(n) ≤ 4. In practice, α(n) is treated as a constant. The combined Union-Find is thus effectively O(1) per operation in every real-world scenario.
| Configuration | find complexity | union complexity |
|---|---|---|
| Naive | O(n) worst case | O(n) worst case |
| Path compression only | O(log n) amortized | O(log n) amortized |
| Union by rank only | O(log n) | O(1) after find |
| Both combined | O(α(n)) ≈ O(1) | O(α(n)) ≈ O(1) |
Problem 1: Network Delay Time
Link: LeetCode 743
Quick Summary: Given n nodes, directed weighted edges (u, v, w), and a source k, find the minimum time for ALL nodes to receive a signal from k (i.e., the maximum of all single-source shortest path distances). Return -1 if any node is unreachable.
Industry Context: This is Dijkstra = Google Maps routing. Google Maps computes the shortest driving time from your current location to your destination over a graph of road segments with travel-time weights. At larger scale, OSPF (Open Shortest Path First) is the link-state routing protocol used inside enterprise and ISP networks — every router runs Dijkstra on the network topology it has learned from LSA (Link State Advertisement) broadcasts, computing shortest paths to all other routers. The “maximum of all shortest paths” from a single source is also used in distributed systems to compute broadcast latency — how long before all nodes in a cluster receive a message from a coordinator.
Approach 1: Bellman-Ford — O(V × E)
// O(V × E) time O(V) space
public static int networkDelayBellmanFord(int[][] times, int n, int k) {
int[] dist = new int[n + 1];
Arrays.fill(dist, Integer.MAX_VALUE);
dist[k] = 0;
// Relax ALL edges (n-1) times.
// After pass i, every node reachable via at most i hops has correct dist.
for (int pass = 0; pass < n - 1; pass++) {
boolean updated = false;
for (int[] edge : times) {
int u = edge[0], v = edge[1], w = edge[2];
if (dist[u] != Integer.MAX_VALUE && dist[u] + w < dist[v]) {
dist[v] = dist[u] + w;
updated = true;
}
}
if (!updated) break; // early exit: no improvement in this pass
}
int maxDist = 0;
for (int i = 1; i <= n; i++) {
if (dist[i] == Integer.MAX_VALUE) return -1;
maxDist = Math.max(maxDist, dist[i]);
}
return maxDist;
}- Time: O(V × E) — V-1 rounds, each processing all E edges
- Space: O(V) for the distance array
- Tradeoff: Correct but slower than Dijkstra. Use when edges might have negative weights, or when the problem specifically limits the number of edge relaxation rounds (see LC 787).
Approach 2: Dijkstra with Min-Heap — O((V+E) log V)
// O((V+E) log V) time O(V+E) space
public static int networkDelayDijkstra(int[][] times, int n, int k) {
List<List<int[]>> adj = new ArrayList<>();
for (int i = 0; i <= n; i++) adj.add(new ArrayList<>());
for (int[] edge : times) adj.get(edge[0]).add(new int[]{edge[1], edge[2]});
int[] dist = new int[n + 1];
Arrays.fill(dist, Integer.MAX_VALUE);
dist[k] = 0;
PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
heap.offer(new int[]{0, k}); // [distance, node]
while (!heap.isEmpty()) {
int[] curr = heap.poll();
int d = curr[0], u = curr[1];
if (d > dist[u]) continue; // stale heap entry — skip
for (int[] neighbor : adj.get(u)) {
int v = neighbor[0], w = neighbor[1];
if (dist[u] + w < dist[v]) {
dist[v] = dist[u] + w;
heap.offer(new int[]{dist[v], v});
}
}
}
int maxDist = 0;
for (int i = 1; i <= n; i++) {
if (dist[i] == Integer.MAX_VALUE) return -1;
maxDist = Math.max(maxDist, dist[i]);
}
return maxDist;
}- Time: O((V + E) log V)
- Space: O(V + E) for adjacency list and heap
- Key Insight: Dijkstra is correct because with non-negative edge weights, once a node is popped from the min-heap, its distance is final — no future path through a not-yet-processed node can be cheaper. This greedy correctness argument relies entirely on non-negativity: a negative edge could allow a path through an unexplored node to beat the currently-known shortest path. The “stale entry” check (
if (d > dist[u]) continue) handles the fact that Java’sPriorityQueuedoes not support decrease-key, so old entries accumulate; we skip them when dequeued.
Problem 2: Number of Connected Components in Undirected Graph
Link: LeetCode 323
Quick Summary: Given n nodes and a list of undirected edges, return the number of connected components.
Industry Context: Union-Find = Kruskal’s MST in network design. When an infrastructure team designs the minimum-cost physical network topology (cables, switches), Kruskal’s algorithm greedily adds the cheapest edge that doesn’t create a cycle — using Union-Find to check “same component?” in O(α(n)) time. Connected component counting also appears in social graph analysis (counting clusters of friends, detecting isolated accounts for spam analysis), database query optimisers (detecting independent subqueries that can run in parallel), and VLSI chip design (counting disconnected circuit regions). In distributed systems, detecting network partitions — whether a cluster has split into two sub-groups that cannot communicate — is exactly a connected-components query on the live cluster topology.
Approach 1: DFS — O(V + E)
// O(V + E) time O(V + E) space
public static int countComponentsDFS(int n, int[][] edges) {
List<List<Integer>> adj = new ArrayList<>();
for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
for (int[] e : edges) {
adj.get(e[0]).add(e[1]);
adj.get(e[1]).add(e[0]);
}
boolean[] visited = new boolean[n];
int components = 0;
for (int i = 0; i < n; i++) {
if (!visited[i]) {
components++;
dfsComponent(adj, visited, i);
}
}
return components;
}- Time: O(V + E)
- Space: O(V + E) adjacency list + O(V) recursion depth
- Tradeoff: Correct and straightforward. DFS requires the full graph to be known upfront; it cannot handle edges arriving as a stream without rebuilding or augmenting the adjacency list.
Approach 2: Union-Find with Path Compression + Union by Rank — O(α(V)) per edge
// O(V + E × α(V)) ≈ O(V + E) time O(V) space
public static int countComponentsUnionFind(int n, int[][] edges) {
int[] parent = new int[n];
int[] rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i; // every node is its own root
int components = n;
for (int[] e : edges) {
int rootA = find(parent, e[0]);
int rootB = find(parent, e[1]);
if (rootA != rootB) {
union(parent, rank, rootA, rootB);
components--;
}
}
return components;
}
// Path compression: every node on path-to-root now points directly to root
private static int find(int[] parent, int x) {
if (parent[x] != x) parent[x] = find(parent, parent[x]);
return parent[x];
}
// Union by rank: attach smaller tree under larger tree
private static void union(int[] parent, int[] rank, int a, int b) {
if (rank[a] < rank[b]) parent[a] = b;
else if (rank[a] > rank[b]) parent[b] = a;
else { parent[b] = a; rank[a]++; }
}- Time: O(V + E × α(V)) ≈ O(V + E)
- Space: O(V) — only parent and rank arrays; no adjacency list needed
- Key Insight: Union-Find is the correct choice over DFS when edges arrive as a stream (online algorithm) — you do not need to see all edges before answering “are these two nodes connected?” Each edge is processed in O(α(n)) ≈ O(1). The space advantage is significant: O(V) vs O(V + E) for the adjacency list. When n = 10⁹ nodes and E = 10⁹ edges (web-scale graph), this difference matters enormously.
When Union-Find is better than DFS:
- Dynamic connectivity: edges are added incrementally and you need online “same component?” queries
- Cycle detection while building a graph (Kruskal’s MST, LC 684)
- Space-constrained environments — O(V) beats O(V + E)
- Parallelism: Union-Find operations can be made concurrent with careful lock design
Problem 3: Redundant Connection
Link: LeetCode 684
Quick Summary: A tree of n nodes has one extra edge added, creating exactly one cycle. Return that extra edge.
Industry Context: Detecting cycles in distributed transactions. In a database cluster running distributed transactions, a cycle in the wait-for graph (process A waits for B, B waits for C, C waits for A) means deadlock. The system must detect and break the cycle by rolling back one transaction — and that “redundant edge” is the one whose removal resolves the deadlock. The same pattern appears in version control systems: git detects “redundant” merge commits that would create circular dependencies in the commit DAG, and in microservice dependency auditing where a DevOps tool scans service-to-service call graphs to detect unwanted circular dependencies.
Approach 1: DFS Cycle Detection — O(V²)
// O(V²) time (DFS per edge × V edges) O(V + E) space
public static int[] findRedundantConnectionDFS(int[][] edges) {
int n = edges.length;
List<List<Integer>> adj = new ArrayList<>();
for (int i = 0; i <= n; i++) adj.add(new ArrayList<>());
for (int[] edge : edges) {
int u = edge[0], v = edge[1];
boolean[] visited = new boolean[n + 1];
// If u can reach v before adding this edge, adding it creates a cycle
if (dfsReachable(adj, visited, u, v)) return edge;
adj.get(u).add(v);
adj.get(v).add(u);
}
return new int[]{};
}- Time: O(V²) — one DFS per edge, each DFS O(V + E)
- Space: O(V + E) adjacency list + O(V) recursion
- Tradeoff: Correct but quadratic. Fine for n ≤ 1000; too slow for n = 10⁵.
Approach 2: Union-Find — O(V × α(V)) ≈ O(V)
// O(V × α(V)) ≈ O(V) time O(V) space
public static int[] findRedundantConnectionUnionFind(int[][] edges) {
int n = edges.length;
int[] parent = new int[n + 1];
int[] rank = new int[n + 1];
for (int i = 0; i <= n; i++) parent[i] = i;
for (int[] edge : edges) {
int rootU = find(parent, edge[0]);
int rootV = find(parent, edge[1]);
if (rootU == rootV) return edge; // both endpoints share a root → adding creates a cycle
union(parent, rank, rootU, rootV);
}
return new int[]{};
}- Time: O(V × α(V)) ≈ O(V)
- Space: O(V)
- Key Insight: An edge (u, v) creates a cycle if and only if u and v are already in the same connected component. Union-Find answers this in O(α(n)) time. This is the exact mechanism used in Kruskal’s MST: sort edges by weight, process in order, skip any edge whose endpoints are already connected (it would form a cycle), union the rest. The redundant connection problem is essentially Kruskal’s MST in reverse — find the one edge that should be skipped.
Problem 4: Cheapest Flights Within K Stops
Link: LeetCode 787
Quick Summary: Find the cheapest path from src to dst using at most k stops (i.e., at most k+1 edges).
Industry Context: This is the airline booking engine problem. A traveller wants the cheapest flight from New York to Tokyo with at most 2 layovers. The constraint “at most K stops” is what makes it different from plain Dijkstra. The same constraint appears in network TTL (Time to Live): IP packets have a hop-count limit (TTL field, max 255 hops). Finding the minimum-cost path from a source to a destination that stays within TTL is exactly this problem. Telecommunications companies use it for multi-hop circuit path selection — find the cheapest leased-line route between cities with at most K intermediate switches.
Approach 1: Bellman-Ford with K+1 Rounds — O(K × E)
// O(K × E) time O(V) space
public static int findCheapestFlightsBellmanFord(int n, int[][] flights,
int src, int dst, int k) {
int[] dist = new int[n];
Arrays.fill(dist, Integer.MAX_VALUE);
dist[src] = 0;
for (int round = 0; round <= k; round++) {
int[] prev = dist.clone(); // CRITICAL: snapshot prevents chaining relaxations
for (int[] flight : flights) {
int from = flight[0], to = flight[1], price = flight[2];
if (prev[from] != Integer.MAX_VALUE && prev[from] + price < dist[to]) {
dist[to] = prev[from] + price;
}
}
}
return dist[dst] == Integer.MAX_VALUE ? -1 : dist[dst];
}- Time: O(K × E)
- Space: O(V)
- Key Insight: The
dist.clone()snapshot is critical. Without it, a single round could chain relaxations: A→B and B→C in the same round would reach C in 2 hops even though we’re supposedly in round 1. The snapshot ensures each round only uses distances computed from the previous round, correctly enforcing “at most round+1 edges”. This is the standard Bellman-Ford adaptation for bounded-hop shortest paths.
Approach 2: Dijkstra with State (node, stops) — O((V + E) × K × log(VK))
// O((V+E) × K × log(V×K)) time O(V × K) space
public static int findCheapestFlightsDijkstra(int n, int[][] flights,
int src, int dst, int k) {
List<List<int[]>> adj = new ArrayList<>();
for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
for (int[] f : flights) adj.get(f[0]).add(new int[]{f[1], f[2]});
int[][] best = new int[n][k + 2];
for (int[] row : best) Arrays.fill(row, Integer.MAX_VALUE);
best[src][0] = 0;
PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
heap.offer(new int[]{0, src, 0}); // [cost, node, stopsUsed]
while (!heap.isEmpty()) {
int[] curr = heap.poll();
int cost = curr[0], node = curr[1], stops = curr[2];
if (node == dst) return cost; // first time dst is popped = cheapest within K
if (stops > k) continue; // exceeded stop limit
for (int[] neighbor : adj.get(node)) {
int nextNode = neighbor[0], price = neighbor[1];
int newCost = cost + price;
if (newCost < best[nextNode][stops + 1]) {
best[nextNode][stops + 1] = newCost;
heap.offer(new int[]{newCost, nextNode, stops + 1});
}
}
}
return -1;
}- Time: O((V + E) × K × log(V × K))
- Space: O(V × K) for the
bestarray and heap - Key Insight: Adding
stopsto the Dijkstra state transforms a constrained problem into an unconstrained one. Conceptually, each node is replicated K+1 times — one copy for each possible stop count. A regular Dijkstra runs on this expanded (V × K)-node graph. The pruning conditionnewCost < best[nextNode][stops+1]is the key: we cannot simply skip a node because we’ve visited it at a lower cost — we might visit it again with fewer stops remaining and still reachdstcheaper via a longer intermediate route. Thebestarray tracks this.
Problem 5: Alien Dictionary
Link: LeetCode 269 — widely asked even without premium access; the problem statement is available on many mirror sites and in interview prep books.
Quick Summary: Given words sorted in an alien alphabet order, infer the character ordering from adjacent word comparisons, then output a valid topological sort of the inferred dependency graph.
Industry Context: Topological sort = Webpack module bundling, Airflow DAG scheduling. Webpack analyses ES module import statements, builds a directed dependency graph, and emits JavaScript bundles in topological order so every module’s dependencies are available before it executes. Apache Airflow schedules task DAGs: tasks are executed in topological order, respecting upstream_task >> downstream_task constraints. The alien dictionary problem is a microcosm of this: the “alien words” are the implicit constraints (like import statements), and the topological sort produces the execution order. In compiler design, instruction scheduling for CPUs is also a topological sort problem — data-dependency edges between instructions must be respected to produce a correct execution order.
Approach 1: Kahn’s BFS Topological Sort — O(C)
// O(C) time where C = total characters across all words O(U) space for U unique chars
public static String alienDictionaryKahn(String[] words) {
Map<Character, Set<Character>> adj = new LinkedHashMap<>();
Map<Character, Integer> indegree = new LinkedHashMap<>();
for (String word : words)
for (char c : word.toCharArray()) {
adj.putIfAbsent(c, new LinkedHashSet<>());
indegree.putIfAbsent(c, 0);
}
for (int i = 0; i < words.length - 1; i++) {
String w1 = words[i], w2 = words[i + 1];
// Invalid: longer word is a prefix of the next (e.g., "abc" before "ab")
if (w1.length() > w2.length() && w1.startsWith(w2)) return "";
int minLen = Math.min(w1.length(), w2.length());
for (int j = 0; j < minLen; j++) {
char c1 = w1.charAt(j), c2 = w2.charAt(j);
if (c1 != c2) {
if (!adj.get(c1).contains(c2)) {
adj.get(c1).add(c2);
indegree.put(c2, indegree.get(c2) + 1);
}
break; // only first differing char is a valid constraint
}
}
}
Queue<Character> queue = new ArrayDeque<>();
for (char c : indegree.keySet())
if (indegree.get(c) == 0) queue.offer(c);
StringBuilder result = new StringBuilder();
while (!queue.isEmpty()) {
char c = queue.poll();
result.append(c);
for (char neighbor : adj.get(c)) {
indegree.put(neighbor, indegree.get(neighbor) - 1);
if (indegree.get(neighbor) == 0) queue.offer(neighbor);
}
}
return result.length() == indegree.size() ? result.toString() : "";
}- Time: O(C) where C = total characters in all words (each character pair comparison is O(1) amortised)
- Space: O(U) where U = number of unique characters
- Tradeoff: Iterative and stack-overflow safe. Naturally detects cycles (result length < unique char count). Preferred for production implementations.
Approach 2: DFS Reverse Post-Order Topological Sort — O(C)
// O(C) time O(U) space
public static String alienDictionaryDFS(String[] words) {
// ... (build adj same as Kahn's above, omitted for brevity) ...
Map<Character, Integer> color = new HashMap<>(); // 0=unvisited,1=in-progress,2=done
StringBuilder result = new StringBuilder();
for (char c : adj.keySet())
if (color.get(c) == 0 && !dfsAlien(adj, color, result, c)) return "";
return result.reverse().toString(); // reverse post-order = topological order
}
private static boolean dfsAlien(Map<Character, Set<Character>> adj,
Map<Character, Integer> color,
StringBuilder result, char c) {
color.put(c, 1); // grey
for (char neighbor : adj.get(c)) {
if (color.get(neighbor) == 1) return false; // back-edge = cycle
if (color.get(neighbor) == 0 && !dfsAlien(adj, color, result, neighbor)) return false;
}
color.put(c, 2); // black
result.append(c); // post-order: added AFTER all successors are processed
return true;
}- Time: O(C)
- Space: O(U) + O(U) recursion depth
- Key Insight: In DFS topological sort, a node is appended to the result list only after all nodes that depend on it have been fully processed. This post-order position means the node must come before those dependents in the final order — so reversing the post-order list gives the correct topological sequence. The 3-colour scheme is mandatory: colour 1 (grey/in-progress) detects back-edges in the DFS tree, which are the only edges that can create cycles in a directed graph.
Common pitfall — extracting constraints from adjacent words: Only the first differing character between two adjacent words gives a valid ordering constraint. Subsequent character differences reveal nothing about relative order in the alien alphabet. Many wrong solutions forget the break after finding the first difference and add spurious constraints.
Key Takeaways
Algorithm Selection Guide
- “Shortest path, non-negative weights” → Dijkstra O((V+E) log V). Recognise: weighted graph,
int[][] edges, find min cost from source. - “Shortest path, negative weights possible, or bounded hops K” → Bellman-Ford. K-stop variant: run exactly K+1 rounds with a
prevsnapshot. - “All-pairs shortest path, small graph V ≤ 500” → Floyd-Warshall O(V³). Not covered here but know when to invoke it.
- “Are these nodes connected? / How many components? / Would adding this edge create a cycle?” → Union-Find O(α(n)). Must implement both path compression and union by rank.
- “Order tasks/characters/nodes with dependency constraints” → Topological Sort. Use Kahn’s BFS (iterative, cycle-safe) for production; DFS reverse post-order is elegant for small inputs.
- “Constrained shortest path (max K stops / TTL / budget)” → Expand state space: state = (node, constraint_value). Run Dijkstra or BFS on the expanded graph.
Dijkstra’s Limitation — No Negative Weights
Dijkstra’s greedy correctness relies on the invariant: once a node is popped from the min-heap, its shortest distance is final. A negative-weight edge could create a path from a not-yet-processed node that is shorter than what we thought was final, violating this invariant. If you add a negative edge to a Dijkstra run, you may get incorrect results silently — no exception is thrown. Always verify edge weights are non-negative before applying Dijkstra.
When Union-Find is Better than DFS
| Scenario | DFS/BFS | Union-Find |
|---|---|---|
| All edges known upfront, one query | Equivalent | Equivalent |
| Edges arrive as a stream, online queries | Requires rebuild | Native O(α(n)) per query |
| Space-constrained | O(V + E) adjacency list needed | O(V) only |
| Cycle detection while building graph | O(V) per edge check | O(α(n)) per edge |
| Parallel / concurrent updates | Hard (DFS is sequential) | Lock-based DSU possible |
Topological Sort Prerequisite — Must Be a DAG
Topological sort is only defined for Directed Acyclic Graphs. If the graph has a cycle, no valid topological ordering exists (every cycle node would need to come both before and after other cycle nodes). Both Kahn’s (result length < V) and DFS (back-edge to grey node) detect this condition. Always check for cycles and return an appropriate error in production — this corresponds to a circular dependency in a build system or a deadlock in a scheduler.
Pattern Recognition Triggers
- “Find shortest/fastest/cheapest path with weights” → Dijkstra (non-negative) or Bellman-Ford (general)
- “Number of connected groups / components / clusters” → Union-Find or DFS flood-fill
- “Would this edge create a cycle? / Find the cycle-creating edge” → Union-Find (same-component check)
- “Derive ordering from constraints / dependencies” → Topological Sort (Kahn’s BFS)
- “Cheapest/fastest path with at most K intermediate nodes” → Bellman-Ford K-round or Dijkstra with (node, hops) state
- “Infer ordering from sorted examples” → Build constraint graph, then topological sort
Common Pitfalls
- Dijkstra with negative weights: silently wrong — always verify non-negative weights first
- Bellman-Ford K-round without prev snapshot: chaining relaxations within the same round violates the “at most K edges” invariant
- Union-Find without path compression or union by rank: degenerates to O(n) per operation — always implement both
- Alien dictionary: not breaking after first differing character: only the first differing character gives a valid constraint; subsequent ones do not
- Alien dictionary: not checking the invalid prefix case (“abc” before “ab”): this is a cycle-inducing contradiction, return "" immediately
- Topological sort on a graph with cycles: Kahn’s will silently return a partial result; always check
result.length() == numNodes
System Design Connections
| Algorithm | System Design Analogue |
|---|---|
| Dijkstra | Google Maps routing, OSPF link-state routing protocol, CDN origin selection by latency |
| Bellman-Ford K-round | Airline booking (cheapest flight ≤ K layovers), IP TTL-constrained routing |
| Union-Find | Kruskal’s MST for network cable layout, distributed partition detection, deadlock detection in RDBMS |
| Topological Sort (Kahn’s) | Webpack/Rollup module bundling, Apache Airflow DAG scheduling, Kubernetes operator dependency ordering, database migration runners |
| Floyd-Warshall | Pre-computing all-pairs distances in small routing tables, social network “degrees of separation” queries on small graphs |