Practice Session 7: Heap / Priority Queue
Overview
A heap (implemented in Java as PriorityQueue) provides O(log n) insert and O(log n) extract-min/max, with O(1) peek at the extreme element. This single guarantee unlocks three canonical interview patterns: top-K selection (min-heap of size k avoids sorting the entire array), stream median (two-heap partitioning gives O(1) median at any point), and k-way merge (heap over k heads eliminates the O(k) per-step linear scan). At senior level the goal is to articulate why the heap beats sorting in streaming/online contexts and to connect these patterns to production scheduler, analytics, and distributed-systems architectures.
Problem 1: Kth Largest Element in an Array
Link: LeetCode 215
Quick Summary: Given an integer array and an integer k, return the kth largest element — not the kth distinct element.
Industry Context: Real-time leaderboards (gaming, e-commerce “Top 10 sellers this hour”) where the dataset grows continuously and you cannot afford O(n log n) re-sort on every new entry. Recommendation engines maintaining a rolling top-K of candidate items scored by a model also apply this pattern — only the heap boundary needs to be checked, not the full candidate set.
Approach 1: Brute Force — Sort
// O(n log n) time O(n) space (clone to preserve input)
public static int findKthLargestBrute(int[] nums, int k) {
int[] copy = nums.clone();
Arrays.sort(copy); // ascending
return copy[copy.length - k]; // kth from the right
}- Time: O(n log n) — full sort regardless of k
- Space: O(n) if a copy is needed; O(1) in-place if mutation is allowed
- Tradeoff: Dead-simple to read and implement; acceptable for one-shot offline queries. Falls apart in streaming/online settings where k << n and new elements arrive continuously — a full re-sort on each arrival is prohibitive.
Approach 2: Min-Heap of Size k
// O(n log k) time O(k) space
public static int findKthLargestOptimal(int[] nums, int k) {
PriorityQueue<Integer> minHeap = new PriorityQueue<>(k); // natural order = min-heap
for (int num : nums) {
minHeap.offer(num);
if (minHeap.size() > k) {
minHeap.poll(); // evict smallest — it can't be kth largest
}
}
return minHeap.peek(); // root = kth largest
}- Time: O(n log k) — each of n elements triggers at most one O(log k) offer + one O(log k) poll
- Space: O(k) — heap never exceeds k entries
- Key Insight: A min-heap of size k acts as a “window that retains only the k largest seen so far”. The root is always the smallest of the retained set, which is exactly the kth largest overall. Any new element smaller than the root cannot crack the top-k and is evicted immediately in O(log k). When k << n (e.g., top-10 from 10 million) the saving over sorting is dramatic — O(n log 10) ≈ O(n) vs O(n log n).
Problem 2: Find Median from Data Stream
Link: LeetCode 295
Quick Summary: Design a data structure that accepts a stream of integers one at a time and returns the median of all elements seen so far in O(1).
Industry Context: Real-time latency percentile tracking in observability platforms (Datadog, New Relic, Prometheus). P50 (median) latency must be computed over a rolling window of request durations without storing and re-sorting the entire window on each request. The two-heap approach is also how sliding-window median problems are solved in distributed stream processors like Apache Flink with minor extensions (a removal-from-heap step using lazy deletion).
Approach 1: Brute Force — Sorted Insertion
// addNum: O(n) time findMedian: O(1) time Space: O(n)
static class MedianFinderBrute {
private final List<Integer> sorted = new ArrayList<>();
public void addNum(int num) {
int pos = Collections.binarySearch(sorted, num);
if (pos < 0) pos = -(pos + 1);
sorted.add(pos, num); // O(n) shift
}
public double findMedian() {
int n = sorted.size();
if (n % 2 == 1) return sorted.get(n / 2);
return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
}
}- Time: O(n) per
addNum(binary search is O(log n) but ArrayList shift is O(n)); O(1)findMedian - Space: O(n)
- Tradeoff: Binary search finds the position fast, but inserting into an ArrayList still copies up to n elements. Fine for offline small datasets; unacceptable for high-throughput streams (millions of events/sec).
Approach 2: Two-Heap Partition
// addNum: O(log n) time findMedian: O(1) time Space: O(n)
static class MedianFinderOptimal {
// max-heap: lower half (Comparator.reverseOrder makes PQ a max-heap)
private final PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
// min-heap: upper half (natural ordering)
private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();
public void addNum(int num) {
maxHeap.offer(num); // always route through maxHeap first
minHeap.offer(maxHeap.poll()); // ensure cross-heap ordering invariant
if (minHeap.size() > maxHeap.size()) { // re-balance: maxHeap leads or ties
maxHeap.offer(minHeap.poll());
}
}
public double findMedian() {
if (maxHeap.size() > minHeap.size()) return maxHeap.peek();
return (maxHeap.peek() + minHeap.peek()) / 2.0;
}
}- Time: O(log n) per
addNum— at most 3 heap operations each O(log n); O(1)findMedian - Space: O(n)
- Key Insight: Split elements into two halves at the median boundary. A max-heap gives O(1) access to the largest of the lower half; a min-heap gives O(1) access to the smallest of the upper half. The invariant
|maxHeap.size() - minHeap.size()| <= 1ensures the median is always one or two O(1) peeks away. Routing every insertion through maxHeap then pushing its top to minHeap is the cleanest way to maintain the cross-heap ordering property in all cases (including whennumbelongs in the upper half).
Problem 3: Merge K Sorted Lists
Link: LeetCode 23
Quick Summary: Given an array of k sorted linked lists, merge them into one sorted linked list and return its head.
Industry Context: Distributed log aggregation — Kafka consumer groups reading from k partitions (each internally ordered) must emit a globally time-ordered event stream to downstream processors. External merge sort in database engines (PostgreSQL, MySQL) merges k sorted runs from disk during the sort phase of a sort-merge join. Lucene segment merges work identically: k sorted postings lists are combined into a single sorted segment using a heap over segment heads.
Approach 1: Brute Force — Flatten + Sort
// O(N log N) time O(N) space where N = total nodes
public static ListNode mergeKListsBrute(ListNode[] lists) {
List<Integer> values = new ArrayList<>();
for (ListNode head : lists) {
while (head != null) { values.add(head.val); head = head.next; }
}
Collections.sort(values);
ListNode dummy = new ListNode(0), curr = dummy;
for (int v : values) { curr.next = new ListNode(v); curr = curr.next; }
return dummy.next;
}- Time: O(N log N) — collection is O(N), sort is O(N log N)
- Space: O(N) — all values copied into the list
- Tradeoff: Discards the pre-sorted structure of the k lists entirely. Correct and readable, but wastes the O(k log k) work that sorted order in each list represents. Not viable when lists are too large to fit in memory simultaneously (external merge scenario).
Approach 2: Min-Heap over List Heads
// O(N log k) time O(k) space
public static ListNode mergeKListsOptimal(ListNode[] lists) {
PriorityQueue<ListNode> minHeap =
new PriorityQueue<>(Comparator.comparingInt(node -> node.val));
for (ListNode head : lists) if (head != null) minHeap.offer(head);
ListNode dummy = new ListNode(0), curr = dummy;
while (!minHeap.isEmpty()) {
ListNode node = minHeap.poll(); // globally smallest current head
curr.next = node;
curr = curr.next;
if (node.next != null) minHeap.offer(node.next);
}
return dummy.next;
}- Time: O(N log k) — each of N nodes is inserted and extracted from a heap of size at most k; each operation costs O(log k)
- Space: O(k) — heap holds at most one node per list at any time
- Key Insight: At each step we only need the minimum among k current heads — not the minimum of all N remaining nodes. A heap of size k answers this in O(log k) instead of a linear O(k) scan, cutting the total from O(Nk) to O(N log k). This is the same reasoning behind heap-based external sort: you only keep k “cursors” (one per sorted run) in memory, not all N records.
Problem 4: Task Scheduler
Link: LeetCode 621
Quick Summary: Given a list of CPU tasks (A–Z) and a cooldown interval n, compute the minimum number of CPU intervals needed to execute all tasks, inserting idle slots when no task is available.
Industry Context: OS process schedulers with priority and rate-limiting: tasks of the same type (same PID group, same cgroup) are throttled to run at most once per n ticks. Kubernetes rate-limits retry loops for failing pods using an exponential backoff that can be modelled as a cooldown queue. Cloud provider quota enforcement (e.g., AWS API call limits per service per second) follows the same pattern — a max-heap picks the highest-priority pending request; a cooldown gate enforces the rate limit.
Approach 1: Brute Force — Cycle-by-Cycle Simulation
// O(n + totalTime) time O(1) space (bounded alphabet of 26)
public static int leastIntervalBrute(char[] tasks, int n) {
int[] freq = new int[26];
for (char t : tasks) freq[t - 'A']++;
int[] cooldown = new int[26]; // earliest tick each task can run
int time = 0, remaining = tasks.length;
while (remaining > 0) {
int best = -1;
for (int i = 0; i < 26; i++) {
if (freq[i] > 0 && cooldown[i] <= time)
if (best == -1 || freq[i] > freq[best]) best = i;
}
if (best == -1) { time++; } // idle
else { freq[best]--; cooldown[best] = time + n + 1; remaining--; time++; }
}
return time;
}- Time: O(n · maxFreq) in the worst case; bounded because there are at most 26 task types
- Space: O(1) — two 26-element arrays
- Tradeoff: Transparent simulation; easy to verify correctness. Slow when maxFreq is very large (think n=100, all same task type with freq=10⁶).
Approach 2: Greedy with Max-Heap + Cooldown Queue
// O(n log 26) = O(n) time O(1) space (heap + queue bounded to 26 entries)
public static int leastIntervalOptimal(char[] tasks, int n) {
int[] freq = new int[26];
for (char t : tasks) freq[t - 'A']++;
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
for (int f : freq) if (f > 0) maxHeap.offer(f);
Queue<int[]> cooldownQueue = new LinkedList<>(); // [remainingCount, readyAtTime]
int time = 0;
while (!maxHeap.isEmpty() || !cooldownQueue.isEmpty()) {
time++;
if (!cooldownQueue.isEmpty() && cooldownQueue.peek()[1] <= time)
maxHeap.offer(cooldownQueue.poll()[0]);
if (!maxHeap.isEmpty()) {
int remaining = maxHeap.poll() - 1;
if (remaining > 0) cooldownQueue.offer(new int[]{remaining, time + n + 1});
}
// else: idle cycle — time increments, nothing happens
}
return time;
}- Time: O(n log 26) = O(n) — heap size is at most 26; total iterations = total CPU time, which is at most O(n + n·(n/(maxFreq))) = O(n) amortised for the common case
- Space: O(1) — bounded by 26 task types
- Key Insight: Greedy correctness: scheduling the most-frequent available task first never worsens the outcome — if we defer a high-frequency task, it will still create idle slots later. The cooldown queue enables O(1) re-admission without scanning all 26 types each tick. The heap and queue together give the same guarantees as the brute-force simulation but without per-tick O(26) scans.
Bonus — Mathematical Closed Form (useful to mention in interviews):
// O(n) time O(1) space
public static int leastIntervalMath(char[] tasks, int n) {
int[] freq = new int[26];
for (char t : tasks) freq[t - 'A']++;
Arrays.sort(freq);
int maxFreq = freq[25];
int countOfMaxFreq = 0;
for (int f : freq) if (f == maxFreq) countOfMaxFreq++;
return Math.max(tasks.length, (maxFreq - 1) * (n + 1) + countOfMaxFreq);
}The formula models the optimal schedule as (maxFreq - 1) complete frames of size (n+1), plus a final partial frame of width countOfMaxFreq. If other tasks fill all the gaps, no idle slots are needed and tasks.length is the floor.
Problem 5: IPO / Maximum Capital
Link: LeetCode 502
Quick Summary: Given k investment rounds, initial capital w, and arrays of project profits and required capitals, choose up to k projects to maximise final capital. Each project can be chosen at most once.
Industry Context: Cloud resource allocation — a platform team with a fixed budget selects infrastructure improvements greedily by ROI, where each improvement unlocks a larger budget for the next round. Investment portfolio construction under a capital constraint: at each rebalancing period, the fund picks the highest-yielding asset it can afford, reinvests the gain, and repeats. Feature-flag gated rollouts where each rollout “earns” engineering capacity (reduced incident load) that unlocks the next rollout.
Approach 1: Brute Force — Linear Scan Per Round
// O(n * k) time O(n) space
public static int findMaximisedCapitalBrute(int k, int w, int[] profits, int[] capital) {
int n = profits.length;
boolean[] used = new boolean[n];
for (int round = 0; round < k; round++) {
int best = -1;
for (int i = 0; i < n; i++)
if (!used[i] && capital[i] <= w)
if (best == -1 || profits[i] > profits[best]) best = i;
if (best == -1) break;
used[best] = true;
w += profits[best];
}
return w;
}- Time: O(n · k) — each of k rounds does an O(n) scan
- Space: O(n) for the
usedarray - Tradeoff: Correct and easy to reason about. Breaks when n = 10⁵ and k = 10⁵ (10¹⁰ operations). Also re-scans projects that are permanently unaffordable on every round.
Approach 2: Two-Heap Greedy
// O(n log n + k log n) time O(n) space
public static int findMaximisedCapitalOptimal(int k, int w, int[] profits, int[] capital) {
int n = profits.length;
// min-heap by required capital — gates which projects are affordable
PriorityQueue<int[]> minCapHeap =
new PriorityQueue<>(Comparator.comparingInt(p -> p[0]));
for (int i = 0; i < n; i++) minCapHeap.offer(new int[]{capital[i], profits[i]});
// max-heap by profit — picks the best affordable project each round
PriorityQueue<Integer> maxProfitHeap = new PriorityQueue<>(Comparator.reverseOrder());
for (int round = 0; round < k; round++) {
while (!minCapHeap.isEmpty() && minCapHeap.peek()[0] <= w)
maxProfitHeap.offer(minCapHeap.poll()[1]); // unlock newly affordable projects
if (maxProfitHeap.isEmpty()) break;
w += maxProfitHeap.poll(); // take best available project
}
return w;
}- Time: O(n log n) to build minCapHeap + O(k log n) for k rounds (each round does O(log n) heap ops); total O((n + k) log n)
- Space: O(n) — both heaps together hold n entries
- Key Insight: Capital only ever increases (profits ≥ 0), so a project that becomes affordable at round r stays affordable forever. This monotonicity means we only sweep minCapHeap forward — each of n projects transitions to maxProfitHeap at most once across all k rounds, giving O(n log n) total for all draining steps. Without this insight you might worry the inner
whileloop is O(n) per round; the amortised argument shows it’s O(n) total. The two-heap design separates the “can I afford this?” question (minCapHeap) from the “which affordable project is best?” question (maxProfitHeap) — clean separation that maps directly to a real investment decision engine.
Heap Pattern Decision Guide
Min-Heap vs Max-Heap vs Two-Heap
| Situation | Heap Type | Rationale |
|---|---|---|
| Find / maintain kth largest | Min-heap of size k | Root = kth largest; anything smaller is evicted immediately |
| Find / maintain kth smallest | Max-heap of size k | Root = kth smallest; anything larger is evicted |
| Running median / percentile | Two-heap (max-left + min-right) | O(1) median at the boundary between halves |
| Always need the overall maximum | Max-heap (PQ with reverseOrder) | O(1) peek at top |
| Always need the overall minimum | Min-heap (PQ natural order) | O(1) peek at top |
| k-way merge of sorted sources | Min-heap over k heads | O(log k) global min instead of O(k) linear scan |
| Greedy: best available option changes over time | Max-heap gated by a min-heap (IPO pattern) | Separate affordability gate from optimality selection |
When Heap Beats Sorting
| Scenario | Sorting | Heap | Why Heap Wins |
|---|---|---|---|
| Top-K from n elements, k << n | O(n log n) | O(n log k) | Only k elements kept in memory |
| Streaming / online (elements arrive one by one) | Must re-sort on each arrival: O(n log n) each | O(log n) per insertion | Heap maintains invariant incrementally |
| K-way merge of sorted streams | Flatten + sort: O(N log N) | O(N log k) | Exploits existing sorted order in each stream |
| Median with continuous updates | O(n) insertion into sorted array | O(log n) add, O(1) query | Two-heap incremental update |
| Dynamic top-K with deletions | O(k) rebuild on each delete | O(log n) with lazy deletion | Tombstone approach keeps heap valid |
Rule of thumb: Use sorting when you need the answer once over a static dataset. Use a heap when elements arrive dynamically, the dataset is too large to hold in memory, or you need repeated min/max queries with interleaved insertions.
Java PriorityQueue Gotchas
1. Natural ordering = min-heap
PriorityQueue<Integer> minPQ = new PriorityQueue<>(); // min-heap ✅
PriorityQueue<Integer> maxPQ = new PriorityQueue<>(Comparator.reverseOrder()); // max-heap ✅Java’s PriorityQueue uses natural ordering by default, so integers come out smallest-first. A common bug is assuming it’s a max-heap and using poll() expecting the largest element.
2. Custom Comparator for arrays / objects
// Sort int[] by first element ascending:
PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
// Sort ListNode by val:
PriorityQueue<ListNode> pq = new PriorityQueue<>(Comparator.comparingInt(n -> n.val));Never rely on int[].compareTo — arrays don’t implement Comparable. Always provide an explicit comparator when the heap holds arrays or custom objects.
3. remove(Object) is O(n), not O(log n)
pq.remove(element); // O(n) linear scan — PriorityQueue has no O(log n) arbitrary removePriorityQueue.remove(Object) must scan the backing array linearly. For problems requiring arbitrary removal (e.g., sliding-window median), use lazy deletion: mark elements as deleted in a HashSet, then skip them on poll(). TreeMap or TreeSet (backed by a red-black tree) offers O(log n) arbitrary removal if needed.
4. poll() on an empty queue returns null
Integer val = pq.poll(); // returns null if empty — auto-unboxing will throw NullPointerException
int v = pq.poll(); // ⚠️ NPE if empty — always guard with isEmpty() check5. Initial capacity hint for performance
PriorityQueue<Integer> pq = new PriorityQueue<>(k); // avoids repeated array growthProviding the expected size avoids O(n) array-copy resizing during bulk insertions, important when k is large.
6. Iteration order is NOT sorted
for (int x : pq) { /* NOT guaranteed sorted */ } // heap-order, not sorted orderIterating a PriorityQueue visits elements in heap-internal order (not necessarily sorted). To drain in sorted order, repeatedly call poll().
Key Takeaways
Pattern Recognition Triggers
- “Kth largest / Kth smallest” → min-heap or max-heap of size k (O(n log k) beats sort when k << n)
- “Median” or “50th percentile” from a stream → two-heap (max-heap left half, min-heap right half)
- “Merge k sorted …” (lists, arrays, files) → min-heap over k heads (O(N log k))
- “Schedule with cooldown / rate limit” → max-heap + cooldown queue greedy
- “Select best available project / task given a resource constraint that grows” → two-heap greedy (IPO pattern)
- “Top-K with continuous insertions” → heap, NOT re-sort; heap preserves invariant incrementally
Heap Interview Tells (What Interviewers Are Listening For)
- Explain why a min-heap of size k gives the kth largest (the root is the smallest of the k largest = the answer)
- Articulate the O(n log k) vs O(n log n) saving and when it matters (streaming, k << n)
- Describe the two-heap invariant precisely: “every element in max-heap ≤ every element in min-heap, sizes differ by at most 1”
- Know that Java’s
PriorityQueueis a min-heap and useComparator.reverseOrder()for max-heap - Mention lazy deletion as the idiomatic workaround for O(n)
remove()in sliding-window median variants - For merge-K: connect to external sort / Kafka partition merge — demonstrates system-design awareness
System Design Connections
| Algorithm Pattern | System Design Application |
|---|---|
| Min-heap of size k (Top-K) | Real-time leaderboard service; “trending topics” stream processor (Twitter Firehose, Kafka Streams) |
| Two-heap median | P50/P95/P99 latency SLO dashboard; sliding-window histogram in Prometheus/Datadog |
| K-way merge heap | External merge sort in PostgreSQL / MySQL; Lucene segment merging; Kafka partition merge in consumer groups |
| Max-heap + cooldown queue (Task Scheduler) | OS CFS (Completely Fair Scheduler); Kubernetes rate-limited pod retry controller; AWS API throttle queue |
| Two-heap greedy (IPO) | Portfolio optimiser under capital constraint; cloud infra cost-ROI tool; capacity planning allocator |
| Heap-based event loop | Game engine event queue; network I/O event dispatcher (Java NIO Selector, epoll); timer wheel for TTL expiry |
The “Heap or Sort?” Interview Decision
When an interviewer asks about your approach:
- Static data, one query: sort is fine — O(n log n), simple, no overhead
- Online / streaming data: heap — O(log n) per insertion, O(1) peek, no full re-sort
- k << n: heap of size k — O(n log k) beats sort; heap also uses O(k) vs O(n) memory
- Need arbitrary removal: consider
TreeMap<Integer, Integer>(count map) — O(log n) add/remove/min/max, beats heap’s O(n) remove for the sliding-window median variant
External Merge Sort — How Heaps Power Distributed Systems
When a dataset is too large for RAM (e.g., a 10 TB sort in MapReduce), the engine writes sorted runs to disk, then merges them with a min-heap of size k (one entry per run). The heap guarantees only k records are in memory at once; each poll() writes the next globally-sorted record to output and each offer() advances that run’s cursor. This is identical to the Merge K Sorted Lists algorithm, just at petabyte scale. Being able to explain this bridge from LeetCode to distributed systems is the mark of a senior engineer.