Practice Session 7: Heap / Priority Queue

Overview

A heap (implemented in Java as PriorityQueue) provides O(log n) insert and O(log n) extract-min/max, with O(1) peek at the extreme element. This single guarantee unlocks three canonical interview patterns: top-K selection (min-heap of size k avoids sorting the entire array), stream median (two-heap partitioning gives O(1) median at any point), and k-way merge (heap over k heads eliminates the O(k) per-step linear scan). At senior level the goal is to articulate why the heap beats sorting in streaming/online contexts and to connect these patterns to production scheduler, analytics, and distributed-systems architectures.

Problem 1: Kth Largest Element in an Array

Link: LeetCode 215

Quick Summary: Given an integer array and an integer k, return the kth largest element — not the kth distinct element.

Industry Context: Real-time leaderboards (gaming, e-commerce “Top 10 sellers this hour”) where the dataset grows continuously and you cannot afford O(n log n) re-sort on every new entry. Recommendation engines maintaining a rolling top-K of candidate items scored by a model also apply this pattern — only the heap boundary needs to be checked, not the full candidate set.

Approach 1: Brute Force — Sort

// O(n log n) time  O(n) space (clone to preserve input)
public static int findKthLargestBrute(int[] nums, int k) {
    int[] copy = nums.clone();
    Arrays.sort(copy);              // ascending
    return copy[copy.length - k];   // kth from the right
}

Time: O(n log n) — full sort regardless of k
Space: O(n) if a copy is needed; O(1) in-place if mutation is allowed
Tradeoff: Dead-simple to read and implement; acceptable for one-shot offline queries. Falls apart in streaming/online settings where k << n and new elements arrive continuously — a full re-sort on each arrival is prohibitive.

Approach 2: Min-Heap of Size k

// O(n log k) time  O(k) space
public static int findKthLargestOptimal(int[] nums, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>(k); // natural order = min-heap
    for (int num : nums) {
        minHeap.offer(num);
        if (minHeap.size() > k) {
            minHeap.poll(); // evict smallest — it can't be kth largest
        }
    }
    return minHeap.peek(); // root = kth largest
}

Time: O(n log k) — each of n elements triggers at most one O(log k) offer + one O(log k) poll
Space: O(k) — heap never exceeds k entries
Key Insight: A min-heap of size k acts as a “window that retains only the k largest seen so far”. The root is always the smallest of the retained set, which is exactly the kth largest overall. Any new element smaller than the root cannot crack the top-k and is evicted immediately in O(log k). When k << n (e.g., top-10 from 10 million) the saving over sorting is dramatic — O(n log 10) ≈ O(n) vs O(n log n).

Problem 2: Find Median from Data Stream

Link: LeetCode 295

Quick Summary: Design a data structure that accepts a stream of integers one at a time and returns the median of all elements seen so far in O(1).

Industry Context: Real-time latency percentile tracking in observability platforms (Datadog, New Relic, Prometheus). P50 (median) latency must be computed over a rolling window of request durations without storing and re-sorting the entire window on each request. The two-heap approach is also how sliding-window median problems are solved in distributed stream processors like Apache Flink with minor extensions (a removal-from-heap step using lazy deletion).

Approach 1: Brute Force — Sorted Insertion

// addNum: O(n) time  findMedian: O(1) time  Space: O(n)
static class MedianFinderBrute {
    private final List<Integer> sorted = new ArrayList<>();
 
    public void addNum(int num) {
        int pos = Collections.binarySearch(sorted, num);
        if (pos < 0) pos = -(pos + 1);
        sorted.add(pos, num);   // O(n) shift
    }
 
    public double findMedian() {
        int n = sorted.size();
        if (n % 2 == 1) return sorted.get(n / 2);
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}

Time: O(n) per addNum (binary search is O(log n) but ArrayList shift is O(n)); O(1) findMedian
Space: O(n)
Tradeoff: Binary search finds the position fast, but inserting into an ArrayList still copies up to n elements. Fine for offline small datasets; unacceptable for high-throughput streams (millions of events/sec).

Approach 2: Two-Heap Partition

// addNum: O(log n) time  findMedian: O(1) time  Space: O(n)
static class MedianFinderOptimal {
    // max-heap: lower half (Comparator.reverseOrder makes PQ a max-heap)
    private final PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
    // min-heap: upper half (natural ordering)
    private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();
 
    public void addNum(int num) {
        maxHeap.offer(num);                        // always route through maxHeap first
        minHeap.offer(maxHeap.poll());             // ensure cross-heap ordering invariant
        if (minHeap.size() > maxHeap.size()) {     // re-balance: maxHeap leads or ties
            maxHeap.offer(minHeap.poll());
        }
    }
 
    public double findMedian() {
        if (maxHeap.size() > minHeap.size()) return maxHeap.peek();
        return (maxHeap.peek() + minHeap.peek()) / 2.0;
    }
}

Time: O(log n) per addNum — at most 3 heap operations each O(log n); O(1) findMedian
Space: O(n)
Key Insight: Split elements into two halves at the median boundary. A max-heap gives O(1) access to the largest of the lower half; a min-heap gives O(1) access to the smallest of the upper half. The invariant |maxHeap.size() - minHeap.size()| <= 1 ensures the median is always one or two O(1) peeks away. Routing every insertion through maxHeap then pushing its top to minHeap is the cleanest way to maintain the cross-heap ordering property in all cases (including when num belongs in the upper half).

Problem 3: Merge K Sorted Lists

Link: LeetCode 23

Quick Summary: Given an array of k sorted linked lists, merge them into one sorted linked list and return its head.

Industry Context: Distributed log aggregation — Kafka consumer groups reading from k partitions (each internally ordered) must emit a globally time-ordered event stream to downstream processors. External merge sort in database engines (PostgreSQL, MySQL) merges k sorted runs from disk during the sort phase of a sort-merge join. Lucene segment merges work identically: k sorted postings lists are combined into a single sorted segment using a heap over segment heads.

Approach 1: Brute Force — Flatten + Sort

// O(N log N) time  O(N) space  where N = total nodes
public static ListNode mergeKListsBrute(ListNode[] lists) {
    List<Integer> values = new ArrayList<>();
    for (ListNode head : lists) {
        while (head != null) { values.add(head.val); head = head.next; }
    }
    Collections.sort(values);
    ListNode dummy = new ListNode(0), curr = dummy;
    for (int v : values) { curr.next = new ListNode(v); curr = curr.next; }
    return dummy.next;
}

Time: O(N log N) — collection is O(N), sort is O(N log N)
Space: O(N) — all values copied into the list
Tradeoff: Discards the pre-sorted structure of the k lists entirely. Correct and readable, but wastes the O(k log k) work that sorted order in each list represents. Not viable when lists are too large to fit in memory simultaneously (external merge scenario).

Approach 2: Min-Heap over List Heads

// O(N log k) time  O(k) space
public static ListNode mergeKListsOptimal(ListNode[] lists) {
    PriorityQueue<ListNode> minHeap =
        new PriorityQueue<>(Comparator.comparingInt(node -> node.val));
    for (ListNode head : lists) if (head != null) minHeap.offer(head);
 
    ListNode dummy = new ListNode(0), curr = dummy;
    while (!minHeap.isEmpty()) {
        ListNode node = minHeap.poll();      // globally smallest current head
        curr.next = node;
        curr = curr.next;
        if (node.next != null) minHeap.offer(node.next);
    }
    return dummy.next;
}

Time: O(N log k) — each of N nodes is inserted and extracted from a heap of size at most k; each operation costs O(log k)
Space: O(k) — heap holds at most one node per list at any time
Key Insight: At each step we only need the minimum among k current heads — not the minimum of all N remaining nodes. A heap of size k answers this in O(log k) instead of a linear O(k) scan, cutting the total from O(Nk) to O(N log k). This is the same reasoning behind heap-based external sort: you only keep k “cursors” (one per sorted run) in memory, not all N records.

Problem 4: Task Scheduler

Link: LeetCode 621

Quick Summary: Given a list of CPU tasks (A–Z) and a cooldown interval n, compute the minimum number of CPU intervals needed to execute all tasks, inserting idle slots when no task is available.

Industry Context: OS process schedulers with priority and rate-limiting: tasks of the same type (same PID group, same cgroup) are throttled to run at most once per n ticks. Kubernetes rate-limits retry loops for failing pods using an exponential backoff that can be modelled as a cooldown queue. Cloud provider quota enforcement (e.g., AWS API call limits per service per second) follows the same pattern — a max-heap picks the highest-priority pending request; a cooldown gate enforces the rate limit.

Approach 1: Brute Force — Cycle-by-Cycle Simulation

// O(n + totalTime) time  O(1) space  (bounded alphabet of 26)
public static int leastIntervalBrute(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char t : tasks) freq[t - 'A']++;
    int[] cooldown = new int[26]; // earliest tick each task can run
    int time = 0, remaining = tasks.length;
    while (remaining > 0) {
        int best = -1;
        for (int i = 0; i < 26; i++) {
            if (freq[i] > 0 && cooldown[i] <= time)
                if (best == -1 || freq[i] > freq[best]) best = i;
        }
        if (best == -1) { time++; }   // idle
        else { freq[best]--; cooldown[best] = time + n + 1; remaining--; time++; }
    }
    return time;
}

Time: O(n · maxFreq) in the worst case; bounded because there are at most 26 task types
Space: O(1) — two 26-element arrays
Tradeoff: Transparent simulation; easy to verify correctness. Slow when maxFreq is very large (think n=100, all same task type with freq=10⁶).

Approach 2: Greedy with Max-Heap + Cooldown Queue

// O(n log 26) = O(n) time  O(1) space  (heap + queue bounded to 26 entries)
public static int leastIntervalOptimal(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char t : tasks) freq[t - 'A']++;
    PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
    for (int f : freq) if (f > 0) maxHeap.offer(f);
 
    Queue<int[]> cooldownQueue = new LinkedList<>(); // [remainingCount, readyAtTime]
    int time = 0;
    while (!maxHeap.isEmpty() || !cooldownQueue.isEmpty()) {
        time++;
        if (!cooldownQueue.isEmpty() && cooldownQueue.peek()[1] <= time)
            maxHeap.offer(cooldownQueue.poll()[0]);
        if (!maxHeap.isEmpty()) {
            int remaining = maxHeap.poll() - 1;
            if (remaining > 0) cooldownQueue.offer(new int[]{remaining, time + n + 1});
        }
        // else: idle cycle — time increments, nothing happens
    }
    return time;
}

Time: O(n log 26) = O(n) — heap size is at most 26; total iterations = total CPU time, which is at most O(n + n·(n/(maxFreq))) = O(n) amortised for the common case
Space: O(1) — bounded by 26 task types
Key Insight: Greedy correctness: scheduling the most-frequent available task first never worsens the outcome — if we defer a high-frequency task, it will still create idle slots later. The cooldown queue enables O(1) re-admission without scanning all 26 types each tick. The heap and queue together give the same guarantees as the brute-force simulation but without per-tick O(26) scans.

Bonus — Mathematical Closed Form (useful to mention in interviews):

// O(n) time  O(1) space
public static int leastIntervalMath(char[] tasks, int n) {
    int[] freq = new int[26];
    for (char t : tasks) freq[t - 'A']++;
    Arrays.sort(freq);
    int maxFreq = freq[25];
    int countOfMaxFreq = 0;
    for (int f : freq) if (f == maxFreq) countOfMaxFreq++;
    return Math.max(tasks.length, (maxFreq - 1) * (n + 1) + countOfMaxFreq);
}

The formula models the optimal schedule as (maxFreq - 1) complete frames of size (n+1), plus a final partial frame of width countOfMaxFreq. If other tasks fill all the gaps, no idle slots are needed and tasks.length is the floor.

Problem 5: IPO / Maximum Capital

Link: LeetCode 502

Quick Summary: Given k investment rounds, initial capital w, and arrays of project profits and required capitals, choose up to k projects to maximise final capital. Each project can be chosen at most once.

Industry Context: Cloud resource allocation — a platform team with a fixed budget selects infrastructure improvements greedily by ROI, where each improvement unlocks a larger budget for the next round. Investment portfolio construction under a capital constraint: at each rebalancing period, the fund picks the highest-yielding asset it can afford, reinvests the gain, and repeats. Feature-flag gated rollouts where each rollout “earns” engineering capacity (reduced incident load) that unlocks the next rollout.

Approach 1: Brute Force — Linear Scan Per Round

// O(n * k) time  O(n) space
public static int findMaximisedCapitalBrute(int k, int w, int[] profits, int[] capital) {
    int n = profits.length;
    boolean[] used = new boolean[n];
    for (int round = 0; round < k; round++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!used[i] && capital[i] <= w)
                if (best == -1 || profits[i] > profits[best]) best = i;
        if (best == -1) break;
        used[best] = true;
        w += profits[best];
    }
    return w;
}

Time: O(n · k) — each of k rounds does an O(n) scan
Space: O(n) for the used array
Tradeoff: Correct and easy to reason about. Breaks when n = 10⁵ and k = 10⁵ (10¹⁰ operations). Also re-scans projects that are permanently unaffordable on every round.

Approach 2: Two-Heap Greedy

// O(n log n + k log n) time  O(n) space
public static int findMaximisedCapitalOptimal(int k, int w, int[] profits, int[] capital) {
    int n = profits.length;
    // min-heap by required capital — gates which projects are affordable
    PriorityQueue<int[]> minCapHeap =
        new PriorityQueue<>(Comparator.comparingInt(p -> p[0]));
    for (int i = 0; i < n; i++) minCapHeap.offer(new int[]{capital[i], profits[i]});
 
    // max-heap by profit — picks the best affordable project each round
    PriorityQueue<Integer> maxProfitHeap = new PriorityQueue<>(Comparator.reverseOrder());
 
    for (int round = 0; round < k; round++) {
        while (!minCapHeap.isEmpty() && minCapHeap.peek()[0] <= w)
            maxProfitHeap.offer(minCapHeap.poll()[1]);  // unlock newly affordable projects
        if (maxProfitHeap.isEmpty()) break;
        w += maxProfitHeap.poll();                       // take best available project
    }
    return w;
}

Time: O(n log n) to build minCapHeap + O(k log n) for k rounds (each round does O(log n) heap ops); total O((n + k) log n)
Space: O(n) — both heaps together hold n entries
Key Insight: Capital only ever increases (profits ≥ 0), so a project that becomes affordable at round r stays affordable forever. This monotonicity means we only sweep minCapHeap forward — each of n projects transitions to maxProfitHeap at most once across all k rounds, giving O(n log n) total for all draining steps. Without this insight you might worry the inner while loop is O(n) per round; the amortised argument shows it’s O(n) total. The two-heap design separates the “can I afford this?” question (minCapHeap) from the “which affordable project is best?” question (maxProfitHeap) — clean separation that maps directly to a real investment decision engine.

Heap Pattern Decision Guide

Min-Heap vs Max-Heap vs Two-Heap

Situation	Heap Type	Rationale
Find / maintain kth largest	Min-heap of size k	Root = kth largest; anything smaller is evicted immediately
Find / maintain kth smallest	Max-heap of size k	Root = kth smallest; anything larger is evicted
Running median / percentile	Two-heap (max-left + min-right)	O(1) median at the boundary between halves
Always need the overall maximum	Max-heap (PQ with reverseOrder)	O(1) peek at top
Always need the overall minimum	Min-heap (PQ natural order)	O(1) peek at top
k-way merge of sorted sources	Min-heap over k heads	O(log k) global min instead of O(k) linear scan
Greedy: best available option changes over time	Max-heap gated by a min-heap (IPO pattern)	Separate affordability gate from optimality selection

When Heap Beats Sorting

Scenario	Sorting	Heap	Why Heap Wins
Top-K from n elements, k << n	O(n log n)	O(n log k)	Only k elements kept in memory
Streaming / online (elements arrive one by one)	Must re-sort on each arrival: O(n log n) each	O(log n) per insertion	Heap maintains invariant incrementally
K-way merge of sorted streams	Flatten + sort: O(N log N)	O(N log k)	Exploits existing sorted order in each stream
Median with continuous updates	O(n) insertion into sorted array	O(log n) add, O(1) query	Two-heap incremental update
Dynamic top-K with deletions	O(k) rebuild on each delete	O(log n) with lazy deletion	Tombstone approach keeps heap valid

Rule of thumb: Use sorting when you need the answer once over a static dataset. Use a heap when elements arrive dynamically, the dataset is too large to hold in memory, or you need repeated min/max queries with interleaved insertions.

Java PriorityQueue Gotchas

1. Natural ordering = min-heap

PriorityQueue<Integer> minPQ = new PriorityQueue<>();           // min-heap ✅
PriorityQueue<Integer> maxPQ = new PriorityQueue<>(Comparator.reverseOrder()); // max-heap ✅

Java’s PriorityQueue uses natural ordering by default, so integers come out smallest-first. A common bug is assuming it’s a max-heap and using poll() expecting the largest element.

2. Custom Comparator for arrays / objects

// Sort int[] by first element ascending:
PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
// Sort ListNode by val:
PriorityQueue<ListNode> pq = new PriorityQueue<>(Comparator.comparingInt(n -> n.val));

Never rely on int[].compareTo — arrays don’t implement Comparable. Always provide an explicit comparator when the heap holds arrays or custom objects.

3. remove(Object) is O(n), not O(log n)

pq.remove(element); // O(n) linear scan — PriorityQueue has no O(log n) arbitrary remove

PriorityQueue.remove(Object) must scan the backing array linearly. For problems requiring arbitrary removal (e.g., sliding-window median), use lazy deletion: mark elements as deleted in a HashSet, then skip them on poll(). TreeMap or TreeSet (backed by a red-black tree) offers O(log n) arbitrary removal if needed.

4. poll() on an empty queue returns null

Integer val = pq.poll(); // returns null if empty — auto-unboxing will throw NullPointerException
int v = pq.poll();       // ⚠️ NPE if empty — always guard with isEmpty() check

5. Initial capacity hint for performance

PriorityQueue<Integer> pq = new PriorityQueue<>(k); // avoids repeated array growth

Providing the expected size avoids O(n) array-copy resizing during bulk insertions, important when k is large.

6. Iteration order is NOT sorted

for (int x : pq) { /* NOT guaranteed sorted */ }  // heap-order, not sorted order

Iterating a PriorityQueue visits elements in heap-internal order (not necessarily sorted). To drain in sorted order, repeatedly call poll().

Key Takeaways

Pattern Recognition Triggers

“Kth largest / Kth smallest” → min-heap or max-heap of size k (O(n log k) beats sort when k << n)
“Median” or “50th percentile” from a stream → two-heap (max-heap left half, min-heap right half)
“Merge k sorted …” (lists, arrays, files) → min-heap over k heads (O(N log k))
“Schedule with cooldown / rate limit” → max-heap + cooldown queue greedy
“Select best available project / task given a resource constraint that grows” → two-heap greedy (IPO pattern)
“Top-K with continuous insertions” → heap, NOT re-sort; heap preserves invariant incrementally

Heap Interview Tells (What Interviewers Are Listening For)

Explain why a min-heap of size k gives the kth largest (the root is the smallest of the k largest = the answer)
Articulate the O(n log k) vs O(n log n) saving and when it matters (streaming, k << n)
Describe the two-heap invariant precisely: “every element in max-heap ≤ every element in min-heap, sizes differ by at most 1”
Know that Java’s PriorityQueue is a min-heap and use Comparator.reverseOrder() for max-heap
Mention lazy deletion as the idiomatic workaround for O(n) remove() in sliding-window median variants
For merge-K: connect to external sort / Kafka partition merge — demonstrates system-design awareness

System Design Connections

Algorithm Pattern	System Design Application
Min-heap of size k (Top-K)	Real-time leaderboard service; “trending topics” stream processor (Twitter Firehose, Kafka Streams)
Two-heap median	P50/P95/P99 latency SLO dashboard; sliding-window histogram in Prometheus/Datadog
K-way merge heap	External merge sort in PostgreSQL / MySQL; Lucene segment merging; Kafka partition merge in consumer groups
Max-heap + cooldown queue (Task Scheduler)	OS CFS (Completely Fair Scheduler); Kubernetes rate-limited pod retry controller; AWS API throttle queue
Two-heap greedy (IPO)	Portfolio optimiser under capital constraint; cloud infra cost-ROI tool; capacity planning allocator
Heap-based event loop	Game engine event queue; network I/O event dispatcher (Java NIO Selector, epoll); timer wheel for TTL expiry

The “Heap or Sort?” Interview Decision

When an interviewer asks about your approach:

Static data, one query: sort is fine — O(n log n), simple, no overhead
Online / streaming data: heap — O(log n) per insertion, O(1) peek, no full re-sort
k << n: heap of size k — O(n log k) beats sort; heap also uses O(k) vs O(n) memory
Need arbitrary removal: consider TreeMap<Integer, Integer> (count map) — O(log n) add/remove/min/max, beats heap’s O(n) remove for the sliding-window median variant

External Merge Sort — How Heaps Power Distributed Systems

When a dataset is too large for RAM (e.g., a 10 TB sort in MapReduce), the engine writes sorted runs to disk, then merges them with a min-heap of size k (one entry per run). The heap guarantees only k records are in memory at once; each poll() writes the next globally-sorted record to output and each offer() advances that run’s cursor. This is identical to the Merge K Sorted Lists algorithm, just at petabyte scale. Being able to explain this bridge from LeetCode to distributed systems is the mark of a senior engineer.

Study Notes by Niladri & AI

Explorer

Session-07-Heap