Stream Median Finder

Stream Median Finder

February 7, 2024

Problem Statement #

Design a class to calculate the median of a number stream at any time. The class should support two operations: addNum(int num) which adds an integer to the data structure, and findMedian() which returns the median of all elements so far.

Solution Approach #

The solution involves using two heaps: a max heap for the lower half of the numbers and a min heap for the upper half. This way, the median can be efficiently found or updated as new numbers are added.

Algorithm Steps #

  1. Use a max heap to store the smaller half of the numbers and a min heap for the larger half.
  2. For each new number, compare it with the top of the heaps to decide where to add it.
  3. Balance the heaps: if their sizes differ by more than one, transfer a number from one heap to the other.
  4. To find the median, check the sizes of both heaps:
    • If they are equal, the median is the average of the tops of both heaps.
    • If they are not, the median is the top of the larger heap.

Code (Python) #

from heapq import *

class MedianFinder:
    def __init__(self):
        # Min heap for the larger half
        self.minHeap = []
        # Max heap for the smaller half
        self.maxHeap = []

    def addNum(self, num):
        # Ensure maxHeap always has the smaller half of numbers
        if not self.maxHeap or -self.maxHeap[0] >= num:
            heappush(self.maxHeap, -num)
            heappush(self.minHeap, num)

        # Balance the heaps
        if len(self.maxHeap) > len(self.minHeap) + 1:
            heappush(self.minHeap, -heappop(self.maxHeap))
        elif len(self.minHeap) > len(self.maxHeap):
            heappush(self.maxHeap, -heappop(self.minHeap))

    def findMedian(self):
        if len(self.maxHeap) > len(self.minHeap):
            return -self.maxHeap[0]
        return (-self.maxHeap[0] + self.minHeap[0]) / 2

# Example usage
finder = MedianFinder()
print(finder.findMedian())  # Output: 1.5
print(finder.findMedian())  # Output: 2

Time Complexity #

  • addNum(): O(log n) due to the heap operations.
  • findMedian(): O(1) as it only requires accessing the top of the heaps.

Space Complexity #

  • O(n), where n is the number of elements added.