Top K Frequent Elements

Top K Frequent Elements

December 20, 2023
medium
heap

Problem #

Given an unsorted list of integers, find the k most frequent elements in the list. Return them as a list with each element’s frequency.

Solution #

To solve the problem of finding the k most frequent elements in an unsorted list of integers, you can use a combination of a hash map (dictionary in Python) to store the frequency of each element and a heap to efficiently find the k most frequent elements. Here’s how you can do it in Python:

  1. Use a dictionary to count the frequency of each element in the list.
  2. Use a heap (priority queue) to keep track of the k most frequent elements.
  3. Convert the heap to a list and return it.

Here’s the implementation:

import heapq
from collections import Counter

def k_most_frequent(nums, k):
    # Count the frequency of each element
    freq = Counter(nums)

    # Use a min-heap to find the k most frequent elements
    # The heap will store pairs of (-frequency, element) to sort by frequency
    min_heap = []
    for num, count in freq.items():
        heapq.heappush(min_heap, (-count, num))
        if len(min_heap) > k:
            heapq.heappop(min_heap)

    # Extract the elements and their frequencies from the heap
    result = [(-freq, num) for freq, num in min_heap]

    return result

# Example usage
nums = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4]
k = 2
print(k_most_frequent(nums, k))  # Output will be the 2 most frequent elements along with their frequencies

In this code:

  • The Counter class from the collections module is used to count the frequency of each element in the list.
  • A min-heap is used to keep track of the k elements with the highest frequency. The heap stores tuples of the form (-frequency, element) so that the element with the lowest frequency (highest negative frequency) is always at the top of the heap.
  • The heapq module is used to efficiently manage the heap.
  • After pushing all elements to the heap and ensuring its size doesn’t exceed k, the top k elements are retrieved. Their frequencies are negated back to positive values for the final result.