Random element from the stream with uniform probability

Random element from the stream with uniform probability

February 6, 2024
easy
Reservoir Sampling

Problem #

Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability.

Solution Approach #

The task described is a classic problem in computer science, often solved using a technique known as Reservoir Sampling. The specific case of picking a single random element from a stream can be considered as Reservoir Sampling with a reservoir size of 1. Here’s how to approach the problem:

The goal is to select an element from a stream of unknown size with equal probability, without being able to store all elements in memory. To achieve this, we can use an algorithm that iterates through the stream and decides whether to replace the currently selected element with the current stream element based on a certain probability.

Algorithm Steps #

  1. Initialize a variable to store the selected element. Let’s call this selected_element.
  2. Initialize a counter variable, i, to 1. This will keep track of the number of elements seen so far in the stream.
  3. For each element x in the stream, do the following:
    • Generate a random number r between 1 and i (inclusive).
    • If r is 1, set selected_element to x.
    • Increment i by 1.
  4. After processing all elements, selected_element holds the randomly selected element from the stream.

Python Code #

import random

def select_random(stream):
    selected_element = None
    i = 1  # Counter for the number of elements seen so far
    
    for x in stream:
        if random.randint(1, i) == 1:  # With a 1/i chance, select the current element
            selected_element = x
        i += 1
    
    return selected_element

Time Complexity #

The time complexity of this algorithm is O(N), where N is the number of elements in the stream. This is because the algorithm needs to iterate through each element of the stream exactly once.

Space Complexity #

The space complexity is O(1). This is because the algorithm only needs a constant amount of space to store the selected element and the counter, regardless of the size of the input stream.

Explanation #

The algorithm ensures a uniform probability of selecting any element from the stream by ensuring that each element has a (1/n) probability of being chosen, where (n) is the number of elements seen so far. Initially, when the first element arrives, it is selected with probability 1 (since it’s the only element). When the second element arrives, there’s a (1/2) chance that it replaces the first element, and so on. By the time the last element arrives, there’s a (1/N) chance it is selected, ensuring each element has an equal chance of being chosen.