Find All Duplicates in an Array

Find All Duplicates in an Array

January 6, 2024
medium
hashing

Problem #

Given an array of integers, where each element appears either once or twice, find all elements that appear twice in the array. The solution should achieve O(n) time complexity and O(1) extra space.

Example:

  1. Input: [4,3,2,7,8,2,3,1] Output: [2,3] Explanation: The numbers 2 and 3 appear twice in the array.

  2. Input: [1,1,2] Output: [1] Explanation: The number 1 appears twice.

Solution Approach #

The solution involves using the array itself for hashing by marking visited numbers. Since the elements are integers and the values range from 1 to n (where n is the size of the array), we can use the index as a hash to record the frequency of each number by negating the value at the indexed position. If a number appears twice, its index position will be visited twice, and thus, the second visit will find the value at the index already negated, indicating a duplicate.

Algorithm Steps #

  1. Iterate through the array.
  2. For each element, use its absolute value as an index (subtract 1 to adjust for 0-based indexing).
  3. If the value at this index is positive, negate it.
  4. If the value is already negative, this is the second time we are encountering it, so add its absolute value to the result list.
  5. Return the list of duplicates.

Code (Python) #

def findDuplicates(nums):
    duplicates = []
    for num in nums:
        index = abs(num) - 1
        if nums[index] < 0:
            duplicates.append(abs(num))
        nums[index] = -nums[index]
    return duplicates

# Test the function
print(findDuplicates([4,3,2,7,8,2,3,1]))  # Output: [2,3]
print(findDuplicates([1,1,2]))            # Output: [1]

Time Complexity #

The time complexity is O(n), as we iterate through the array once.

Space Complexity #

The space complexity is O(1), as we use the input array itself for hashing and do not use any extra space for computation.

Solution Comparison #

Certainly! Let’s compare the provided solution (which uses array negation) with a solution that uses a map (or dictionary) to find duplicates in an array.

Provided Solution (Array Negation) #

  • Approach: This solution leverages the fact that the array elements are within the range from 1 to n, where n is the size of the array. It uses the array itself for hashing by negating the value at the index corresponding to each element. If a value is found already negated, it indicates a duplicate.
  • Time Complexity: O(n), as it requires a single pass through the array.
  • Space Complexity: O(1), since no additional storage is used; the input array itself is modified for tracking.
  • Drawbacks: This method alters the original array, which might not be desirable in all scenarios. Also, it only works when the constraints of the problem match (e.g., elements are positive and within a specific range).

Map Solution #

  • Approach: This solution involves using a map (or dictionary) to keep track of the frequency of each element. As we iterate through the array, we increment the count for each element in the map. If the count reaches 2, we add the element to the list of duplicates.
  • Time Complexity: O(n), as this solution also requires a single pass through the array.
  • Space Complexity: O(n), in the worst case, where n is the number of unique elements in the array. This is because we may need to store each element and its count in the map.
  • Drawbacks: It uses extra space for the map, which is not optimal if the input array is very large. However, it doesn’t modify the original array and works without constraints on the values of the elements.

Conclusion #

  • The array negation method is more space-efficient, using O(1) extra space, but at the cost of modifying the original array and having constraints on the element values.
  • The map solution is more versatile, working with any range of values and keeping the original array intact, but it requires O(n) extra space.

Both solutions have their trade-offs, and the choice between them depends on the specific requirements and constraints of the problem at hand.