Sum of Distances in Tree

Sum of Distances in Tree

January 23, 2024

Problem Statement #

Given an undirected, connected tree with n nodes labeled from 0 to n - 1, and an array edges where edges[i] = [ai, bi] indicates that there is an edge between nodes ai and bi, return an array answer of length n where answer[i] is the sum of the distances between the i-th node and all other nodes.


  1. Input: n = 6, edges = [[0,1], [0,2], [2,3], [2,4], [2,5]] Output: [8, 12, 6, 10, 10, 10] Explanation: The tree is shown as follows: 0 /
    1 2 /|
    3 4 5 The sum of distances for each node: [8, 12, 6, 10, 10, 10]

Solution Approach #

The solution involves using Depth-First Search (DFS) to compute the subtree sizes and distances for each node, followed by another DFS to calculate the answer array.

Algorithm Steps #

  1. Perform DFS to compute the size of each subtree and the distance sum for each node.
  2. Perform another DFS to calculate the answer array using previously computed values.
  3. Return the answer array.

Code (Python) #

def sumOfDistancesInTree(n, edges):
    tree = [[] for _ in range(n)]
    count = [1] * n
    res = [0] * n

    for i, j in edges:

    def dfs(node, parent):
        for child in tree[node]:
            if child != parent:
                dfs(child, node)
                count[node] += count[child]
                res[node] += res[child] + count[child]

    def dfs2(node, parent):
        for child in tree[node]:
            if child != parent:
                res[child] = res[node] - count[child] + n - count[child]
                dfs2(child, node)

    dfs(0, -1)
    dfs2(0, -1)
    return res

# Test the function
print(sumOfDistancesInTree(6, [[0,1], [0,2], [2,3], [2,4], [2,5]]))  # Output: [8, 12, 6, 10, 10, 10]

Solution Anaysis #

Using Depth-First Search (DFS) twice in the “Sum of Distances in Tree” problem is a strategic approach to solve the problem efficiently. Let’s understand why each DFS is used and how they contribute to the solution:

First DFS: Calculate Subtree Sizes and Distance Sums #

  1. Purpose: The first DFS traversal is used to compute two things for each node:

    • The size of its subtree (including itself).
    • The sum of distances from the node to all nodes in its subtree.
  2. How It Works: Starting from an arbitrary root (usually node 0), DFS is performed. For each node, you visit all its children and accumulate:

    • count[node]: The size of the subtree rooted at node.
    • res[node]: The sum of distances from node to all nodes in its subtree.
  3. Accumulation Strategy: For each child child of a node node, the distance from node to every node in the subtree rooted at child is one more than the distance from child to every node in its subtree. Therefore, res[node] accumulates res[child] (distance sums within the subtree) plus count[child] (each node in the child’s subtree is one step further from node).

Second DFS: Calculate the Final Answer for Each Node #

  1. Purpose: The second DFS traversal calculates the final answer for each node (i.e., the sum of distances from each node to all other nodes).

  2. How It Works: This traversal reuses the information gathered in the first DFS. It calculates the distance sum for each node based on its parent’s distance sum.

  3. Propagation Strategy: When moving from a node to its child, the child’s distance sum can be derived from the parent’s distance sum. Since moving to a child means getting one step closer to each node in the child’s subtree and one step further from each node not in the child’s subtree, we adjust the parent’s sum accordingly.

Why This Approach is Efficient #

  • Avoids Redundancy: Calculating subtree sizes and internal distance sums first avoids recalculating these values for each node, which would be highly inefficient.
  • Leverages Computed Information: The second DFS efficiently computes the final answer using the subtree information from the first DFS, ensuring each node’s distances are calculated only once.

In summary, the two DFS traversals complement each other - the first sets up necessary information about the tree structure, and the second utilizes this information to efficiently compute the final distances for each node.

Time Complexity #

O(n), where n is the number of nodes in the tree. The tree is traversed twice.

Space Complexity #

O(n), for storing the tree structure and intermediate calculations.