Minimum Genetic Mutations

Minimum Genetic Mutations

December 28, 2023
medium
dynamic programming

Problem #

Given two DNA strings, strand1 and strand2, return the minimum number of genetic mutations needed to change strand1 into strand2. A genetic mutation is defined as changing one character (i.e., ‘A’, ‘T’, ‘C’, or ‘G’) into another character.

Solution #

If the lengths of strand1 and strand2 can be different, the problem becomes a bit more complex. We need to consider not only mutations (changing one character into another) but also insertions and deletions to transform one strand into the other. This problem is akin to finding the minimum edit distance between two strings, where the allowed operations are insertions, deletions, and substitutions.

def min_edit_distance(strand1, strand2):
    """
    Calculate the minimum number of operations (mutations, insertions, deletions)
    needed to change strand1 into strand2.

    :param strand1: The original DNA strand
    :param strand2: The target DNA strand
    :return: The minimum number of operations needed
    """
    m, n = len(strand1), len(strand2)
    # Create a matrix to store the distances
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialize the first row and column
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    # Fill in the rest of the matrix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if strand1[i - 1] == strand2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],    # Deletion
                                   dp[i][j - 1],    # Insertion
                                   dp[i - 1][j - 1]) # Substitution

    return dp[m][n]

# Example usage
strand1 = "AACCGGTT"
strand2 = "AACCGGTAAC"
min_edit_distance(strand1, strand2)

This function calculates the minimum number of insertions, deletions, and substitutions required to transform strand1 into strand2. The dynamic programming approach ensures that the solution is efficient even for longer strands.

Solution Analysis #

The provided Python code implements a dynamic programming solution to find the minimum number of operations (mutations, insertions, and deletions) required to transform one DNA strand (strand1) into another (strand2). Let’s break down the code:

  1. Function Definition:

    • def min_edit_distance(strand1, strand2):
    • This function min_edit_distance takes two arguments, strand1 and strand2, which are the DNA strands to be compared.
  2. Matrix Initialization:

    • m, n = len(strand1), len(strand2)
    • dp = [[0] * (n + 1) for _ in range(m + 1)]
    • We first determine the lengths of the two strands (m and n).
    • We then initialize a 2D list (matrix) dp with dimensions (m + 1) x (n + 1). This matrix will store the minimum number of operations required to transform substrings of strand1 into substrings of strand2.
  3. Matrix Base Cases:

    • for i in range(m + 1): dp[i][0] = i and for j in range(n + 1): dp[0][j] = j
    • The first row (dp[0][j]) and the first column (dp[i][0]) are filled with indices.
    • The first row represents transforming an empty string into the first j characters of strand2, which requires j insertions.
    • The first column represents transforming the first i characters of strand1 into an empty string, which requires i deletions.
  4. Dynamic Programming Loop:

    • for i in range(1, m + 1): and for j in range(1, n + 1):
    • We iterate through each cell of the matrix, filling in values based on the following logic:
      • If the characters at the current positions in strand1 and strand2 are the same (strand1[i - 1] == strand2[j - 1]), then no operation is needed, and we carry over the value from the top-left diagonal (dp[i - 1][j - 1]).
      • If the characters are different, we take the minimum of three possible operations:
        • Deletion: Remove a character from strand1 (dp[i - 1][j] + 1).
        • Insertion: Add a character to strand1 to match strand2 (dp[i][j - 1] + 1).
        • Substitution: Change a character in strand1 to match strand2 (dp[i - 1][j - 1] + 1).
  5. Return Result:

    • return dp[m][n]
    • After filling the matrix, the bottom-right cell (dp[m][n]) contains the minimum number of operations required to transform strand1 into strand2. This value is returned as the result.
  6. Example Usage:

    • Here, the function is called with two example strands, and the minimum number of operations needed for transformation is computed.

This approach, known as dynamic programming, is efficient and effective for solving problems that involve finding the minimum cost/path for a particular objective, in this case, transforming one DNA strand into another.