The Longest Common Subsequence (LCS) problem is a classic computer science problem that involves finding the longest subsequence common to two sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
Problem Definition
Given two sequences, and , the goal is to find the longest subsequence that is present in both sequences.
For example, consider the DNA sequences:
- =
TGACTA
- =
GTGCATG
A common subsequence is GTA
, but it’s not the longest common subsequence. Longer common subsequences of length 4 include TGCA
, TGAT
, and TGCT
.
To solve the Longest Common Subsequence (LCS) problem, we often use a Dynamic Programming (DP) approach to compute the length of the LCS efficiently. However, before diving into the DP solution, it’s beneficial to understand the problem using a recursive approach with backtracking. This will help you appreciate how dynamic programming optimizes the recursive solution by eliminating redundant calculations.
Let’s break down the problem into two parts:
- Recursive Backtracking Solution: Understand how recursion can be used to solve the LCS problem.
- Dynamic Programming Approach: Implement the optimized DP solution after identifying overlapping subproblems.
- Recursion State Tree: Visualize the recursive state tree to understand the process better.
1. Recursive Backtracking Solution
The recursive solution for finding the LCS involves comparing characters from both sequences and exploring all possible subsequences. The idea is as follows:
- If the last characters of both sequences match, then the LCS includes this character, and we recursively find the LCS of the remaining sequences.
- If the last characters do not match, we have two options:
- Exclude the last character of the first sequence and find the LCS.
- Exclude the last character of the second sequence and find the LCS.
- The LCS length will be the maximum of the two options.
Recursive Algorithm in C#
using System;
public class LCSRecursive
{
// Function to find LCS using recursion
public static int LCS(string x, string y, int i, int j)
{
// Base case: if either string is empty
if (i == 0 || j == 0)
return 0;
// If last characters match
if (x[i - 1] == y[j - 1])
return 1 + LCS(x, y, i - 1, j - 1);
// If last characters do not match
else
return Math.Max(LCS(x, y, i - 1, j), LCS(x, y, i, j - 1));
}
public static void Main()
{
string x = "TGACTA";
string y = "GTGCATG";
int n = x.Length;
int m = y.Length;
int lcsLength = LCS(x, y, n, m);
Console.WriteLine("Length of LCS (Recursive): " + lcsLength);
}
}
2. Dynamic Programming Approach
The recursive approach has an exponential time complexity (O(2^n)
) due to overlapping subproblems, making it inefficient for large inputs. Dynamic Programming (DP) optimizes this by storing results of subproblems in a table, so we compute each subproblem only once.
Dynamic Programming Algorithm in C#
using System;
public class LCSDynamic
{
public static int LongestCommonSubsequence(string x, string y)
{
int n = x.Length;
int m = y.Length;
int[,] dp = new int[n + 1, m + 1];
// Build the dp table
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
if (x[i - 1] == y[j - 1])
{
dp[i, j] = dp[i - 1, j - 1] + 1;
}
else
{
dp[i, j] = Math.Max(dp[i - 1, j], dp[i, j - 1]);
}
}
}
// Optional: To print the LCS
PrintLCS(x, y, dp, n, m);
return dp[n, m]; // Length of LCS
}
// Function to print the LCS
public static void PrintLCS(string x, string y, int[,] dp, int n, int m)
{
int i = n, j = m;
string lcs = "";
while (i > 0 && j > 0)
{
if (x[i - 1] == y[j - 1])
{
lcs = x[i - 1] + lcs;
i--;
j--;
}
else if (dp[i - 1, j] > dp[i, j - 1])
{
i--;
}
else
{
j--;
}
}
Console.WriteLine("Longest Common Subsequence: " + lcs);
}
public static void Main()
{
string x = "TGACTA";
string y = "GTGCATG";
int length = LongestCommonSubsequence(x, y);
Console.WriteLine("Length of LCS (DP): " + length);
}
}
Recursion State Tree Visualization
To understand how the recursive solution works, let’s visualize the recursion state tree for smaller input sequences: and .
Explanation of the Recursion State Tree
- Root Node: The root node represents the initial LCS call
LCS('TGAC', 'GCAT')
. - Branching: Each node branches into two children, representing the two cases: excluding a character from either
~x
or~y
. - Subproblems: As you go down each level of the tree, the problem size reduces until reaching a base case where either
~x
or~y
is empty. - Redundant Calculations: The recursion tree shows overlapping subproblems, such as multiple calls to
LCS('TG', 'GC')
. Dynamic programming stores these results to avoid recomputation.
Conclusion
The recursive solution gives us a conceptual understanding of the LCS problem but is inefficient due to redundant calculations. The dynamic programming approach optimizes this by storing intermediate results in a table, reducing the time complexity to .
By visualizing the recursion tree, you can see how dynamic programming eliminates overlapping subproblems and makes the solution feasible for larger inputs.