#P2268. Optimal DNA Sequence Alignment
Optimal DNA Sequence Alignment
Optimal DNA Sequence Alignment
DNA molecules carry the genetic information of living organisms. A DNA sequence is typically represented by a string over the alphabet \(\{A, T, C, G\}\), where each letter stands for one nucleoside: \(A\) for adenine, \(T\) for thymine, \(C\) for cytosine, and \(G\) for guanine. In the course of evolution, various mutations such as insertions, deletions, and substitutions can occur in a DNA sequence.
In an alignment of two DNA sequences, a gap (denoted by a '-' character) may be inserted so that the sequences line up; each column of the alignment is then scored based on the following rules:
- If both nucleotides in the same column are identical, you gain \(+1\) point.
- If the nucleotides are different (i.e. a substitution mutation), you gain \(0\) points.
- If one sequence has a nucleotide and the other has a gap in a column, you receive a penalty of \(-2\) points.
Your task is: given two DNA sequences, find an alignment that maximizes the overall score, and output the maximum achievable score.
For example, consider the two sequences:
\(T_1 =\) "ATCAG" and \(T_2 =\) "ACTAG".
One possible alignment is:
T1: A T C - A G T2: A - C T A G
which gives a score of \(0\) (since the matched columns yield +1 and gaps lead to -2).
Another alignment is:
T1: A T C A G T2: A C T A G
In this alignment, aligning positions column-by-column results in a score of \(3\). This is the optimal score for these two sequences.
inputFormat
The input consists of two lines. The first line contains the first DNA sequence, and the second line contains the second DNA sequence. Both sequences are non-empty strings composed solely of the characters A
, T
, C
, and G
.
outputFormat
Output a single integer: the maximum alignment score that can be achieved by aligning the two DNA sequences according to the rules described above.
sample
ATCAG
ACTAG
3