Counting Common DNA Substrings

ID: 21809

Type: Default

1000ms

256MiB

Biologists are studying n species, each with a DNA sequence. For the i^th species the DNA sequence is s[i], and its j^th base is s[i][j] (each base is one of A, G, C, T). To find some common features among the species, scientists are interested in all contiguous DNA subsequences of length k that appear in at least m species.

More formally, a contiguous subsequence is represented by a 2m-tuple \[ (i_1, p_1, i_2, p_2, \ldots, i_m, p_m) \] which satisfies the following conditions:

\(1 \le i_1 < i_2 < \cdots < i_m \le n\)
For every offset \(q\) with \(0 \le q < k\), the equality \[ s[i_1][p_1+q] = s[i_2][p_2+q] = \cdots = s[i_m][p_m+q] \] holds, meaning that the same contiguous DNA string of length \(k\) appears in all the chosen species at positions \(p_1, p_2, \ldots, p_m\) respectively.

Your task is to compute how many such 2m-tuples exist. Two tuples are considered different if any of the positions in the tuple differs.

Note: The contiguous subsequence in each species can start at any valid position as long as the entire length \(k\) fits within the sequence.

inputFormat

The first line contains three integers \(n\), \(k\), and \(m\) separated by spaces.

Then \(n\) lines follow, where the \(i\)-th line contains the DNA sequence of the \(i\)-th species. Each sequence is a non-empty string consisting only of the characters A, G, C, and T.

It is guaranteed that each DNA sequence has length at least \(k\).

outputFormat

Output a single integer representing the number of valid 2m-tuples.

sample

2 2 2
AGCT
TGCA

</p>

#P8643. Counting Common DNA Substrings

Counting Common DNA Substrings