Counting Distinct k-Fragments in a Genome

ID: 11392

Type: Default

1000ms

256MiB

The turtle has obtained his genome, which is a string consisting solely of the four characters A, T, C, and G. He recalls that scientists have noted that many fragments in a genome are repeated several times, and such repetitions are significant. In this problem, a substring of length \(k\) is called a \(k\)-fragment. Your task is to compute the number of distinct \(k\)-fragments in his genome.

The genome is not given explicitly. Instead, you have to generate it using a pseudo‐random sequence \(R\) defined as follows:

Given an integer seed \(R_1\) satisfying \(0 \le R_1 1\) the sequence is generated according to \[ R_i = (R_{i-1} \times 6807 + 2831) \mod 201701, \] and the genome of length \(n\) is constructed by mapping each \(R_i\) into a nucleotide according to its residue modulo 4 as follows:

If \(R_i \bmod 4 = 0\), the \(i\)th character is A.
If \(R_i \bmod 4 = 1\), the \(i\)th character is T.
If \(R_i \bmod 4 = 2\), the \(i\)th character is C.
If \(R_i \bmod 4 = 3\), the \(i\)th character is G.

You are given two integers \(n\) and \(k\) (with \(1 \le k \le n\)) on the first line and the integer seed \(R_1\) on the second line. Output the number of distinct \(k\)-fragments (i.e. distinct substrings of length \(k\)) in the generated genome. If \(k\) is greater than \(n\), output 0.

inputFormat

The first line contains two integers \(n\) and \(k\) separated by a space, where \(n\) is the length of the genome and \(k\) is the length of the fragment to consider. The second line contains an integer \(R_1\) (\(0 \le R_1 < 201701\)), which is the seed for generating the sequence \(R\).

The genome string is generated as follows: for each \(i\) (\(1 \le i \le n\)), compute \(R_i\) using

[ R_1 \text{ is given, and for } i > 1, ; R_i = (R_{i-1} \times 6807 + 2831) \mod 201701. ]

Then, the \(i\)th character of the genome is determined by the value of \(R_i \bmod 4\) using the mapping:

0 \(\to\) A
1 \(\to\) T
2 \(\to\) C
3 \(\to\) G

outputFormat

Output a single integer, which is the number of distinct \(k\)-fragments (i.e. substrings of length \(k\)) in the generated genome. If \(k > n\), output 0.

sample

5 2
0

#B3733. Counting Distinct k-Fragments in a Genome

Counting Distinct k-Fragments in a Genome