#B3733. Counting Distinct k-Fragments in a Genome
Counting Distinct k-Fragments in a Genome
Counting Distinct k-Fragments in a Genome
The turtle has obtained his genome, which is a string consisting solely of the four characters A
, T
, C
, and G
. He recalls that scientists have noted that many fragments in a genome are repeated several times, and such repetitions are significant. In this problem, a substring of length \(k\) is called a \(k\)-fragment. Your task is to compute the number of distinct \(k\)-fragments in his genome.
The genome is not given explicitly. Instead, you have to generate it using a pseudo‐random sequence \(R\) defined as follows:
Given an integer seed \(R_1\) satisfying \(0 \le R_1 1\) the sequence is generated according to \[ R_i = (R_{i-1} \times 6807 + 2831) \mod 201701, \] and the genome of length \(n\) is constructed by mapping each \(R_i\) into a nucleotide according to its residue modulo 4 as follows:
- If \(R_i \bmod 4 = 0\), the \(i\)th character is
A
. - If \(R_i \bmod 4 = 1\), the \(i\)th character is
T
. - If \(R_i \bmod 4 = 2\), the \(i\)th character is
C
. - If \(R_i \bmod 4 = 3\), the \(i\)th character is
G
.
You are given two integers \(n\) and \(k\) (with \(1 \le k \le n\)) on the first line and the integer seed \(R_1\) on the second line. Output the number of distinct \(k\)-fragments (i.e. distinct substrings of length \(k\)) in the generated genome. If \(k\) is greater than \(n\), output 0.
inputFormat
The first line contains two integers \(n\) and \(k\) separated by a space, where \(n\) is the length of the genome and \(k\) is the length of the fragment to consider. The second line contains an integer \(R_1\) (\(0 \le R_1 < 201701\)), which is the seed for generating the sequence \(R\).
The genome string is generated as follows: for each \(i\) (\(1 \le i \le n\)), compute \(R_i\) using
[ R_1 \text{ is given, and for } i > 1, ; R_i = (R_{i-1} \times 6807 + 2831) \mod 201701. ]
Then, the \(i\)th character of the genome is determined by the value of \(R_i \bmod 4\) using the mapping:
- 0 \(\to\) A
- 1 \(\to\) T
- 2 \(\to\) C
- 3 \(\to\) G
outputFormat
Output a single integer, which is the number of distinct \(k\)-fragments (i.e. substrings of length \(k\)) in the generated genome. If \(k > n\), output 0.
sample
5 2
0
2