#P4235. Expected Distinct Hash Count
Expected Distinct Hash Count
Expected Distinct Hash Count
You are given a set of n strings and an integer parameter x. For each string s, we define a set of hash values computed by a randomized polynomial hash function:
For every integer r in [0, x), the hash value of s is computed as
[ H(s, r)=\Bigl(\sum_{i=1}^{|s|} (s_i-\mathtt{'a'}+1) \cdot r^{|s|-i} \Bigr) \bmod 65537, ]
where the modulus 65537 = 2^{16}+1
is prime. Note that when r = 0
, the hash reduces to the value of the last character (converted to a number in the range 1 to 26).
For each string, a random integer r
is chosen uniformly at random from [0, x)
(using a procedure analogous to Rand(x)
), and then the hash H(s, r)
is computed. The ans
is defined as the number of distinct hash values produced over all the strings.
Your task is to compute the expected value of ans
over the randomness of the independent choices of r
for each string.
Hint: For each possible hash value v in the union of all possible outcomes, the probability that v is not produced by a given string s is \[ \begin{cases} 1-\frac{1}{x} & \text{if } v \in \{H(s, r): 0\le r<x\}\\ 1 & \text{otherwise.} \end{cases} \] Thus, the expected contribution of v is \[ 1-\prod_{s: v \in \{H(s, r)\}}\Bigl(1-\frac{1}{x}\Bigr). \] Summing over all v gives the final expected number of distinct hash values.
inputFormat
The input consists of the following:
- A line containing two integers
n
andx
(1 ≤ n, x ≤ reasonable limits), wheren
is the number of strings andx
is the upper bound for the random base. - Then
n
lines follow, each containing a non-empty string consisting of lowercase English letters.
Note: Although in the sample code the constant x
is fixed to 10, in this problem, x
is given as part of the input.
outputFormat
Output the expected value of the distinct hash count (ans
) as a floating point number. The answer will be considered correct if it has an absolute or relative error of at most 10-6.
sample
3 10
a
b
c
0.300000