#P7086. Antibody Clustering

    ID: 20292 Type: Default 1000ms 256MiB

Antibody Clustering

Antibody Clustering

A group of biologists is trying to find a cure for a viral disease. They have selected \(n\) antibodies (each identified by its heavy chain, i.e. a sequence of amino acids) that seem to work best in experiments. In order to ease future research, they want to split these antibodies into a minimum number of similarity clusters.

A set of antibodies forms a similarity cluster if at least one of the following holds:

  • The \(k\)-prefixes (the first \(k\) amino acids) of all heavy chains in the cluster are equal; or
  • The \(k\)-suffixes (the last \(k\) amino acids) of all heavy chains in the cluster are equal.

Your task is to partition the \(n\) antibodies into a minimum number of similarity clusters. Additionally, you must output the grouping itself.

Note:

  • Each antibody must belong to exactly one cluster.
  • If an antibody can be assigned to a cluster by both its \(k\)-prefix and \(k\)-suffix, you may choose either option.
  • The clusters you form must be such that for every cluster, either all antibodies have the same \(k\)-prefix or all have the same \(k\)-suffix.

Hint: Think of each antibody as an edge connecting a prefix and a suffix in a bipartite graph. Then, a valid clustering corresponds to assigning every edge to one of its endpoints. Minimizing the number of clusters is equivalent to finding a minimum vertex cover in this bipartite graph, which by König's theorem equals the size of the maximum matching. Use this idea to design your solution.

inputFormat

The first line contains two integers \(n\) and \(k\) (\(1 \leq n \leq 10^5\); \(1 \leq k \leq L\)), where \(n\) is the number of antibodies and \(k\) is the length for the prefix/suffix to consider. Here, \(L\) denotes the length of each heavy chain string.

The following \(n\) lines each contain a non-empty string representing the heavy chain (a sequence of amino acids). It is guaranteed that each string has length at least \(k\). All strings consist of uppercase English letters.

outputFormat

On the first line, output a single integer \(r\) representing the minimum number of similarity clusters.

Then, output \(r\) lines. Each line describes a cluster: the first number is an integer \(m\) which is the number of antibodies in that cluster, followed by \(m\) space‐separated integers representing the 1-indexed positions of the antibodies (in their input order) that belong to the cluster.

sample

2 1
AB
CB
1

2 1 2

</p>