#K37547. Top K Frequent Product Pairs

    ID: 26000 Type: Default 1000ms 256MiB

Top K Frequent Product Pairs

Top K Frequent Product Pairs

You are given T purchase sequences and an integer k. Each purchase sequence is a list of product IDs separated by spaces. Two products are considered to be bought together if they appear in the same sequence. Your task is to determine the top k product pairs that have the highest frequency of being purchased together.

For each purchase sequence, generate all unique product pairs (each pair should be sorted in lexicographical order). Then, aggregate the counts for each pair over all purchase sequences. In the case of ties in frequency, choose the pair with the lexicographically smaller string representation.

The output for each valid pair should be in the format:

product1 product2: count

Note: If there are fewer than k pairs available, output all the pairs.

Mathematically, if we denote the frequency of a pair as \(f(pair)\), the pairs should be sorted in descending order of \(f(pair)\) and, in the event of a tie, in ascending lexicographical order. Formally, for two pairs \(a\) and \(b\), \(a\) comes before \(b\) if either \(f(a) > f(b)\) or \(f(a) = f(b)\) and \(a < b\) in lexicographical order.

inputFormat

The input is given via standard input (stdin) and is formatted as follows:

  • The first line contains two integers, T and k, separated by a space. Here, T is the number of purchase sequences.
  • The following T lines each contain a purchase sequence - a string of product IDs separated by spaces.

outputFormat

The output should be printed to standard output (stdout). For each of the top k frequent product pairs, print a line in the format:

product1 product2: count

If there are no valid pairs, print nothing.

## sample
4 2
apple banana orange
apple banana
banana orange
apple orange banana
apple banana: 3

banana orange: 3

</p>