Top 10 Word Frequencies

ID: 45578

Type: Default

Time Limit: 1000ms

Memory Limit: 256MiB

You are given a text input via standard input. Your task is to normalize the text by converting all letters to lowercase and removing all punctuation characters, then split the text into words. After counting the frequency of each word, output the top 10 most frequent words along with their frequencies.

The output should list each word and its corresponding frequency on a separate line, separated by a space. If there are fewer than 10 distinct words, output all of them. In case two words have the same frequency, they should be ordered in lexicographical (alphabetical) order.

This problem requires processing potentially large inputs in a memory-efficient manner. For example, update the word counts as the input is read rather than building extra copies of the whole text.
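One way to keep memory bounded is to stream the input line by line and update a counter as each line is normalized. Because a newline is whitespace, this gives the same words as treating the whole input as one text block. The sketch below is only an illustration: the name `count_words` is hypothetical, and it assumes "punctuation" means the ASCII characters in Python's `string.punctuation`, which the statement does not specify.

```python
import string
from collections import Counter

# Assumption: punctuation is the ASCII set in string.punctuation.
# The table maps each of those characters to None, i.e. deletes it.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(lines):
    """Count normalized words from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase, strip punctuation, then split on any whitespace.
        counts.update(line.lower().translate(_DELETE_PUNCT).split())
    return counts
```

Passing `sys.stdin` as `lines` would read the input one line at a time, so only the current line and the table of distinct words are held in memory.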

Note: Normalization must remove punctuation. If f(w) is the frequency of word w, words are ordered by descending f(w), and words with equal frequencies are ordered by ascending lexicographical order. Formally, w_i is listed before w_j when order(w_i, w_j) is negative, where:

$$\text{order}(w_i, w_j)=\begin{cases}-\bigl(f(w_i)-f(w_j)\bigr) & \text{if } f(w_i)\neq f(w_j)\\ \text{lex}(w_i, w_j) & \text{if } f(w_i)=f(w_j) \end{cases} $$
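The ordering above can be expressed as a single sort key: the pair (negated frequency, word). The following is a minimal sketch, not a reference solution. The function name `top_words` and the example counts are hypothetical, and it assumes Python's default string comparison is an acceptable lexicographical order for the lowercase words.

```python
import heapq


def top_words(counts, k=10):
    """Return up to k (word, frequency) pairs, most frequent first."""
    # Key (-frequency, word): a smaller key sorts first, so a higher
    # frequency wins, and ties fall back to ascending word order.
    return heapq.nsmallest(k, counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical counts, chosen to show a tie between "a" and "b".
example = {"b": 1, "a": 1, "c": 3}
for word, freq in top_words(example):
    print(word, freq)
# prints:
# c 3
# a 1
# b 1
```

Using `heapq.nsmallest` avoids sorting every distinct word when only 10 are needed. Sorting all items with the same key would give the same result.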

Input Format

The input is provided via standard input (stdin) and consists of multiple lines of text. The entire input should be processed as one text block.

Output Format

Output to standard output (stdout) the top 10 words along with their frequencies. Each line should contain a word and its frequency separated by a space.

## sample

Input:
Hello, world! Hello.

Output:
hello 2
world 1


#C228. Top 10 Word Frequencies
