#C228. Top 10 Word Frequencies
You are given a text input via standard input. Your task is to normalize the text by converting all letters to lowercase and removing all punctuation characters, then split the text into words. After counting the frequency of each word, output the top 10 most frequent words along with their frequencies.
The output should list each word and its corresponding frequency on a separate line, separated by a space. If there are fewer than 10 distinct words, output all of them. In case two words have the same frequency, they should be ordered in lexicographical (alphabetical) order.
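The normalization, counting, and ordering rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference solution: the helper names `normalize` and `top_words` are hypothetical, "punctuation" is assumed to mean the ASCII characters in `string.punctuation`, and words are assumed to be separated by whitespace.

```python
import string
from collections import Counter

# Assumption: "punctuation" means the ASCII set in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase the text and delete punctuation characters."""
    return text.lower().translate(_STRIP_PUNCT)


def top_words(text, k=10):
    """Return up to k (word, count) pairs, most frequent first."""
    # Assumption: words are separated by whitespace.
    counts = Counter(normalize(text).split())
    # Descending count first, then ascending word to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


for word, count in top_words("Hello, world! Hello."):
    print(word, count)
```

Sorting on the tuple `(-count, word)` handles both rules in one pass: negating the count gives descending frequency, and the word itself breaks ties alphabetically. Under these assumptions, deleting punctuation also joins the pieces of words such as "don't" into "dont".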
Inputs may be large, so process the text in a memory-efficient manner rather than building unnecessary intermediate copies of it.
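One way to keep memory use low is to read the input one line at a time and update a running counter, so that memory grows with the number of distinct words rather than with the input size. The sketch below assumes ASCII punctuation (`string.punctuation`) and whitespace-separated words; `count_words` and `top_k` are hypothetical helper names. Because a newline is itself whitespace, counting line by line gives the same result as splitting the whole text at once.

```python
import heapq
import string
from collections import Counter

# Assumption: punctuation is the ASCII set in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(lines):
    """Accumulate word counts from an iterable of lines, one line at a time."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().translate(_STRIP_PUNCT).split())
    return counts


def top_k(counts, k=10):
    """Select the k best entries without fully sorting every distinct word."""
    # The smallest (-count, word) tuple has the highest count,
    # with alphabetical order breaking ties.
    return heapq.nsmallest(k, counts.items(), key=lambda item: (-item[1], item[0]))
```

To run it against standard input, pass the file object directly, for example `top_k(count_words(sys.stdin))` after `import sys`, and print each pair as `word count`. `heapq.nsmallest` returns its results already ordered by the key, so no further sorting is needed.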
Note: Normalization must remove punctuation. If f(w) denotes the frequency of word w, words are ordered by descending f(w), and words with equal frequencies are ordered lexicographically in ascending order. Formally, the sorting order is defined by:
$$\text{order}(w_i, w_j)=\begin{cases}-\bigl(f(w_i)-f(w_j)\bigr) & \text{if } f(w_i)\neq f(w_j)\\ \text{lex}(w_i, w_j) & \text{if } f(w_i)=f(w_j) \end{cases}$$

## inputFormat
The input is provided via standard input (stdin) and consists of multiple lines of text. The entire input should be processed as one text block.
## outputFormat
Output to standard output (stdout) the top 10 words along with their frequencies. Each line should contain a word and its frequency separated by a space.
## sample
Input:
Hello, world! Hello.

Output:
hello 2
world 1