Top 5 Frequent Words

ID: 43512

Type: Default

Time limit: 1000ms

Memory limit: 256MiB

Given a block of text provided via standard input, your task is to identify the top five most frequently occurring words after preprocessing. The preprocessing involves converting all letters to lowercase, removing any punctuation, splitting the text into individual words, and filtering out a predefined set of stop words.

More formally, let \( T \) be the input text. You must transform \( T \) as follows:

1. Convert \( T \) to lowercase.
2. Remove all punctuation characters.
3. Split the text on whitespace to obtain a list of words \( W \).
4. Remove all words from \( W \) that appear in the stop words set \( S \).
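The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the statement does not list the stop words, so `STOP_WORDS` below is a hypothetical placeholder, and "punctuation" is assumed to mean the ASCII characters in `string.punctuation`. The function name `preprocess` is likewise made up for this example.

```python
import string

# Hypothetical placeholder: the real stop words set S is not given in the statement.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}


def preprocess(text, stop_words=STOP_WORDS):
    """Apply the four preprocessing steps and return the remaining words in order."""
    # Step 1: convert to lowercase.
    lowered = text.lower()
    # Step 2: delete punctuation (assumption: ASCII punctuation only).
    stripped = lowered.translate(str.maketrans("", "", string.punctuation))
    # Step 3: split on any run of whitespace, including newlines.
    words = stripped.split()
    # Step 4: drop stop words while keeping the original order.
    return [w for w in words if w not in stop_words]
```

Because punctuation is deleted rather than replaced by a space, a token such as `don't` becomes `dont` under this reading. Whether the reference solution behaves the same way is not stated in the problem.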

After preprocessing, count the occurrences of each remaining word. Then output the five most frequent words as pairs \((word, frequency)\) in descending order of frequency. If two words have the same frequency, the one that appears first in the text comes first. If fewer than five distinct words remain, output all of them.
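One way to get the required ordering is to count words in order of first appearance and then apply a stable sort on frequency alone. The sketch below assumes the input is the already preprocessed word list; `top_five` is a hypothetical helper name, not something defined by the problem.

```python
from collections import Counter


def top_five(words):
    """Return up to five (word, frequency) pairs, most frequent first."""
    # Counter keeps keys in first-insertion order, i.e. order of first appearance.
    counts = Counter(words)
    # sorted() is stable, so words with equal counts keep that first-appearance order.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return ranked[:5]
```

For example, `top_five(["b", "a", "b", "c", "a"])` puts `b` before `a` even though both occur twice, because `b` appears first.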

Note: The stop words list is predefined (see source code for complete set) and is omitted from the input. All input is given via standard input, and the results must be printed to standard output.

Input Format

The input consists of a block of text (which can span multiple lines) received via standard input. The text may be empty.

Output Format

Output up to five lines, ordered from most to least frequent. Each line should contain a word and its frequency, separated by a single space. If no words remain after preprocessing, output nothing.
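A compact end-to-end sketch of the input and output handling might look like this. It reads all of standard input at once, since the text may span multiple lines or be empty. As before, `STOP_WORDS` is a hypothetical stand-in for the unlisted stop words set, punctuation is assumed to be ASCII only, and the names `render` and `main` are chosen for this example.

```python
import string
import sys
from collections import Counter

# Hypothetical placeholder: the real stop words set is not given in the statement.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}


def render(text, stop_words=STOP_WORDS):
    """Return the output text: one "word frequency" line per top word."""
    table = str.maketrans("", "", string.punctuation)  # assumption: ASCII punctuation
    words = [w for w in text.lower().translate(table).split() if w not in stop_words]
    # Stable sort on descending count keeps first-appearance order for ties.
    ranked = sorted(Counter(words).items(), key=lambda item: -item[1])[:5]
    # An empty ranking yields an empty string, so nothing is printed.
    return "".join(f"{word} {count}\n" for word, count in ranked)


def main():
    # Read the whole input in one call and write the result to standard output.
    sys.stdout.write(render(sys.stdin.read()))
```

Calling `main()` under the usual `if __name__ == "__main__":` guard would turn this into a complete submission. Keeping the logic in `render` makes it easy to check cases such as the empty input without touching standard input.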

## sample
