#C14895. Social Media Trend Analysis
Social Media Trend Analysis
Social Media Trend Analysis
You are given a set of social media posts. Each post consists of a date and a text message. Your task is to preprocess the text and analyze trending topics from these posts. You need to implement techniques including text cleaning, hashtag extraction and topic analysis.
The preprocessing step involves removing URLs and special characters, and converting the text to lowercase. The hashtag extraction should find all occurrences of hashtags in the text. Finally, the trend analysis applies topic modelling using Latent Dirichlet Allocation (LDA). The LDA model assumes a generative process that can be summarized as: $$\theta \sim \mathrm{Dirichlet}(\alpha)$$.
In the competition problem, you are required to write a complete program that reads input from stdin
and writes the trending topics analysis to stdout
. In our test cases, the analysis will result in five topics labeled as Topic 0
to Topic 4
with fixed scores.
inputFormat
The input starts with an integer T (T ≥ 1) on the first line, representing the number of social media posts. Each of the following T lines contains a post in the format:
YYYY-MM-DD|post text
where the date is in YYYY-MM-DD
format, followed by a vertical bar and the post text. You should read the input from stdin
.
outputFormat
Your program should output five lines. Each line contains a topic label and its associated trending score, in the following format:
Topic 0: score Topic 1: score Topic 2: score Topic 3: score Topic 4: score
Output the result to stdout
. For the provided test cases the score for each topic is fixed at 3.
5
2023-10-01|Loving the #sunset today!
2023-10-01|Check out this amazing #sunset
2023-10-02|Wow, the #sunset is beautiful
2023-10-02|Loving the #sunset again #beautiful
2023-10-03|#sunset #beautiful moments
Topic 0: 3
Topic 1: 3
Topic 2: 3
Topic 3: 3
Topic 4: 3
</p>