#C14553. Recommender System Components Implementation
You are given a fixed dataset representing user ratings on items. The dataset contains 10 records with three fields: user_id, item_id, and rating. The data is as follows (each record on one line):
1 101 5
2 101 4
3 101 4
4 102 3
5 102 5
1 103 4
2 103 2
3 104 5
4 105 3
5 105 4
You are required to implement three functionalities in one program:
- EDA: Perform exploratory data analysis on the whole dataset. You should compute and print:
- The rating distribution (i.e. the frequency of each rating). Print in descending order of rating values in the format:
Ratings distribution: 5:count 4:count ...
- The number of unique users.
- The number of unique items.
- Missing values in each column (for this dataset, there are no missing values, so print 0 for each column). Format:
Missing values: user_id:0 item_id:0 rating:0
- CF (Collaborative Filtering): Use a fixed train-test split where the first 8 records form the training set and the last 2 form the test set. For each test record, predict the rating as that user's average rating in the training set, provided the user appears in training (both test users here, 4 and 5, do). Then compute the evaluation metrics:
- Root Mean Squared Error (RMSE) defined as \(\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}\)
- Mean Absolute Error (MAE) defined as \(\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|\)
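(Here \(y_i\) is the actual rating of the \(i\)-th test record, \(\hat{y}_i\) its predicted rating, and \(n\) the number of test records. For this fixed split, the expected CF output is 0.707107 0.5.)
- CBF (Content-Based Filtering): You are given a new user profile containing a preferred genre, together with a list of item metadata entries of the form \(item\_id: genre\). Recommend every item whose genre matches the new user's genre, and output the recommended item IDs in ascending order, separated by spaces.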
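The three computations described above can be sketched in Python as plain functions over the hard-coded dataset. This is a minimal sketch, not a reference solution: the names `RECORDS`, `eda`, `cf`, and `cbf` are hypothetical, genre matching is assumed to be exact, case-sensitive string equality, and test users missing from the training set are assumed to be skipped (the statement does not say what to do with them).

```python
from collections import Counter
from math import sqrt

# The fixed dataset from the problem statement, as (user_id, item_id, rating).
RECORDS = [
    (1, 101, 5), (2, 101, 4), (3, 101, 4), (4, 102, 3), (5, 102, 5),
    (1, 103, 4), (2, 103, 2), (3, 104, 5), (4, 105, 3), (5, 105, 4),
]


def eda(records):
    """Return ([(rating, count), ...] by descending rating, #unique users, #unique items)."""
    counts = Counter(rating for _, _, rating in records)
    distribution = sorted(counts.items(), reverse=True)
    n_users = len({user for user, _, _ in records})
    n_items = len({item for _, item, _ in records})
    return distribution, n_users, n_items


def cf(records, n_train=8):
    """Predict each test rating as the user's mean training rating; return (RMSE, MAE)."""
    train, test = records[:n_train], records[n_train:]
    history = {}
    for user, _, rating in train:
        history.setdefault(user, []).append(rating)
    errors = []
    for user, _, actual in test:
        if user not in history:
            continue  # assumption: test users absent from training are skipped
        predicted = sum(history[user]) / len(history[user])
        errors.append(actual - predicted)
    if not errors:
        raise ValueError("no test record has a training history")
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


def cbf(user_genre, metadata):
    """Return IDs of items whose genre equals user_genre (exact match), in ascending order."""
    return sorted(item for item, genre in metadata if genre == user_genre)
```

For the fixed data, `eda(RECORDS)` returns `([(5, 3), (4, 4), (3, 2), (2, 1)], 5, 5)` and `cf(RECORDS)` returns an RMSE of about 0.707107 and an MAE of 0.5, matching the sample outputs.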
Input Format:
- The first line contains a command, which is one of EDA, CF, or CBF.
- If the command is CBF, the next line contains a string representing the new user's genre. The following line contains an integer \(N\) (the number of item metadata entries). Then \(N\) lines follow, each containing an integer and a string separated by a space, representing an item_id and its genre.
- For the EDA and CF commands, no additional input is provided.
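A parser for this input layout might look like the following sketch. It assumes the input really is line-oriented as described (one command per line, one metadata entry per line) and splits each metadata line only at its first run of whitespace, so a genre containing spaces would stay intact; the function name `parse_input` is hypothetical.

```python
def parse_input(text):
    """Parse the command and, for CBF, the user's genre and (item_id, genre) pairs.

    Returns (command, user_genre, metadata); user_genre is None and metadata
    is empty for EDA and CF. Blank lines are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    command = lines[0]
    if command != "CBF":
        return command, None, []
    user_genre = lines[1]
    n = int(lines[2])
    metadata = []
    for line in lines[3:3 + n]:
        # Split once: the first token is the item ID, the rest is the genre.
        item_id, genre = line.split(None, 1)
        metadata.append((int(item_id), genre.strip()))
    return command, user_genre, metadata
```

In a full program this would typically be called as `parse_input(sys.stdin.read())`.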
Output Format:
- For the EDA command, output 4 lines:
Line 1: Ratings distribution: 5:<count> 4:<count> 3:<count> 2:<count>
Line 2: Unique users: <number>
Line 3: Unique items: <number>
Line 4: Missing values: user_id:0 item_id:0 rating:0
- For the CF command, output one line with the RMSE and MAE separated by a space. Print RMSE to 6 decimal places.
- For the CBF command, output one line with the recommended item IDs in ascending order separated by a space.
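The output rules above can be expressed as small formatting helpers. This is a sketch with hypothetical names; the statement fixes only RMSE's precision, so printing MAE with Python's default float formatting (which yields `0.5` for the sample) is an assumption, and the missing-value line is emitted as a constant because the statement says every count is 0.

```python
def format_eda(distribution, n_users, n_items):
    """Render the four EDA lines; distribution is [(rating, count), ...] by descending rating."""
    dist = " ".join(f"{rating}:{count}" for rating, count in distribution)
    return "\n".join([
        f"Ratings distribution: {dist}",
        f"Unique users: {n_users}",
        f"Unique items: {n_items}",
        # The statement says this dataset has no missing values.
        "Missing values: user_id:0 item_id:0 rating:0",
    ])


def format_cf(rmse, mae):
    """RMSE to 6 decimal places; MAE with Python's default float formatting (assumed)."""
    return f"{rmse:.6f} {mae}"


def format_cbf(item_ids):
    """Recommended item IDs in ascending order, separated by single spaces."""
    return " ".join(str(item) for item in sorted(item_ids))
```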
Sample Test Cases:
- Input:
EDA
Output:
Ratings distribution: 5:3 4:4 3:2 2:1
Unique users: 5
Unique items: 5
Missing values: user_id:0 item_id:0 rating:0
- Input:
CF
Output:
0.707107 0.5
- Input:
CBF
Adventure
5
101 Adventure
102 Horror
103 Comedy
104 Sci-Fi
105 Adventure
Output:
101 105
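As a check on the CF sample: in the training set (the first 8 records), user 4's only rating is 3 (record 4 102 3) and user 5's only rating is 5 (record 5 102 5). Writing \(\bar{r}_u\) for user \(u\)'s mean training rating (notation introduced here), the two test records give:

```latex
\begin{aligned}
\hat{y}_1 &= \bar{r}_4 = 3, & y_1 &= 3, & y_1 - \hat{y}_1 &= 0,\\
\hat{y}_2 &= \bar{r}_5 = 5, & y_2 &= 4, & y_2 - \hat{y}_2 &= -1,\\
\mathrm{RMSE} &= \sqrt{\tfrac{1}{2}\left(0^2 + (-1)^2\right)} = \sqrt{0.5} \approx 0.707107,\\
\mathrm{MAE} &= \tfrac{1}{2}\left(|0| + |-1|\right) = 0.5.
\end{aligned}
```

This reproduces the expected output 0.707107 0.5.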
Input format (summary):
The input begins with a command (EDA, CF, or CBF). If the command is EDA or CF, no additional input follows. If the command is CBF, then:
- The next line contains a string denoting the new user's preferred genre.
- The following line contains an integer \(N\) indicating the number of metadata entries.
- The next \(N\) lines each contain an integer and a string (separated by a space) representing an item ID and its genre.
Output format (summary):
For EDA: print four lines as described in the problem statement.
For CF: print one line containing two numbers, RMSE (to 6 decimal places) and MAE, separated by a space.
For CBF: print one line with the recommended item IDs (in ascending order) separated by a space.