#C14553. Recommender System Components Implementation

    ID: 44215 Type: Default 1000ms 256MiB

You are given a fixed dataset representing user ratings on items. The dataset contains 10 records with three fields: user_id, item_id, and rating. The data is as follows (each record on one line):

1 101 5
2 101 4
3 101 4
4 102 3
5 102 5
1 103 4
2 103 2
3 104 5
4 105 3
5 105 4

You are required to implement three functionalities in one program:

  • EDA: Perform exploratory data analysis on the whole dataset. You should compute and print:
    • The rating distribution (i.e. the frequency of each rating). Print in descending order of rating values in the format: Ratings distribution: 5:count 4:count ...
    • The number of unique users.
    • The number of unique items.
    • Missing values in each column (for this dataset, there are no missing values, so print 0 for each column). Format: Missing values: user_id:0 item_id:0 rating:0
  • CF (Collaborative Filtering): Use a fixed train-test split where the first 8 records form the training set and the last 2 form the testing set. For each test record, predict the rating as the average of that user's ratings in the training set (in this dataset both test users, 4 and 5, appear in the training set). Then, compute the evaluation metrics:
    • Root Mean Squared Error (RMSE) defined as \(\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}\)
    • Mean Absolute Error (MAE) defined as \(\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|\)
    Print the RMSE and MAE separated by a space. (For the given dataset the expected results are approximately: 0.707107 0.5.)
  • CBF (Content-Based Filtering): You are given a new user's preferred genre and a list of item metadata. Each metadata entry is an item_id followed by its genre. Recommend all items whose genre matches the new user’s genre. Output the recommended item IDs in ascending order separated by a space.
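
The CF step described above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the names `RATINGS` and `evaluate_cf` are hypothetical, and skipping test users that are absent from training is an assumption, since the statement does not define a fallback for that case.

```python
import math
from collections import defaultdict

# The ten (user_id, item_id, rating) records from the problem statement.
RATINGS = [
    (1, 101, 5), (2, 101, 4), (3, 101, 4), (4, 102, 3), (5, 102, 5),
    (1, 103, 4), (2, 103, 2), (3, 104, 5), (4, 105, 3), (5, 105, 4),
]


def evaluate_cf(records, n_train=8):
    """Return (rmse, mae) for the user-average predictor on the fixed split."""
    train, test = records[:n_train], records[n_train:]

    # Collect each user's training ratings so the average can be computed.
    by_user = defaultdict(list)
    for user, _item, rating in train:
        by_user[user].append(rating)

    errors = []
    for user, _item, rating in test:
        if user not in by_user:
            # Assumption: no fallback is specified, so unseen users are skipped.
            continue
        prediction = sum(by_user[user]) / len(by_user[user])
        errors.append(rating - prediction)

    if not errors:
        raise ValueError("no test record has a user seen in training")

    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return rmse, mae


rmse, mae = evaluate_cf(RATINGS)
print(f"{rmse:.6f} {mae:g}")
```

On the given data, user 4 has a single training rating of 3 (predicted 3, actual 3) and user 5 has a single training rating of 5 (predicted 5, actual 4), so the errors are 0 and -1, which gives RMSE \(\sqrt{1/2}\approx 0.707107\) and MAE 0.5.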

Input Format:

  • The first line contains a command which is one of the following: EDA, CF, or CBF.
  • If the command is CBF, then the next line contains a string representing the new user’s genre. The following line contains an integer \(N\) (number of item metadata entries). Then \(N\) lines follow, each containing an integer and a string separated by space representing item_id and its genre.
  • For EDA and CF commands, no additional input is provided.
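
One way to read this input layout and answer a CBF query is sketched below. The helper names `parse_cbf`, `recommend`, and `answer_cbf` are hypothetical, and the sketch assumes that genres are compared by exact, case-sensitive string equality, which the statement does not spell out.

```python
def parse_cbf(lines):
    """Parse the lines that follow the CBF command into (genre, metadata)."""
    genre = lines[0].strip()
    n = int(lines[1])
    metadata = []
    for line in lines[2:2 + n]:
        # Split only once so the genre keeps any internal spaces.
        item_id, item_genre = line.split(maxsplit=1)
        metadata.append((int(item_id), item_genre.strip()))
    return genre, metadata


def recommend(genre, metadata):
    """Return the item IDs whose genre equals `genre`, in ascending order."""
    return sorted(item_id for item_id, item_genre in metadata if item_genre == genre)


def answer_cbf(text):
    """Return the CBF output line for a full input, or None for other commands."""
    lines = [line for line in text.splitlines() if line.strip()]
    if lines[0].strip() != "CBF":
        return None  # EDA and CF take no further input and are handled elsewhere.
    genre, metadata = parse_cbf(lines[1:])
    return " ".join(str(item_id) for item_id in recommend(genre, metadata))


sample = """CBF
Adventure
5
101 Adventure
102 Horror
103 Comedy
104 Sci-Fi
105 Adventure
"""
print(answer_cbf(sample))
```

In an actual submission the text would come from standard input (for example `sys.stdin.read()`) rather than from the `sample` string used here.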

Output Format:

  • For the EDA command, output 4 lines:
    Line 1: Ratings distribution: 5:<count> 4:<count> 3:<count> 2:<count>
    Line 2: Unique users: <number>
    Line 3: Unique items: <number>
    Line 4: Missing values: user_id:0 item_id:0 rating:0
  • For the CF command, output one line with the RMSE and MAE separated by a space. Print RMSE to 6 decimal places.
  • For the CBF command, output one line with the recommended item IDs in ascending order separated by a space.
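
The four EDA output lines could be produced along these lines. This is a sketch under stated assumptions: `RATINGS` and `eda_report` are hypothetical names, and a missing value is modelled as `None`, which never occurs in the fixed dataset.

```python
from collections import Counter

# The ten (user_id, item_id, rating) records from the problem statement.
RATINGS = [
    (1, 101, 5), (2, 101, 4), (3, 101, 4), (4, 102, 3), (5, 102, 5),
    (1, 103, 4), (2, 103, 2), (3, 104, 5), (4, 105, 3), (5, 105, 4),
]


def eda_report(records):
    """Return the four EDA output lines for a list of (user, item, rating)."""
    # Frequency of each rating, listed from the highest rating to the lowest.
    counts = Counter(r for _u, _i, r in records if r is not None)
    distribution = " ".join(f"{r}:{counts[r]}" for r in sorted(counts, reverse=True))

    users = {u for u, _i, _r in records if u is not None}
    items = {i for _u, i, _r in records if i is not None}

    # Assumption: a missing field would be represented as None.
    missing = [sum(rec[col] is None for rec in records) for col in range(3)]

    return [
        f"Ratings distribution: {distribution}",
        f"Unique users: {len(users)}",
        f"Unique items: {len(items)}",
        f"Missing values: user_id:{missing[0]} item_id:{missing[1]} rating:{missing[2]}",
    ]


print("\n".join(eda_report(RATINGS)))
```

Sorting the distinct ratings in reverse order gives the required descending order without hard-coding the values 5, 4, 3, 2.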

Sample Test Cases:

  1. Input:
    EDA
    
    Output:
    Ratings distribution: 5:3 4:4 3:2 2:1
    Unique users: 5
    Unique items: 5
    Missing values: user_id:0 item_id:0 rating:0
    
  2. Input:
    CF
    
    Output:
    0.707107 0.5
    
  3. Input:
    CBF
    Adventure
    5
    101 Adventure
    102 Horror
    103 Comedy
    104 Sci-Fi
    105 Adventure
    
    Output:
    101 105
    

inputFormat

The input begins with a command (EDA, CF, or CBF).

If the command is EDA or CF, no additional input follows. If the command is CBF, then:

  • The next line contains a string denoting the new user's preferred genre.
  • The following line contains an integer \(N\) indicating the number of metadata entries.
  • The next \(N\) lines each contain an integer and a string (separated by space) representing an item ID and its genre.

outputFormat

For EDA: Print four lines as described in the problem statement.

For CF: Print one line containing two numbers: RMSE (to 6 decimal places) and MAE, separated by a space.

For CBF: Print one line with the recommended item IDs (in ascending order) separated by a space.

## sample

Input:
EDA

Output:
Ratings distribution: 5:3 4:4 3:2 2:1
Unique users: 5
Unique items: 5
Missing values: user_id:0 item_id:0 rating:0
