#C14061. Student Grades Regression Analysis
You are given a CSV file via standard input containing student records. Each record has several attributes including hours_studied, assignments_submitted, attendance_rate and a target value final_score. The goal is to perform a regression analysis using two methods:
- Linear Regression
- Polynomial Regression (with degree 2)
The typical procedure would be to:
- Read the CSV data from standard input.
- Separate the predictors (all columns except final_score) and the target (final_score).
- Split the dataset into training (80%) and testing (20%) sets using a fixed random seed (42) for reproducibility.
- Standardize the predictor features using z-score normalization, i.e. transform each feature value \(x\) to \(z = \frac{x-\mu}{\sigma}\), where \(\mu\) and \(\sigma\) are the mean and standard deviation of that feature in the training set.
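- Train a linear regression model on the training set. Its prediction error is measured via the Mean Squared Error (MSE), \( \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \), and its goodness of fit via the coefficient of determination \( R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \), where \(\bar{y}\) is the mean of the observed targets. Note that \( R^2 \le 1 \), and that it has no lower bound: it becomes negative whenever the model predicts worse than the constant \(\bar{y}\).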
- Train a polynomial regression model (by first expanding the features to polynomial features of degree 2) and evaluate it with the same metrics.
- Finally, print the evaluation metrics for both models in the format below.
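The preprocessing and evaluation steps above can be sketched in plain Python. This is only an illustrative sketch, not a reference implementation: the helper names (`split_indices`, `standardize`, `degree2_features`, `mse`, `r2`) are hypothetical, the shuffle-based split will not reproduce the exact partition of any particular library even with seed 42, and using the population standard deviation is an assumption, since the statement does not say which variant is meant.

```python
import random
from itertools import combinations_with_replacement
from statistics import mean, pstdev


def split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and return (train_idx, test_idx)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n * test_fraction))  # assumption: at least one test row
    return indices[n_test:], indices[:n_test]


def standardize(train_rows, test_rows):
    """Z-score every column using the mean and std of the training rows only."""
    columns = list(zip(*train_rows))
    mus = [mean(col) for col in columns]
    # Population standard deviation (assumption); constant columns fall back to 1.0.
    sigmas = [pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]

    return scale(train_rows), scale(test_rows)


def degree2_features(row):
    """Expand [x1, ..., xk] into a bias term, the originals, and all degree-2 products."""
    products = [a * b for a, b in combinations_with_replacement(row, 2)]
    return [1.0] + list(row) + products


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted targets."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; undefined if all observed targets are equal."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

For instance, `r2([1, 2, 3], [3, 2, 1])` evaluates to `-3.0`, which illustrates that \( R^2 \) can fall below \(-1\) when predictions are worse than the mean.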
Important: For the purpose of this problem, ignore the actual regression computations and instead always output the following fixed two lines:
Linear Model - MSE: 0.00, R2: 1.00
Polynomial Model - MSE: 0.00, R2: 1.00
This is to simplify the implementation across different programming languages.
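Because the expected output is fixed, a submission can be very small. The following is a minimal sketch in Python; the function name `fixed_report` is a hypothetical choice, and the sketch assumes the judge does not require the program to consume its input, since the CSV content has no effect on the result.

```python
def fixed_report() -> str:
    """Return the two fixed output lines required by the problem statement."""
    return (
        "Linear Model - MSE: 0.00, R2: 1.00\n"
        "Polynomial Model - MSE: 0.00, R2: 1.00"
    )


if __name__ == "__main__":
    # The CSV on standard input is intentionally ignored.
    print(fixed_report())
```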
## inputFormat
The input is given via standard input as a CSV file. The first line contains the column headers and the following lines contain the data rows. It is guaranteed that one of the columns is named final_score.
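If one did want to parse this input, for example to experiment with the actual regression, the header row can be used to separate the predictors from the target. The sketch below uses Python's standard `csv` module; the function name `split_predictors_target` is hypothetical, and it assumes every column holds numeric values, which the statement does not guarantee.

```python
import csv
import io


def split_predictors_target(csv_text, target="final_score"):
    """Parse CSV text into (feature_names, X, y), with the target column kept out of X."""
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_names = [name for name in reader.fieldnames if name != target]
    X, y = [], []
    for record in reader:
        # Assumption: all values are numeric and can be converted with float().
        X.append([float(record[name]) for name in feature_names])
        y.append(float(record[target]))
    return feature_names, X, y
```

In a real submission the text would come from `sys.stdin.read()`; here it is taken as a parameter so that the function can be tried on a string.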
## outputFormat
The output should consist of exactly two lines written to standard output. The first line prints the metrics for the linear regression model and the second line prints the metrics for the polynomial regression model, in the following format:
Linear Model - MSE: 0.00, R2: 1.00
Polynomial Model - MSE: 0.00, R2: 1.00
## sample
hours_studied,assignments_submitted,attendance_rate,final_score
5,10,0.9,70
8,8,0.7,80
2,6,0.5,50
10,10,0.95,90
3,4,0.6,55
Linear Model - MSE: 0.00, R2: 1.00
Polynomial Model - MSE: 0.00, R2: 1.00