#C12261. Simple Linear Regression on CSV Data

    ID: 41669 Type: Default 1000ms 256MiB

You are given CSV data whose first line is the header "feature,target" and whose every subsequent line contains one data point. Your task is to train a simple linear regression model on this data.

You must split the data into a training set and a test set using the following rule: the training set consists of the first \(\lfloor0.75 \times n\rfloor\) rows (excluding the header) and the test set consists of the remaining rows, where \(n\) is the total number of data rows.
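The split rule can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `split_rows` is hypothetical, and the floor is computed with integer arithmetic as `(3 * n) // 4`, which equals \(\lfloor0.75 \times n\rfloor\) for non-negative integers.

```python
def split_rows(rows):
    """Split data rows in their original order (no shuffling).

    The first floor(0.75 * n) rows form the training set,
    the remaining rows form the test set.
    """
    n = len(rows)
    k = (3 * n) // 4  # floor(0.75 * n) without floating-point rounding
    return rows[:k], rows[k:]
```

For example, with 4 data rows the first 3 go to training and the last 1 to testing.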

Use the training data to compute the regression coefficients with the formulas:

  • \(b = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}\)
  • \(a = \bar{y} - b\bar{x}\)

Here, \(\bar{x}\) and \(\bar{y}\) denote the means of the training features and targets. All sums run over the training rows only: with \(k = \lfloor0.75 \times n\rfloor\) training rows, \(\operatorname{Cov}(x, y) = \frac{1}{k}\sum_{i=1}^{k}(x_i-\bar{x})(y_i-\bar{y})\) and \(\operatorname{Var}(x) = \frac{1}{k}\sum_{i=1}^{k}(x_i-\bar{x})^2\).
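The coefficient formulas above translate fairly directly into code. The sketch below assumes the training set is non-empty and that the training features are not all equal (so the variance is non-zero); the statement does not say how such inputs should be handled. The name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x, fitted on the training data only."""
    k = len(xs)  # number of training rows (assumed > 0)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Population covariance and variance, as defined in the statement.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / k
    var_x = sum((x - mean_x) ** 2 for x in xs) / k
    b = cov_xy / var_x  # assumes var_x != 0
    a = mean_y - b * mean_x
    return a, b
```

Because the same factor \(\frac{1}{k}\) appears in both the covariance and the variance, it cancels in \(b\); it is kept here only to mirror the formulas.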

Then, for each test sample, predict the target value using the equation:

\(y_{pred} = a + bx\)

and compute the Mean Squared Error (MSE) as follows:

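\(\text{MSE} = \frac{1}{m}\sum_{i=1}^{m}(y_{pred,i} - y_{true,i})^2\), where \(m\) is the number of test samples and \(y_{pred,i}\) and \(y_{true,i}\) are the predicted and actual targets of the \(i\)-th test sample.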
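A minimal sketch of the prediction and error computation, assuming the test set is non-empty (the statement does not cover the empty case). The name `mean_squared_error` is hypothetical; `a` and `b` are the coefficients fitted on the training data.

```python
def mean_squared_error(a, b, xs, ys):
    """Average squared difference between a + b*x and y over the test rows."""
    m = len(xs)  # number of test samples (assumed > 0)
    total = 0.0
    for x, y in zip(xs, ys):
        y_pred = a + b * x
        total += (y_pred - y) ** 2
    return total / m
```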

Output the MSE with at least 6 decimal places.
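One way to meet the precision requirement in Python is fixed-point formatting with six digits after the decimal point. This is a sketch; the helper name `format_mse` is hypothetical, and whether the judge accepts more than six digits is not stated beyond "at least".

```python
def format_mse(mse):
    """Format the MSE with exactly six digits after the decimal point."""
    return f"{mse:.6f}"
```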

inputFormat

The input is provided via standard input in CSV format. The first line is the header "feature,target". Each subsequent line contains two comma-separated numbers representing a data point corresponding to the feature and target, respectively.
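Reading this input could look like the sketch below, which uses Python's standard `csv` module. The helper name `read_points` is hypothetical, and skipping blank lines is an assumption, since the statement does not mention them.

```python
import csv
import io


def read_points(stream):
    """Read "feature,target" CSV text from a file-like object.

    Returns two lists of floats: the features and the targets.
    """
    reader = csv.reader(stream)
    next(reader, None)  # skip the header line
    xs, ys = [], []
    for row in reader:
        if not row:  # assumption: ignore blank lines
            continue
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys


# Usage on in-memory text; a real solution would pass sys.stdin instead.
sample = io.StringIO("feature,target\n1,2\n2,3\n3,5\n4,4\n")
xs, ys = read_points(sample)
```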

outputFormat

Output the Mean Squared Error (MSE) as a floating point number printed to standard output with at least 6 decimal places.

## sample

Input:

feature,target
1,2
2,3
3,5
4,4

Output:

5.444444
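
As a check on this sample: \(n = 4\), so the first \(\lfloor0.75 \times 4\rfloor = 3\) rows \((1,2)\), \((2,3)\), \((3,5)\) form the training set and \((4,4)\) is the only test row. The training means are \(\bar{x} = 2\) and \(\bar{y} = \frac{10}{3}\), which gives \(\operatorname{Cov}(x, y) = 1\) and \(\operatorname{Var}(x) = \frac{2}{3}\), hence \(b = \frac{3}{2}\) and \(a = \frac{10}{3} - \frac{3}{2} \cdot 2 = \frac{1}{3}\). The prediction for \(x = 4\) is \(\frac{1}{3} + \frac{3}{2} \cdot 4 = \frac{19}{3}\), so \(\text{MSE} = \left(\frac{19}{3} - 4\right)^2 = \frac{49}{9} \approx 5.444444\).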
