#C14500. House Prices: Linear Regression Analysis

    ID: 44157 Type: Default 1000ms 256MiB

In this problem, you are given a dataset of housing information in CSV format. First, clean the data: fill missing numerical values with the column mean and missing categorical values with the column mode. Then apply one-hot encoding to every categorical feature. Next, split the dataset into training and testing sets using an 80-20 split. Train a linear regression model on the training set to predict house prices (the SalePrice column). Finally, evaluate the model by computing the Mean Squared Error (MSE) and the R-squared score, defined as \( \text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \) and \( R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \), where \( y_i \) is the actual price, \( \hat{y}_i \) is the predicted price, \( \bar{y} \) is the mean of the actual prices, and \( n \) is the number of evaluated samples.
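The two metrics can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function name `mse_and_r2` is hypothetical, and the handling of a zero denominator in \( R^2 \) (all actual values identical) is an assumption of this sketch, since the problem statement does not say what to do in that case.

```python
def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: sum of (y_i - y_hat_i)^2
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: sum of (y_i - y_bar)^2
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    mse = ss_res / n
    if ss_tot == 0:
        # Assumption (not from the problem statement): R^2 is undefined when
        # every actual value is the same, so fall back to 1.0 for a perfect
        # fit and 0.0 otherwise.
        r2 = 1.0 if ss_res == 0 else 0.0
    else:
        r2 = 1.0 - ss_res / ss_tot
    return mse, r2
```

For instance, predicting the mean of the actual values for every sample gives \( R^2 = 0 \), because the residual and total sums of squares are then equal.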

inputFormat

The input is provided via standard input (stdin) as CSV formatted text. The first line contains the header with column names and each subsequent line is a data record. The target column is always 'SalePrice'.
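As an illustration of the parsing and cleaning steps, here is a hedged sketch using only the Python standard library. The names `load_and_clean`, `is_number`, and `MISSING` are hypothetical. Several choices are assumptions not fixed by the problem statement: which tokens count as a missing cell, that a column is categorical whenever any present cell is non-numeric, that every category gets its own indicator column, and that an entirely empty numeric column is filled with 0.0.

```python
import csv
import io
from collections import Counter

MISSING = {"", "NA", "NaN"}  # assumption: tokens that mark a missing cell


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def load_and_clean(csv_text, target="SalePrice"):
    """Parse CSV text, impute missing cells, one-hot encode text columns.

    Returns (feature_names, X, y), where X is a list of float rows and
    y is the list of target values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    names, columns = [], []  # encoded feature names and their value lists
    y = None
    for col in reader.fieldnames:
        # Short rows yield None for absent fields; treat those as missing.
        cells = [(row[col] or "").strip() for row in rows]
        present = [c for c in cells if c not in MISSING]
        if all(is_number(c) for c in present):
            # Numeric column: replace missing cells with the column mean.
            values = [float(c) for c in present]
            mean = sum(values) / len(values) if values else 0.0
            filled = [mean if c in MISSING else float(c) for c in cells]
            if col == target:
                y = filled
            else:
                names.append(col)
                columns.append(filled)
        else:
            # Categorical column: replace missing cells with the mode,
            # then emit one 0/1 indicator column per category.
            mode = Counter(present).most_common(1)[0][0]
            filled = [mode if c in MISSING else c for c in cells]
            for cat in sorted(set(filled)):
                names.append(f"{col}_{cat}")
                columns.append([1.0 if c == cat else 0.0 for c in filled])
    # Transpose the column lists into one feature row per record.
    X = [list(r) for r in zip(*columns)]
    return names, X, y
```

In a submission, the text would typically come from `sys.stdin.read()`; taking the text as a parameter keeps the function easy to test on small inputs.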

outputFormat

Output two space-separated floating-point numbers to standard output (stdout): the first is the Mean Squared Error (MSE) and the second is the R-squared score. For example: 0.0 1.0

## sample

OverallQual,GrLivArea,GarageCars,GarageArea,TotalBsmtSF,1stFlrSF,FullBath,YearBuilt,SalePrice
7,1710,2,548,856,856,2,2003,208500
6,1262,2,460,1262,1262,2,1976,181500
7,1786,2,608,920,920,2,2001,223500
7,1717,3,642,756,756,1,1915,140000
8,2198,3,836,1145,1145,2,2000,250000
0.0 1.0