#C12938. House Price Prediction Pipeline

    ID: 42420 Type: Default 1000ms 256MiB

You are given a CSV dataset of house records with four columns: Rooms, Type, Distance, and Price. Preprocess the data by handling missing values and one-hot encoding the categorical features (here, Type). Then split the dataset into training and testing sets using an 80/20 split with a fixed random_state of 0, train a Linear Regression model on the training data with Price as the target, and predict the house prices for the test set. Output the predictions as CSV on standard output under the header PredictedPrice.

For reference, when \(X^T X\) is invertible, the least-squares coefficients of linear regression are given by the normal equation: \[ \beta = (X^T X)^{-1} X^T y \]
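The steps above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: it assumes pandas and scikit-learn are available, and because the statement does not say how missing values should be handled, it assumes mean imputation for numeric columns and most-frequent-value imputation for Type. The names `predict_prices` and `main` are hypothetical.

```python
import io
import sys

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def predict_prices(csv_text: str) -> pd.DataFrame:
    """Run the preprocessing, split, fit and predict steps on CSV text."""
    df = pd.read_csv(io.StringIO(csv_text))

    # ASSUMPTION: the statement does not fix an imputation strategy.
    # Numeric columns get their column mean, Type gets its most frequent value.
    for col in ("Rooms", "Distance", "Price"):
        df[col] = df[col].fillna(df[col].mean())
    df["Type"] = df["Type"].fillna(df["Type"].mode().iloc[0])

    # One-hot encode Type; dtype=float keeps every feature column numeric.
    X = pd.get_dummies(df[["Rooms", "Type", "Distance"]], columns=["Type"], dtype=float)
    y = df["Price"]

    # 80/20 split with the fixed random_state required by the statement.
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LinearRegression().fit(X_train, y_train)
    return pd.DataFrame({"PredictedPrice": model.predict(X_test)})


def main() -> None:
    # Read the whole CSV from stdin and write the predictions to stdout.
    predict_prices(sys.stdin.read()).to_csv(sys.stdout, index=False)
```

Calling `main()` at the end of the script wires the sketch to stdin and stdout. Keeping the pipeline in `predict_prices` makes it easy to check against a small CSV string without touching the standard streams.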

Your solution must read the input CSV via stdin and output the CSV (with the predicted prices) to stdout.

inputFormat

The input is CSV-formatted text provided via standard input. The first line is the header: Rooms,Type,Distance,Price. Each subsequent line is one data record.

outputFormat

The output should be CSV text printed to standard output. It must contain a header line with PredictedPrice, followed by one line per record in the test set, each giving the predicted price as a floating-point number.
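The output layout described above can be produced with the standard library alone. The sketch below is illustrative: the helper name `write_predictions` is hypothetical, and it assumes the predictions are already available as a sequence of floats.

```python
import csv
import io
import sys
from typing import Iterable, TextIO


def write_predictions(values: Iterable[float], out: TextIO = sys.stdout) -> None:
    """Write a PredictedPrice header followed by one value per line."""
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PredictedPrice"])
    for value in values:
        writer.writerow([value])


# Usage: write to an in-memory buffer instead of stdout.
buffer = io.StringIO()
write_predictions([299142.85714285716], buffer)
```

Passing the output stream as a parameter keeps the formatting logic separate from where the text ends up.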

## sample
Input:

Rooms,Type,Distance,Price
1,house,5.5,310000
2,unit,8.2,450000
3,house,12.7,540000
2,house,7.3,320000
3,unit,3.0,520000

Output:

PredictedPrice
299142.85714285716
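
The sample value can be checked by hand. The sketch below assumes scikit-learn's `train_test_split` and a `LinearRegression` model fitted with an intercept. Under that assumption the single held-out record is the third one (3,house,12.7,540000), which is consistent with the sample output. The four training records have mean Rooms 2, mean Distance 6.0 and mean Price 400000, and four points determine the centered model exactly. A single indicator \(h\) (1 for house, 0 for unit) stands in for the two one-hot columns, which does not change the fitted values:

\[ \text{Price} - 400000 = b_1(\text{Rooms} - 2) + b_2(\text{Distance} - 6) + c\,(h - 0.5) \]

Solving the four training equations gives:

\[ b_1 = \tfrac{178000}{7}, \quad b_2 = -\tfrac{60000}{7}, \quad c = -\tfrac{964000}{7} \]

Substituting the held-out record (Rooms 3, Distance 12.7, house):

\[ \hat{y} = 400000 + \tfrac{178000}{7}(3 - 2) - \tfrac{60000}{7}(12.7 - 6) - \tfrac{964000}{7}(1 - 0.5) = \tfrac{2094000}{7} \approx 299142.857 \]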