#C12938. House Price Prediction Pipeline
You are given a CSV dataset containing house data with the following columns: Rooms, Type, Distance, and Price. Your task is to preprocess the data by handling missing values and encoding categorical features using one-hot encoding. Then, split the dataset into training and testing sets (using an 80/20 split with a fixed random_state of 0), train a Linear Regression model on the training data, and finally predict the house prices on the test set. The predictions should be output as CSV to standard output with a header PredictedPrice. All mathematical formulas should be expressed in LaTeX format. For example, the normal equation in linear regression is given by:
\[
\beta = (X^T X)^{-1} X^T y
\]
Your solution must read the input CSV via stdin and output the CSV (with the predicted prices) to stdout.
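To make the normal equation concrete, here is a minimal sketch that solves \((X^T X)\beta = X^T y\) with plain Gaussian elimination. The function name normal_equation and the toy data are illustrative assumptions, not part of the problem. The sketch also assumes \(X^T X\) is invertible, which fails when columns are linearly dependent (for example, a full set of one-hot columns plus an intercept column).

```python
def normal_equation(X, y):
    """Solve (X^T X) beta = X^T y by Gauss-Jordan elimination.

    X is a list of rows; put a leading 1.0 in each row to model an intercept.
    Assumes X^T X is invertible (linearly independent columns).
    """
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up toy data lying exactly on y = 2x; each row is [1.0, x].
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
beta = normal_equation(X, y)
print(beta)  # intercept close to 0, slope close to 2
```

In practice a library least-squares routine is usually preferred over forming \(X^T X\) explicitly, since it is more stable numerically and copes with rank-deficient inputs.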
Input Format
The input is a CSV formatted text provided via standard input. The first line is the header: Rooms,Type,Distance,Price. Each subsequent line corresponds to one record of data.
Output Format
The output should be CSV text printed to standard output. It must contain a header line with PredictedPrice, followed by one line per record in the test set, listing the predicted price as a floating-point number.
Sample Input
Rooms,Type,Distance,Price
1,house,5.5,310000
2,unit,8.2,450000
3,house,12.7,540000
2,house,7.3,320000
3,unit,3.0,520000
Sample Output
PredictedPrice
299142.85714285716
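One possible shape for a solution is sketched below, assuming pandas and scikit-learn are available. The statement names neither library, although random_state matches the parameter of scikit-learn's train_test_split. The statement also does not say how missing values should be handled, so the sketch assumes median imputation for Rooms and Distance, most-frequent imputation for Type, and dropping rows with no Price. A grader may expect a different strategy. The function name predict_prices is hypothetical.

```python
import io
import sys

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def predict_prices(csv_text: str) -> str:
    """Run the preprocessing, split, fit, and predict steps; return CSV text."""
    df = pd.read_csv(io.StringIO(csv_text))

    # Assumption: rows without a target value cannot be used and are dropped.
    df = df.dropna(subset=["Price"])
    # Assumption: numeric gaps are filled with the column median.
    for col in ["Rooms", "Distance"]:
        df[col] = df[col].fillna(df[col].median())
    # Assumption: categorical gaps are filled with the most frequent value.
    df["Type"] = df["Type"].fillna(df["Type"].mode().iloc[0])

    # One-hot encode Type; dtype=float keeps every feature column numeric.
    X = pd.get_dummies(
        df[["Rooms", "Type", "Distance"]], columns=["Type"], dtype=float
    )
    y = df["Price"]

    # 80/20 split with the fixed seed required by the statement.
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LinearRegression().fit(X_train, y_train)
    predictions = model.predict(X_test)

    # A single column with the required header and no index column.
    return pd.DataFrame({"PredictedPrice": predictions}).to_csv(index=False)


if __name__ == "__main__":
    sys.stdout.write(predict_prices(sys.stdin.read()))
```

Saved under a hypothetical name such as solution.py, the script would be run as `python solution.py < houses.csv`. Encoding before splitting keeps the one-hot columns identical in the training and test sets, which matters when a category appears in only one of them.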