#C13470. Weather Forecasting with Feature Engineering

    ID: 43012 Type: Default 1000ms 256MiB

Weather Forecasting with Feature Engineering

You are given a time-series weather dataset of daily observations. Each record has two fields: a date in the format \(YYYY\text{-}MM\text{-}DD\) and a temperature value. Your task is to build a pipeline that processes this data and then fits a predictive model to forecast the temperature.

The pipeline consists of the following steps:

  1. Preprocessing: Convert the date strings to dates, sort the records by date, and forward-fill any missing temperature values (replace each missing value with the most recent earlier observation).
  2. Feature Engineering: Calculate two features from the temperature data:
    • Rolling Average: For each day starting from the 7th record, compute the mean temperature over that day and the previous 6 days (window size 7), i.e. \(\frac{1}{7}\sum_{j=i-6}^{i} T_j\) for record \(i\) with temperature \(T_i\).
    • Moving Fluctuation: Starting from the second record, compute the absolute difference between the current day’s temperature and the previous day’s temperature, \(|T_i - T_{i-1}|\).
    Use only the rows starting from the 8th record (index 7 in 0-indexing); both features are defined for every such row.
  3. Model Building: Using the two engineered features as predictors, build a linear regression model of the temperature. Split the rows kept in step 2 chronologically: the first 80% form the training set and the remaining 20% form the test set.
  4. Evaluation: Evaluate the model’s performance on the test set by computing the Mean Squared Error (MSE) and Mean Absolute Error (MAE).
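Steps 1 to 3 (everything before the regression itself) can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the input is assumed to be already parsed into `(date_string, temperature)` pairs with `None` marking a missing temperature, and the names `preprocess`, `build_features` and `split_rows` are hypothetical. Rows are kept from index 7 onward exactly as the statement specifies, even though both features are already defined from index 6. The statement does not say how to round 80% of the row count; the sketch assumes the floor.

```python
from datetime import date

WINDOW = 7      # rolling-average window (current day + previous 6 days)
FIRST_ROW = 7   # first kept row, 0-indexed, as the statement specifies


def preprocess(records):
    """Convert dates, sort chronologically, forward-fill missing temperatures.

    records: list of (date_str in YYYY-MM-DD form, float or None).
    Assumption: the earliest record has a temperature; a leading None has
    no earlier value to copy and would stay None.
    """
    dated = sorted(((date.fromisoformat(d), t) for d, t in records),
                   key=lambda rec: rec[0])      # stable sort on the date only
    temps, last = [], None
    for _, t in dated:
        if t is None:
            t = last                            # forward fill
        temps.append(t)
        last = t
    return temps


def build_features(temps):
    """Return (rolling_avg, fluctuation, target) for each kept row."""
    rows = []
    for i in range(FIRST_ROW, len(temps)):
        rolling_avg = sum(temps[i - WINDOW + 1:i + 1]) / WINDOW
        fluctuation = abs(temps[i] - temps[i - 1])
        rows.append((rolling_avg, fluctuation, temps[i]))
    return rows


def split_rows(rows):
    """Chronological split: first 80% train, rest test (floor of 80% assumed)."""
    cut = len(rows) * 4 // 5                   # integer floor of 0.8 * len(rows)
    return rows[:cut], rows[cut:]
```

For the sample input, the 15 records yield 8 feature rows, which this split divides into 6 training rows and 2 test rows.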

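Technical details: The predictive model is defined as \[ y = \beta_0 + \beta_1 \times (\text{rolling average}) + \beta_2 \times (\text{moving fluctuation}) \] where \(y\) is the temperature of the same day. The coefficients \(\beta_0, \beta_1, \beta_2\) should be calculated from the training set using the normal equation \(\boldsymbol{\beta} = (X^\top X)^{-1} X^\top \mathbf{y}\), where each row of \(X\) is \((1, \text{rolling average}, \text{moving fluctuation})\) for one training record and \(\mathbf{y}\) holds the corresponding temperatures.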
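The statement does not say what to do when \(X^\top X\) is singular, which is exactly what happens on the sample: every moving fluctuation there equals 1, duplicating the intercept column, so the inverse does not exist. The sketch below (hypothetical names `fit_normal_equation` and `predict`) therefore solves \((X^\top X)\boldsymbol{\beta} = X^\top \mathbf{y}\) by Gauss-Jordan elimination with partial pivoting instead of forming an explicit inverse and, as an assumption, sets the coefficient of any column without a usable pivot to 0. Every solution of the normal equation gives the same fitted values on the training rows; a pseudo-inverse (minimum-norm) solution may assign different individual coefficients, and its test-set predictions can differ only for test rows outside the span of the training rows.

```python
def fit_normal_equation(features, targets, rel_tol=1e-10):
    """Solve (X^T X) beta = X^T y for y = b0 + b1*x1 + b2*x2.

    features: list of (rolling_avg, fluctuation) pairs; targets: temperatures.
    Pivots no larger than rel_tol * (largest entry of X^T X) are treated as
    zero, and the matching coefficient is fixed at 0 (assumed convention).
    """
    X = [(1.0, a, f) for a, f in features]      # prepend the intercept column
    k = 3
    # Augmented matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, targets))] for i in range(k)]
    tol = rel_tol * max(1.0, max(abs(v) for line in m for v in line[:k]))
    pivots = []                                  # (row, column) of each pivot
    row = 0
    for col in range(k):
        best = max(range(row, k), key=lambda r: abs(m[r][col]))
        if abs(m[best][col]) <= tol:
            continue                             # no usable pivot: free variable
        m[row], m[best] = m[best], m[row]
        p = m[row][col]
        m[row] = [v / p for v in m[row]]         # scale pivot to 1
        for r in range(k):                       # clear the column above and below
            if r != row:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[row])]
        pivots.append((row, col))
        row += 1
        if row == k:
            break
    beta = [0.0] * k
    for r, col in pivots:
        beta[col] = m[r][k]                      # free coefficients stay 0
    return beta


def predict(beta, features):
    """Apply y = b0 + b1*rolling_avg + b2*fluctuation to each feature pair."""
    return [beta[0] + beta[1] * a + beta[2] * f for a, f in features]
```

On a full-rank training set this returns the usual normal-equation coefficients; on the sample it still produces an exact fit.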

Input/Output: Your program should read from standard input (stdin) and write to standard output (stdout). The input format is described below.

inputFormat

The first line contains an integer \(N\) representing the number of records in the dataset. The following \(N\) lines each contain a date and a temperature (floating-point number) separated by a space.
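A possible reader for this format is sketched below. The statement does not say how a missing temperature appears in the input; as an assumption, the hypothetical `parse_records` treats an absent second token, or a value that is unparseable or not finite (such as `NaN`), as missing and returns `None` for it, leaving forward filling to the preprocessing step.

```python
import math


def parse_records(text):
    """Parse the input into (date_str, temperature or None) pairs.

    Assumption (not stated in the problem): a missing temperature is either
    an absent second token or a non-finite/unparseable value like "NaN".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0])                            # first line: record count N
    records = []
    for line in lines[1:n + 1]:
        parts = line.split()
        temp = None
        if len(parts) > 1:
            try:
                value = float(parts[1])
                if math.isfinite(value):
                    temp = value
            except ValueError:
                pass                             # unparseable: treat as missing
        records.append((parts[0], temp))
    return records
```

In a full program this would be called as `parse_records(sys.stdin.read())`.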

Example:

15
2021-01-01 21
2021-01-02 22
... (and so on for 15 lines)

outputFormat

Output two floating-point numbers representing the Mean Squared Error (MSE) and the Mean Absolute Error (MAE) of your predictive model on the test set. The values should be printed on a single line separated by a space, each rounded to two decimal places.
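Over the \(m\) test rows, the two metrics are \(\text{MSE} = \frac{1}{m}\sum_{i} (y_i - \hat{y}_i)^2\) and \(\text{MAE} = \frac{1}{m}\sum_{i} |y_i - \hat{y}_i|\). A minimal sketch of the evaluation and the required output line (the names `evaluate` and `format_output` are hypothetical):

```python
def evaluate(actual, predicted):
    """Return (MSE, MAE) over paired test-set values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return mse, mae


def format_output(mse, mae):
    """One line, single space, each value to two decimal places."""
    return f"{mse:.2f} {mae:.2f}"
```

For example, `evaluate([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])` gives an MSE of 1.25/3 and an MAE of 0.5, which `format_output` prints as `0.42 0.50`.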

Example:

0.00 0.00
## sample
15
2021-01-01 21
2021-01-02 22
2021-01-03 23
2021-01-04 24
2021-01-05 25
2021-01-06 26
2021-01-07 27
2021-01-08 28
2021-01-09 29
2021-01-10 30
2021-01-11 31
2021-01-12 32
2021-01-13 33
2021-01-14 34
2021-01-15 35
0.00 0.00
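Why the sample output is `0.00 0.00`: the temperatures rise by exactly 1 per day, so every moving fluctuation is 1 and every rolling average equals that day's temperature minus 3. The model \(y = 3 + 1 \times (\text{rolling average}) + 0 \times (\text{moving fluctuation})\) therefore fits every row exactly (it is not the only such model, because the fluctuation column is constant), and both test errors are zero. The short check below recomputes the sample's feature rows to confirm this.

```python
# Recompute the feature rows for the sample (temperatures 21, 22, ..., 35).
temps = [21.0 + i for i in range(15)]
rows = []
for i in range(7, len(temps)):                   # rows from index 7, as specified
    rolling_avg = sum(temps[i - 6:i + 1]) / 7    # current day + previous 6 days
    fluctuation = abs(temps[i] - temps[i - 1])
    rows.append((rolling_avg, fluctuation, temps[i]))

# Fluctuation is constant (1.0) and the target is exactly rolling_avg + 3,
# so y = 3 + 1 * rolling_avg + 0 * fluctuation has zero error on every row.
print(all(f == 1.0 for _, f, _ in rows))         # True
print(all(y == a + 3 for a, _, y in rows))       # True
print(len(rows))                                 # 8
```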