#C13592. Data Preprocessing and K-Means Clustering

    ID: 43147 Type: Default 1000ms 256MiB

You are given a dataset containing information about cars. Each data point consists of three numerical attributes: horsepower, weight, and fuel efficiency. Your task is to perform data preprocessing and then cluster the data using the K-Means algorithm.

Data Preprocessing: For each attribute, apply standard scaling over all \(n\) data points, i.e. transform each value using the formula \(x'=\frac{x-\mu}{\sigma}\), where \(\mu\) is the mean of that attribute and \(\sigma\) is its standard deviation. If \(\sigma=0\), treat it as 1.
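The scaling step can be sketched as follows. This is a minimal illustration, not a reference solution: the function name `standard_scale` is hypothetical, and the code assumes the population standard deviation (dividing by \(n\)), which the statement does not specify but which appears closest to the sample output.

```python
import math


def standard_scale(points):
    """Return a copy of `points` with every column standardized."""
    n = len(points)
    dims = len(points[0])
    scaled = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        column = [p[j] for p in points]
        mu = sum(column) / n
        # Assumption: population standard deviation (divide by n, not n - 1).
        sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
        if sigma == 0:
            sigma = 1.0  # constant column: the statement says to use 1
        for i in range(n):
            scaled[i][j] = (points[i][j] - mu) / sigma
    return scaled
```

A constant attribute maps to all zeros under this rule, since every value equals the mean and the divisor falls back to 1.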

Clustering: Initialize the cluster centers as the first \(k\) points from the preprocessed data. Then, run the K-Means algorithm for 10 iterations using the following steps:

  1. Assignment step: For each point, assign it to the cluster with the nearest center (using the Euclidean distance). In the event of a tie, choose the cluster with the smallest index.
  2. Update step: Update each cluster center to be the mean of all points assigned to that cluster. If a cluster receives no points, its center remains unchanged.
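The two steps above can be sketched as below. The function name `kmeans` and its parameters are hypothetical, and `points` is assumed to hold the already-scaled data. The sketch compares squared distances, which gives the same ordering as Euclidean distances, and uses a strict `<` so that ties go to the smallest cluster index. It also assumes the reported labels are those from the final assignment step.

```python
def kmeans(points, k, iterations=10):
    """Run a fixed number of K-Means iterations; return (labels, centers)."""
    # Initial centers: copies of the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center, ties broken by smallest index.
        for i, p in enumerate(points):
            best, best_dist = 0, None
            for c, center in enumerate(centers):
                dist = sum((a - b) ** 2 for a, b in zip(p, center))
                if best_dist is None or dist < best_dist:
                    best, best_dist = c, dist
            labels[i] = best
        # Update step: mean of assigned points; empty clusters keep their center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, with the one-dimensional points `[[0.0], [2.0], [1.0]]` and `k = 2`, the third point is equidistant from both initial centers and is therefore assigned to cluster 0.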

After clustering, output the labels for each data point and the computed cluster centers. Note that the labels are 0-indexed. Make sure to round each coordinate of the cluster centers to 4 decimal places.

## Input format

The input is read from stdin. The first line contains an integer \(n\), the number of data points. The next \(n\) lines each contain three space-separated floating-point numbers representing horsepower, weight, and fuel efficiency, respectively. The last line contains an integer \(k\), the number of clusters.
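One way to read this layout is to split the whole input on whitespace, which is a sketch under the assumption that tokens are separated only by spaces and newlines. The helper name `parse_input` is hypothetical.

```python
def parse_input(text):
    """Parse the problem input; return (points, k)."""
    tokens = text.split()
    n = int(tokens[0])
    # The next 3 * n tokens are horsepower, weight, fuel efficiency per point.
    values = [float(t) for t in tokens[1:1 + 3 * n]]
    points = [values[i:i + 3] for i in range(0, 3 * n, 3)]
    # The final token is the number of clusters.
    k = int(tokens[1 + 3 * n])
    return points, k


# Typical use in a submission:
#   import sys
#   points, k = parse_input(sys.stdin.read())
```

Splitting on whitespace rather than reading line by line makes the parser tolerant of trailing blank lines.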

## Output format

Output to stdout as follows:

  • The first line should contain \(n\) space-separated integers representing the cluster labels (0-indexed) for each data point in the order of input.
  • The next \(k\) lines should each contain three space-separated numbers representing the coordinates of the cluster center, rounded to 4 decimal places.
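The output layout can be produced along these lines. The helper name `format_output` is hypothetical. Printing exactly four digits after the decimal point matches the style of the sample, and turning a rounded negative zero into `0.0000` is an assumption about what the checker expects.

```python
def format_output(labels, centers):
    """Build the output text: one line of labels, then one line per center."""
    lines = [" ".join(str(lab) for lab in labels)]
    for center in centers:
        # Adding 0.0 turns a rounded -0.0 into 0.0 (assumed checker-friendly).
        lines.append(" ".join(f"{round(c, 4) + 0.0:.4f}" for c in center))
    return "\n".join(lines)
```

Calling `print(format_output(labels, centers))` then writes the result to stdout.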
## sample

Input:

    3
    130 3504 18
    165 3693 15
    150 3436 18
    2

Output:

    0 1 0
    -0.5805 -0.6835 0.7070
    1.1620 1.3670 -1.4140