#C13592. Data Preprocessing and K-Means Clustering
You are given a dataset containing information about cars. Each data point consists of three numerical attributes: horsepower, weight, and fuel efficiency. Your task is to perform data preprocessing and then cluster the data using the K-Means algorithm.
Data Preprocessing: For each attribute, apply standard scaling, i.e. transform each value using \(x' = \frac{x-\mu}{\sigma}\), where \(\mu\) and \(\sigma\) are the mean and standard deviation of that attribute over all \(n\) data points; if \(\sigma = 0\), treat it as 1.
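The scaling step could be sketched as follows. This is a minimal illustration, not a reference solution: the function name `standard_scale` is hypothetical, and the code assumes the population standard deviation (dividing by \(n\)), which the statement does not spell out but which appears closer to the sample output than the \(n-1\) variant.

```python
import math


def standard_scale(points):
    """Scale every column to zero mean and unit standard deviation."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        # Assumption: population variance (divide by n, not n - 1).
        variance = sum((p[j] - means[j]) ** 2 for p in points) / n
        sigma = math.sqrt(variance)
        # A constant column has sigma = 0; the statement says to use 1 instead.
        stds.append(sigma if sigma != 0 else 1.0)
    return [[(p[j] - means[j]) / stds[j] for j in range(dims)] for p in points]
```

With this approach a constant attribute is mapped to all zeros, since every value equals the mean and the divisor is replaced by 1.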
Clustering: Initialize the cluster centers as the first \(k\) points from the preprocessed data. Then, run the K-Means algorithm for 10 iterations using the following steps:
- Assignment step: For each point, assign it to the cluster with the nearest center (using the Euclidean distance). In the event of a tie, choose the cluster with the smallest index.
- Update step: Update each cluster center to be the mean of all points assigned to that cluster. If a cluster receives no points, its center remains unchanged.
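The two steps above might be implemented roughly like this. The function name `kmeans` and its signature are illustrative only. The sketch compares squared distances, which gives the same ordering as the Euclidean distance, and it assumes the reported labels are those from the assignment step of the final iteration, since the statement does not mention a further reassignment after the last update.

```python
def kmeans(points, k, iterations=10):
    """Fixed-iteration K-Means using the first k points as initial centers."""
    dims = len(points[0])
    centers = [list(p) for p in points[:k]]  # copy so the input is not mutated
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: a strict '<' keeps the smallest index on ties.
        for i, p in enumerate(points):
            best, best_dist = 0, float("inf")
            for c, center in enumerate(centers):
                dist = sum((p[j] - center[j]) ** 2 for j in range(dims))
                if dist < best_dist:
                    best, best_dist = c, dist
            labels[i] = best
        # Update step: a cluster with no members keeps its previous center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [
                    sum(m[j] for m in members) / len(members) for j in range(dims)
                ]
    return labels, centers
```

Scanning the centers in index order and updating only on a strictly smaller distance is one simple way to satisfy the tie-breaking rule without extra bookkeeping.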
After clustering, output the labels for each data point and the computed cluster centers. Note that the labels are 0-indexed. Make sure to round each coordinate of the cluster centers to 4 decimal places.
Input Format
The input is read from stdin. The first line contains an integer \(n\), the number of data points. The next \(n\) lines each contain three space-separated floating-point numbers representing horsepower, weight, and fuel efficiency, respectively. The last line contains an integer \(k\), the number of clusters.
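One possible way to read this layout is to split the whole input on whitespace and consume tokens in order, as in the sketch below. The helper name `parse_input` is hypothetical, and the code assumes the input is well formed, with exactly three values per data point.

```python
def parse_input(text):
    """Parse n, then n rows of three floats, then k, from raw input text."""
    tokens = text.split()
    n = int(tokens[0])
    # The 3 * n tokens after n are the attribute values, row by row.
    values = [float(t) for t in tokens[1:1 + 3 * n]]
    points = [values[3 * i:3 * i + 3] for i in range(n)]
    # The token right after the data block is the number of clusters.
    k = int(tokens[1 + 3 * n])
    return points, k
```

In a full program this could be called as `parse_input(sys.stdin.read())` after importing `sys`.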
Output Format
Output to stdout as follows:
- The first line should contain \(n\) space-separated integers representing the cluster labels (0-indexed) for each data point in the order of input.
- The next \(k\) lines should each contain three space-separated numbers representing the coordinates of the cluster center, rounded to 4 decimal places.
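The output layout described above could be produced along these lines. The name `format_output` is illustrative, and the sketch assumes fixed-point formatting with exactly four digits after the decimal point, trailing zeros included.

```python
def format_output(labels, centers):
    """Build the label line followed by one line per cluster center."""
    lines = [" ".join(str(label) for label in labels)]
    for center in centers:
        # ':.4f' rounds to four decimal places and pads with trailing zeros.
        lines.append(" ".join(f"{coord:.4f}" for coord in center))
    return "\n".join(lines)
```

Note that a tiny negative coordinate would print as `-0.0000` with this format; whether a checker accepts that is not stated in the problem, so it may be worth normalizing such values if it matters.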
Sample Input
3
130 3504 18
165 3693 15
150 3436 18
2
Sample Output
0 1 0
-0.5805 -0.6835 0.7070
1.1620 1.3670 -1.4140