#C13916. KMeans Clustering with Accuracy Evaluation
KMeans Clustering with Accuracy Evaluation
KMeans Clustering with Accuracy Evaluation
In this problem, you are required to implement the KMeans clustering algorithm from scratch. Given a dataset of (n) samples with (d) features and an integer (k) representing the number of clusters, partition the dataset into (k) clusters by minimizing the within-cluster sum of squared Euclidean distances. Initialize the centroids randomly (using a fixed random seed of 42) by selecting (k) unique samples from the dataset and repeatedly update the centroids until convergence is reached (i.e. when the change in centroids for all clusters is less than (\varepsilon = 10^{-4})) or a maximum of 300 iterations is completed.
After clustering, for each cluster assign a label based on the mode (most frequent true label) among the samples assigned to that cluster. Compute the clustering accuracy as the fraction of samples for which the mapped predicted label matches the true label.
Your program should read input from STDIN and output the clustering accuracy on STDOUT in the format "Accuracy: x.xx", where x.xx is the accuracy rounded to two decimal places.
inputFormat
The input is given via STDIN and has the following format:
- The first line contains two integers (n) and (d), where (n) is the number of samples and (d) is the number of features per sample.
- The second line contains an integer (k) representing the number of clusters.
- The next (n) lines each contain (d) space-separated real numbers representing the features of each sample.
- The last line contains (n) space-separated integers representing the true labels of the samples.
outputFormat
Output a single line to STDOUT in the format "Accuracy: x.xx", where x.xx is the clustering accuracy computed by mapping each predicted cluster to the most frequent true label in that cluster. The accuracy should be rounded to two decimal places.## sample
6 2
2
1.0 2.0
1.5 1.8
5.0 8.0
6.0 8.0
1.2 1.9
5.5 7.5
0 0 1 1 0 1
Accuracy: 1.00