Implement K-Means Clustering with Metrics

ID: 42790

Type: Default

1000ms

256MiB

In this problem, you are required to implement the K-Means clustering algorithm from scratch along with two important clustering metrics: the silhouette score and the Within-Cluster Sum of Squares (WCSS). Given a set of data points in a d-dimensional space and an integer k, you must cluster the points into k clusters.

You are to create a KMeans class with the following methods:

fit(): Partition the input data points into k clusters using the iterative k-means algorithm. Use a maximum of 300 iterations and stop early if the centroid movement is less than a tolerance of \(10^{-4}\).
predict(): Given new data points, assign each point to its closest cluster centroid.

You must also implement two functions:

calculate_silhouette_score(): Compute the average silhouette score of the clustering. The silhouette score for a single point is defined as \(s = \frac{b - a}{\max(a, b)}\), where \(a\) is the average distance to all other points in the same cluster and \(b\) is the minimum average distance to points in any other cluster. If a cluster contains only a single point, treat its silhouette score as 0.
calculate_wcss(): Compute the sum of squared distances from each point to the centroid of its corresponding cluster.

Input/Output:

The program must read from standard input and write to standard output.

inputFormat

The input is given via standard input in the following format:

n d k
x[0][0] x[0][1] ... x[0][d-1]
...
x[n-1][0] x[n-1][1] ... x[n-1][d-1]
m
q[0][0] q[0][1] ... q[0][d-1]
...
q[m-1][0] q[m-1][1] ... q[m-1][d-1]

Where:

n is the number of data points, d is the number of dimensions, and k is the number of clusters.
The next n lines each contain d floating-point numbers representing a data point.
m is the number of query points.
The next m lines each contain d floating-point numbers representing a query point.

outputFormat

The output should be printed to standard output in the following format:

Note: The cluster indices range from 0 to k-1.

## sample

0.9213
1.0000
0 1

</p>

#C13270. Implement K-Means Clustering with Metrics

Implement K-Means Clustering with Metrics