Clustering

Given a set of 'n' points in space, the k-means clustering algorithm attempts to partition the 'n' points into 'k' clusters such that the sum of distances of each point from the mean of its respective cluster is minimized. Although this simulation showcases k-means clustering in 2D space, it can easily be extended to higher dimensions. Circles represent the points. Squares represent the (tentative) centroids of clusters. All points belonging to the same cluster share the same color.
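To make the objective concrete, here is a minimal Python sketch of the cost being minimized; the function name and the sample points are illustrative choices, not taken from the simulation's code:

```python
import math

def cost(points, centroids, assignment):
    """Sum of Euclidean distances from each point to its cluster's centroid."""
    return sum(
        math.dist(points[i], centroids[assignment[i]])
        for i in range(len(points))
    )

# Two clusters: (0,0) and (0,2) belong to centroid (0,1); (5,5) to centroid (5,5).
points = [(0, 0), (0, 2), (5, 5)]
centroids = [(0, 1), (5, 5)]
assignment = [0, 0, 1]
print(cost(points, centroids, assignment))  # 2.0
```

A lower value of this sum means each point sits closer to its cluster's centroid, i.e. a tighter clustering.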






Click on the screen to insert a point. Click on a point to remove it. Press 'Iterate' to progress the clustering process. Press 'Reset' to restart the clustering process anew. Press 'Clear' to remove all existing points.



Algorithm

Pseudocode of the k-means clustering algorithm is as follows:

  1. Randomly insert k centroids into the space.
  2. Assign each point to the nearest centroid. All points assigned to the same centroid form a cluster. Verify that each centroid has at least one point assigned to it; otherwise, repeat this step.
  3. Within each cluster, compute its mean, and move the cluster's centroid to the computed position.
  4. Repeat steps 2 and 3 until the cost function (the sum of distances of each point from the centroid of its respective cluster) stops decreasing.
The algorithm is not guaranteed to converge to a global minimum. It may get stuck in a local minimum, in which case the process must be restarted with a new set of initial centroids. The cost function quantifies how good the clustering is: a lower value signifies better clustering.
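The steps above can be sketched in Python as follows. This is a minimal illustration under my own assumptions (centroids are initialized by sampling k distinct input points, and `math.dist` supplies the Euclidean metric); the simulation's actual implementation may differ:

```python
import math
import random

def k_means(points, k, max_iter=100, seed=None):
    rng = random.Random(seed)
    # Step 1: place k centroids randomly (here: at k distinct input points).
    centroids = rng.sample(points, k)
    prev_cost = float("inf")
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest centroid; if any centroid
        # ends up with no points, re-initialize and repeat this step.
        while True:
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
                clusters[nearest].append(p)
            if all(clusters):
                break
            centroids = rng.sample(points, k)
        # Step 3: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            for cluster in clusters
        ]
        # Step 4: stop once the cost function stops decreasing.
        current_cost = sum(math.dist(p, centroids[i])
                           for i, cluster in enumerate(clusters)
                           for p in cluster)
        if current_cost >= prev_cost:
            break
        prev_cost = current_cost
    return centroids, clusters

# Two well-separated groups should each receive one centroid.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, clusters = k_means(pts, 2, seed=0)
```

Note that the code works in any number of dimensions, since `math.dist` and the per-coordinate mean in step 3 are dimension-agnostic, matching the remark above that k-means extends beyond 2D.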




Note:
  1. The metric used here is Euclidean distance.
  2. There is a known bug with 6 clusters: two of the centroids share the same color.
  3. Related: Gradient Descent, Genetic Algorithm, Polynomial Regression.


Developed by ChanRT | Fork me at GitHub