Clustering
Given a set of 'n' points in space, the k-means clustering algorithm attempts to classify 'n' points into 'k' clusters such that the sum of distances of each point from the mean of it's respective cluster is minimized. Although this simulation showcases k-means clustering in 2D space, it is can be easily extended to further dimensions. Circles represent the points. Squares represent the (tentative) centroids of clusters. All points belonging to the same cluster share the same color.
Click on the screen to insert a point. Click on a point to remove it. Press 'Iterate' to progress the clustering process. Press 'Reset' to restart the clustering process anew. Press 'Clear' to remove all existing points
Algorithm
Psuedocode of k-means clustering algorithm is as follows:
- Randomly insert k centroids in the space.
- Assign each point to the nearest centroid. All points assigned to the same centroid form a cluster. It is imperative to verify that each centroid has atleast one point assigned to it. Otherwise, repeat the previous step.
- Within each cluster, compute it's centroid, and move the assigned centroid to the computed position
- Repeat the previous step until the cost function (sum of distances of each point from the centroid of it's respective cluster) stops decreasing further.
Note:
- The metric used here is Euclidean distance
- There is a bug that occurs with 6 clusters. Two of the centroids are of same color.
- Related: Gradient Descent, Genetic Algorithm, Polynomial Regression.
Developed by ChanRT | Fork me at GitHub