Most practical variants of k-means clustering are implemented, or can be implemented, with this package.
- observes that Lloyd's algorithm converges for distance functions defined by Bregman divergences
- uses a 2-step iterative algorithm to cluster a subset of the data and then the full set
- uses random indexing to lower the dimension of high dimensional data
- uses the Haar Transform to embed time series data before clustering
- shows that metrics can make use of the triangle inequality to speed up clustering
- a recursive subdivision algorithm
- a provably good initial set of cluster centers
- a mini-batch algorithm suitable for online data sets
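
To make the first item in the list above concrete, here is a minimal sketch of Lloyd's algorithm with a pluggable Bregman divergence. It is written in plain Python for illustration only and is not this package's API: the names `squared_euclidean`, `generalized_kl`, `mean`, and `lloyd` are hypothetical. It relies on the known property that, for any Bregman divergence measured from the data point to the center, the arithmetic mean of a cluster minimizes the total divergence, so only the assignment step changes when the divergence is swapped.

```python
import math


def squared_euclidean(x, c):
    """Bregman divergence generated by phi(v) = ||v||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def generalized_kl(x, c):
    """Bregman divergence generated by phi(v) = sum(v_i * log(v_i)).

    Assumes every coordinate of x and c is strictly positive.
    """
    return sum(xi * math.log(xi / ci) - xi + ci for xi, ci in zip(x, c))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def lloyd(points, initial_centers, divergence, max_iter=100):
    """Alternate assignment and center-update steps until assignments stop changing."""
    centers = list(initial_centers)  # copy so the caller's list is not mutated
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the center with the smallest divergence.
        new_assignment = [
            min(range(len(centers)), key=lambda j: divergence(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: the mean is the optimal center for any Bregman divergence.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment


# Hypothetical toy data: two well-separated groups of positive vectors.
points = [(1.0, 2.0), (1.2, 1.8), (8.0, 9.0), (8.2, 9.1)]
seeds = [(1.0, 2.0), (8.0, 9.0)]
centers_euclid, labels_euclid = lloyd(points, seeds, squared_euclidean)
centers_kl, labels_kl = lloyd(points, seeds, generalized_kl)
```

The sketch takes the initial centers as an argument, so a seeding strategy such as the "provably good initial set of cluster centers" mentioned above could be substituted without touching the loop.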
If you find a novel variant of k-means clustering that is provably superior in some manner, implement it using the package and send a pull request along with the paper analyzing the variant! Here are some newer algorithms that are worth investigating:
- even better seeding
- a recursive subdivision algorithm
- a novel inversion of the k-means algorithm with dramatic speedups on large data sets