Relation to Spark K-Means Clusterer

This project generalizes the Spark MLLIB Batch K-Means (v1.1.0) clusterer and the Spark MLLIB Streaming K-Means (v1.2.0) clusterer.

This code has been tested on data sets of tens of millions of points in a 700+ dimensional space using a variety of distance functions. Thanks to the excellent core Spark implementation, it rocks!

Last updated