Selecting a Distance Function

Lloyd's algorithm converges for the class of distance functions called Bregman divergences, and we provide a number of them. When selecting a distance function, consider the domain of the input data. For example, frequency data is integer-valued, and the similarity of frequencies or distributions is best measured using the Kullback-Leibler divergence.

| Name | Divergence |
| --- | --- |
| BregmanDivergence.EUCLIDEAN | Squared Euclidean |
| BregmanDivergence.RELATIVE_ENTROPY | |
| BregmanDivergence.DISCRETE_KL | Kullback-Leibler |
| BregmanDivergence.DISCRETE_SMOOTHED_KL | Kullback-Leibler |
| BregmanDivergence.SPARSE_SMOOTHED_KL | Kullback-Leibler |
| BregmanDivergence.LOGISTIC_LOSS | Logistic Loss |
| BregmanDivergence.GENERALIZED_I | Generalized I |
| BregmanDivergence.ITAKURA_SAITO | |
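
To make the divergences listed above more concrete: a Bregman divergence is generated by a strictly convex function F as D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>. The following is a minimal sketch of that definition, not code from this library; the object name `BregmanSketch` and its methods are made up for illustration. It shows that F(x) = sum of x_i^2 yields the squared Euclidean distance, and that F(x) = sum of x_i log x_i yields the generalized Kullback-Leibler divergence, which equals the usual Kullback-Leibler divergence when both arguments sum to one.

```scala
// Illustrative sketch only: these helpers are NOT part of the library.
object BregmanSketch {

  /** Bregman divergence D_F(p, q) = F(p) - F(q) - <gradF(q), p - q>. */
  def bregman(f: Array[Double] => Double,
              gradF: Array[Double] => Array[Double])
             (p: Array[Double], q: Array[Double]): Double = {
    require(p.length == q.length, "points must have the same dimension")
    val g = gradF(q)
    // Inner product of the gradient at q with the displacement p - q.
    val inner = p.indices.map(i => g(i) * (p(i) - q(i))).sum
    f(p) - f(q) - inner
  }

  /** F(x) = sum x_i^2 gives the squared Euclidean distance. */
  def squaredEuclidean(p: Array[Double], q: Array[Double]): Double =
    bregman(x => x.map(v => v * v).sum, x => x.map(v => 2.0 * v))(p, q)

  /** F(x) = sum x_i log x_i gives the generalized KL divergence.
    * Assumes every coordinate of p and q is strictly positive. */
  def generalizedKL(p: Array[Double], q: Array[Double]): Double =
    bregman(x => x.map(v => v * math.log(v)).sum,
            x => x.map(v => math.log(v) + 1.0))(p, q)
}
```

Note that the strict positivity assumption in `generalizedKL` is one reason smoothed variants are useful for sparse frequency data, where zero counts are common.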

You may construct instances of BregmanDivergence by name using the BregmanDivergence companion object.

package com.massivedatascience.divergence

object BregmanDivergence {
  def apply(name: String): BregmanDivergence = ???
}
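
The name-based factory pattern in the signature above can be tried in isolation with a stand-in. The sketch below is hypothetical: `FactorySketch`, `SketchDivergence`, its `distance` method, and the two string constants are assumptions made for illustration and do not reproduce the library's implementation. With the real library, the equivalent call would presumably be `BregmanDivergence(BregmanDivergence.EUCLIDEAN)`, assuming the names in the table are the strings accepted by `apply`.

```scala
// Hypothetical stand-in, NOT the library's code. It mirrors only the
// "companion object with apply(name)" pattern shown above.
object FactorySketch {

  trait SketchDivergence {
    def name: String
    def distance(p: Array[Double], q: Array[Double]): Double
  }

  object SketchDivergence {
    // Assumed: divergence names are String constants on the companion.
    val EUCLIDEAN = "EUCLIDEAN"
    val GENERALIZED_I = "GENERALIZED_I"

    // Sums a per-coordinate cost over both points.
    private final class Impl(val name: String, f: (Double, Double) => Double)
        extends SketchDivergence {
      def distance(p: Array[Double], q: Array[Double]): Double = {
        require(p.length == q.length, "points must have the same dimension")
        p.indices.map(i => f(p(i), q(i))).sum
      }
    }

    /** Looks up a divergence by name; unknown names are rejected. */
    def apply(name: String): SketchDivergence = name match {
      case EUCLIDEAN     => new Impl(name, (a, b) => (a - b) * (a - b))
      // Generalized I-divergence; assumes strictly positive coordinates.
      case GENERALIZED_I => new Impl(name, (a, b) => a * math.log(a / b) - a + b)
      case other         => throw new IllegalArgumentException(s"Unknown divergence: $other")
    }
  }
}
```

Rejecting unknown names with an exception is a design choice of this sketch; the source does not say how the library handles them.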

From your BregmanDivergence, you may create an instance of the distance function by using the apply method of the BregmanPointOps companion object.

package com.massivedatascience.clusterer

object BregmanPointOps {
  def apply(d: BregmanDivergence): BregmanPointOps = ???
}
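
To illustrate what a distance-function object built from a divergence is used for, here is a minimal sketch of the assignment step of Lloyd's algorithm, where each point is assigned to the center with the smallest divergence. Everything in it is an assumption made for illustration: `PointOpsSketch`, `SketchPointOps`, and `findClosest` are hypothetical names, and the source does not show the methods that the real BregmanPointOps exposes.

```scala
// Hypothetical stand-in, NOT the library's BregmanPointOps.
object PointOpsSketch {

  // A divergence here is any function (point, center) => non-negative cost.
  type Divergence = (Array[Double], Array[Double]) => Double

  val squaredEuclidean: Divergence =
    (p, q) => p.zip(q).map { case (a, b) => (a - b) * (a - b) }.sum

  /** Wraps a divergence and offers the operations a clusterer needs. */
  final class SketchPointOps(d: Divergence) {
    def distance(point: Array[Double], center: Array[Double]): Double =
      d(point, center)

    /** Index of the center closest to `point` under the wrapped divergence. */
    def findClosest(centers: Seq[Array[Double]], point: Array[Double]): Int = {
      require(centers.nonEmpty, "at least one center is required")
      centers.indices.minBy(i => d(point, centers(i)))
    }
  }

  object SketchPointOps {
    // Mirrors the shape of BregmanPointOps.apply(d) shown above.
    def apply(d: Divergence): SketchPointOps = new SketchPointOps(d)
  }
}
```

In this sketch, swapping the divergence passed to `SketchPointOps` changes how points are assigned without touching the assignment logic.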
