Embedding Data

Often raw data must be embedded in a different space before clustering. We provide several common embeddings. You may also create your own.

| Name | Algorithm |
| --- | --- |
| Embedding.IDENTITY_EMBEDDING | Identity |
| Embedding.HAAR_EMBEDDING | Haar Transform |
| Embedding.LOW_DIMENSIONAL_RI | Random Indexing with dimension 64 and epsilon = 0.1 |
| Embedding.MEDIUM_DIMENSIONAL_RI | Random Indexing with dimension 256 and epsilon = 0.1 |
| Embedding.HIGH_DIMENSIONAL_RI | Random Indexing with dimension 1024 and epsilon = 0.1 |
| Embedding.SYMMETRIZING_KL_EMBEDDING | Symmetrizing KL Embedding |

You may create an embedding by passing one of the names above to the apply method of the Embedding companion object.

package com.massivedatascience.transforms

object Embedding {
   def apply(embeddingName: String): Embedding = ???
}
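
To illustrate how a name-based factory like this can work, here is a minimal, self-contained sketch. It is not the library's implementation: `SketchEmbedding`, its `embed` method, the string values of the two constants, and the single-level Haar step are all assumptions made for the example. The real trait and constants live in `com.massivedatascience.transforms.Embedding` and may differ.

```scala
// Hypothetical stand-in for the library's Embedding; all names and bodies are illustrative.
trait SketchEmbedding {
  def embed(v: Array[Double]): Array[Double]
}

object SketchEmbedding {
  // Assumed string values; the library's actual constants may be defined differently.
  val IDENTITY_EMBEDDING = "IDENTITY"
  val HAAR_EMBEDDING = "HAAR"

  // Returns a copy of the input unchanged.
  private object Identity extends SketchEmbedding {
    def embed(v: Array[Double]): Array[Double] = v.clone()
  }

  // One level of the Haar transform: pairwise sums followed by pairwise
  // differences, each scaled by 1/sqrt(2). Requires an even-length input.
  private object HaarOneLevel extends SketchEmbedding {
    def embed(v: Array[Double]): Array[Double] = {
      require(v.length % 2 == 0, "input length must be even")
      val s = math.sqrt(2.0)
      val pairs = v.grouped(2).toArray
      pairs.map(p => (p(0) + p(1)) / s) ++ pairs.map(p => (p(0) - p(1)) / s)
    }
  }

  // Look up an embedding by name, mirroring the shape of Embedding.apply.
  def apply(embeddingName: String): SketchEmbedding = embeddingName match {
    case IDENTITY_EMBEDDING => Identity
    case HAAR_EMBEDDING     => HaarOneLevel
    case other => throw new IllegalArgumentException(s"unknown embedding: $other")
  }
}
```

With this sketch, `SketchEmbedding(SketchEmbedding.HAAR_EMBEDDING).embed(Array(1.0, 3.0, 5.0, 5.0))` returns a four-element array holding the two scaled sums followed by the two scaled differences. An unrecognized name throws an `IllegalArgumentException`.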
