Class AssignKMeans<P>

java.lang.Object
org.ddogleg.clustering.kmeans.AssignKMeans<P>
All Implemented Interfaces:
Serializable, AssignCluster<P>

public class AssignKMeans<P> extends Object implements AssignCluster<P>, Serializable
Implementation of AssignCluster for K-Means. Squared Euclidean distance is used to select the cluster that best fits a point. This distance metric works well for hard assignment but can produce undesirable results for soft assignment; see the description of assign(P, double[]) below.
  • Constructor Details

  • Method Details

    • assign

      public int assign(P point)
      Description copied from interface: AssignCluster
      Assigns the point to the cluster that is the best fit.
      Specified by:
      assign in interface AssignCluster<P>
      Parameters:
      point - Point which is to be assigned
      Returns:
      Index of the cluster from 0 to N-1
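      As an illustration of hard assignment by squared Euclidean distance, the following is a minimal sketch and not the library's implementation. It assumes points are plain double[] arrays; the class HardAssignSketch and its methods distanceSq and assign are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DDogleg: picks the nearest cluster center.
class HardAssignSketch {
    // Squared Euclidean distance between two points of equal length.
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the index (0 to N-1) of the center closest to the point,
    // or -1 if the list of centers is empty (an assumption of this sketch).
    static int assign(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double d = distanceSq(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(
                new double[]{0, 0}, new double[]{10, 10}, new double[]{5, 0});
        System.out.println(assign(new double[]{4, 1}, centers));
    }
}
```

      Because the square root is monotonic, comparing squared distances selects the same cluster as comparing true distances while avoiding the square root.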
    • assign

      public void assign(P point, double[] fit)

      Soft assignment is done by first summing the distances from the point to every cluster. Each cluster's value is then set to that total minus its own distance. Finally, the output array is normalized by dividing each element by the sum of the array's elements.

      When all clusters are approximately the same distance, or one is clearly the closest, this produces reasonable results. When multiple clusters are much closer than at least one other cluster, it effectively ignores the relative difference in distances between the closest clusters. There are several obvious heuristic "fixes" to this issue, but the best way to solve it is to simply use AssignGmm_F64 instead.

      Specified by:
      assign in interface AssignCluster<P>
      Parameters:
      point - Point which is to be assigned
      fit - Storage for relative fit quality of each cluster. Length must be at least the number of clusters.
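      The soft-assignment arithmetic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the class SoftAssignSketch and its method softAssign are hypothetical, the distances are taken as already computed, and the uniform fallback used when the normalizing sum is zero is a choice made for this sketch only.

```java
// Hypothetical helper, not part of DDogleg: "total minus distance" soft assignment.
class SoftAssignSketch {
    // distances[i] is the distance from the point to cluster i.
    // fit receives the normalized fit scores and must be at least as long.
    static void softAssign(double[] distances, double[] fit) {
        int n = distances.length;

        // Step 1: total distance from the point to all clusters.
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += distances[i];
        }

        // Step 2: each cluster's value is the total minus its own distance.
        double sum = 0;
        for (int i = 0; i < n; i++) {
            fit[i] = total - distances[i];
            sum += fit[i];
        }

        // Assumption of this sketch: if the sum is zero (one cluster, or all
        // distances zero), fall back to a uniform fit to avoid dividing by zero.
        if (sum == 0) {
            for (int i = 0; i < n; i++) {
                fit[i] = 1.0 / n;
            }
            return;
        }

        // Step 3: normalize so the values sum to one.
        for (int i = 0; i < n; i++) {
            fit[i] /= sum;
        }
    }

    public static void main(String[] args) {
        double[] fit = new double[3];
        softAssign(new double[]{1, 2, 100}, fit);
        System.out.println(java.util.Arrays.toString(fit));
    }
}
```

      With distances 1, 2 and 100 the unnormalized values are 102, 101 and 3, so the normalized fits are roughly 0.495, 0.490 and 0.015. The second cluster is twice as far away as the first, yet the two receive almost identical scores, which is the weakness noted in the description.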
    • getNumberOfClusters

      public int getNumberOfClusters()
      Description copied from interface: AssignCluster
      Total number of clusters.
      Specified by:
      getNumberOfClusters in interface AssignCluster<P>
      Returns:
      The total number of clusters.