Class StandardKMeans<P>

All Implemented Interfaces:
Direct Known Subclasses:

public class StandardKMeans<P>
extends Object
implements ComputeClusters<P>

Standard implementation of k-means [1], summary is provided below:

  1. The initial seeds for each cluster is selected by the provided InitializeKMeans.
  2. Each point is assigned to a cluster which minimizes the euclidean distance squared.
  3. New cluster centers are computed from the average of all points assigned to it.

This will find a locally optimal solution which minimizes the sum of the distance-squared of each point to the cluster they are assigned to.

Converged if, (D[i] - D[i-1])/D[i] <= tol, where D is the sum of point from cluster distance at iteration 'i', and tol is the convergence tolerance threshold.

[1] Lloyd, S. P. (1957). "Least square quantization in PCM". Bell Telephone Laboratories Paper. Published in journal much later: Lloyd., S. P. (1982)

  • Field Details

    • maxIterations

      public int maxIterations
      maximum number of iterations
    • reseedAfterIterations

      public int reseedAfterIterations
      max iterations before it will reseed
    • maxReSeed

      public int maxReSeed
      max number of times it will re-seed
    • convergeTol

      public double convergeTol
      It is considered to be converged when the change in sum score is <= than this amount.
    • updateMeans

      public ComputeMeanClusters<P> updateMeans
    • seedSelector

      public InitializeKMeans<P> seedSelector
  • Constructor Details

  • Method Details

    • initialize

      public void initialize​(long randomSeed)
      Description copied from interface: ComputeClusters
      Must be called first to initializes internal data structures. Only needs to be called once.
      Specified by:
      initialize in interface ComputeClusters<P>
      randomSeed - Seed for any random number generators used internally.
    • process

      public void process​(LArrayAccessor<P> points, int numCluster)
      Description copied from interface: ComputeClusters
      Computes a set of clusters which segment the points into numCluster sets.
      Specified by:
      process in interface ComputeClusters<P>
      points - Set of points which are to be clustered. Not modified.
      numCluster - Number of clusters it will use to split the points.
    • getAssignment

      public AssignCluster<P> getAssignment()
      Description copied from interface: ComputeClusters

      Returns a class which is used to assign a point to a cluster. Only invoked after ComputeClusters.process(org.ddogleg.struct.LArrayAccessor<P>, int) has been called.

      WARNING: The returned data structure is recycled each time compute clusters is called. Create a copy if you wish to avoid having it modified.

      Specified by:
      getAssignment in interface ComputeClusters<P>
      Instance of AssignCluster.
    • matchPointsToClusters

      protected void matchPointsToClusters​(LArrayAccessor<P> points, DogArray<P> clusters)
      Finds the cluster which is the closest to each point. The point is the added to the sum for the cluster and its member count incremented
    • findBestMatch

      protected int findBestMatch​(P p, DogArray<P> clusters)
      Searches for this cluster which is the closest to p
    • getDistanceMeasure

      public double getDistanceMeasure()
      Computes the potential function. The sum of distance for each point from their cluster centers.\
      Specified by:
      getDistanceMeasure in interface ComputeClusters<P>
      sum of distance between each point and their respective clusters.
    • setVerbose

      public void setVerbose​(boolean verbose)
      Description copied from interface: ComputeClusters
      If set to true then information about status will be printed to standard out. By default verbose is off
      Specified by:
      setVerbose in interface ComputeClusters<P>
      verbose - true for versbose mode. False for quite mode.
    • newInstanceThread

      public ComputeClusters<P> newInstanceThread()
      Description copied from interface: ComputeClusters
      Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
      Specified by:
      newInstanceThread in interface ComputeClusters<P>