Class InitializePlusPlus<P>

java.lang.Object
org.ddogleg.clustering.kmeans.InitializePlusPlus<P>
All Implemented Interfaces:
InitializeKMeans<P>
Direct Known Subclasses:
InitializePlusPlus_MT

public class InitializePlusPlus<P> extends Object implements InitializeKMeans<P>

Implementation of the seeding strategy described in [1]. A point is randomly selected from the list as the first seed. The remaining seeds are selected randomly based on the distance of each seed from their closest cluster.

[1] David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (SODA '07). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1027-1035.

  • Constructor Details

    • InitializePlusPlus

      public InitializePlusPlus()
  • Method Details

    • initialize

      public void initialize(PointDistance<P> distance, long randomSeed)
      Description copied from interface: InitializeKMeans
      Initializes internal data structures. Must be called first.
      Specified by:
      initialize in interface InitializeKMeans<P>
      Parameters:
      distance - Distance function between two points
      randomSeed - Seed for any random number generators used internally.
    • selectSeeds

      public void selectSeeds(LArrayAccessor<P> points, int requestedSeeds, DogArray<P> selectedSeeds)
      Description copied from interface: InitializeKMeans

      Given the a set of points, select a set of seeds to initialize k-means from.

      • How duplicate points are handled isn't specified. It could result in two seeds having the same value or the number of selected seeds being less that the requested amount
      • If the number of points is less than the number of seeds requested it will at most select one seed for each point
      *
      Specified by:
      selectSeeds in interface InitializeKMeans<P>
      Parameters:
      points - (Input) Set of points which is to be clustered.
      requestedSeeds - (Input) Number of seeds it will attempt to select. See above for exceptions.
      selectedSeeds - (Output) Storage for selected seeds. They will be copied into it.
    • newInstanceThread

      public InitializeKMeans<P> newInstanceThread()
      Description copied from interface: InitializeKMeans
      Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
      Specified by:
      newInstanceThread in interface InitializeKMeans<P>
    • updateDistanceWithNewSeed

      protected void updateDistanceWithNewSeed(LArrayAccessor<P> points, P seed)
      A new seed has been added and the distance from the seeds needs to be updated
    • selectPointForNextSeed

      protected int selectPointForNextSeed(double targetFraction)
      Randomly selects the next seed. The chance of a seed is based upon its distance from the closest cluster. Larger distances mean more likely.
      Parameters:
      targetFraction - Number from 0 to 1, inclusive
      Returns:
      Index of the selected seed. Return -1 is no valid seeds left