Package org.ddogleg.clustering.kmeans
Class InitializePlusPlus<P>
java.lang.Object
org.ddogleg.clustering.kmeans.InitializePlusPlus<P>
- All Implemented Interfaces:
InitializeKMeans<P>
- Direct Known Subclasses:
InitializePlusPlus_MT
Implementation of the seeding strategy described in [1]. A point is randomly selected from the list as the first seed. The remaining seeds are selected randomly based on the distance of each seed from their closest cluster.
[1] David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (SODA '07). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1027-1035.
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
initialize
(PointDistance<P> distance, long randomSeed) Initializes internal data structures.Creates a new instance which has the same configuration and can be run in parallel.protected int
selectPointForNextSeed
(double targetFraction) Randomly selects the next seed.void
selectSeeds
(LArrayAccessor<P> points, int requestedSeeds, DogArray<P> selectedSeeds) Given the a set of points, select a set of seeds to initialize k-means from.protected void
updateDistanceWithNewSeed
(LArrayAccessor<P> points, P seed) A new seed has been added and the distance from the seeds needs to be updated
-
Constructor Details
-
InitializePlusPlus
public InitializePlusPlus()
-
-
Method Details
-
initialize
Description copied from interface:InitializeKMeans
Initializes internal data structures. Must be called first.- Specified by:
initialize
in interfaceInitializeKMeans<P>
- Parameters:
distance
- Distance function between two pointsrandomSeed
- Seed for any random number generators used internally.
-
selectSeeds
Description copied from interface:InitializeKMeans
Given the a set of points, select a set of seeds to initialize k-means from.
- How duplicate points are handled isn't specified. It could result in two seeds having the same value or the number of selected seeds being less that the requested amount
- If the number of points is less than the number of seeds requested it will at most select one seed for each point
- Specified by:
selectSeeds
in interfaceInitializeKMeans<P>
- Parameters:
points
- (Input) Set of points which is to be clustered.requestedSeeds
- (Input) Number of seeds it will attempt to select. See above for exceptions.selectedSeeds
- (Output) Storage for selected seeds. They will be copied into it.
-
newInstanceThread
Description copied from interface:InitializeKMeans
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.- Specified by:
newInstanceThread
in interfaceInitializeKMeans<P>
-
updateDistanceWithNewSeed
A new seed has been added and the distance from the seeds needs to be updated -
selectPointForNextSeed
protected int selectPointForNextSeed(double targetFraction) Randomly selects the next seed. The chance of a seed is based upon its distance from the closest cluster. Larger distances mean more likely.- Parameters:
targetFraction
- Number from 0 to 1, inclusive- Returns:
- Index of the selected seed. Return -1 is no valid seeds left
-