- All Implemented Interfaces: InitializeKMeans<P>
public class InitializePlusPlus<P> extends Object implements InitializeKMeans<P>
Implementation of the k-means++ seeding strategy described in [1]. A point is selected at random from the list as the first seed. Each remaining seed is then selected randomly, with the chance of choosing a point based on its distance from the closest seed selected so far.
[1] David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '07). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1027-1035.
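For reference, the weighting used in the k-means++ paper by Arthur and Vassilvitskii is shown below, where D(x) is the distance from point x to its closest already-chosen seed and X is the full set of points. This page only says the choice is "based on the distance", so whether this class applies exactly D(x)^2 likely depends on what the supplied PointDistance returns; treat the formula as the paper's rule, not a statement about this implementation.

```latex
P(\text{choose } x) = \frac{D(x)^2}{\sum_{x' \in \mathcal{X}} D(x')^2}
```

Because an already-chosen seed has D(x) = 0, it has zero probability of being chosen again.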
Method Summary
- initialize(PointDistance<P> distance, long randomSeed): Initializes internal data structures.
- newInstanceThread(): Creates a new instance which has the same configuration and can be run in parallel.
- selectPointForNextSeed(double targetFraction): Randomly selects the next seed.
- (LArrayAccessor<P> points, int requestedSeeds, DogArray<P> selectedSeeds): Given a set of points, selects a set of seeds from which to initialize k-means.
- updateDistanceWithNewSeed(LArrayAccessor<P> points, P seed): Updates the distance from each point to the seeds after a new seed has been added.
initialize(PointDistance<P> distance, long randomSeed)
Initializes internal data structures. Must be called first.
Given a set of points, selects a set of seeds from which to initialize k-means.
- How duplicate points are handled isn't specified. Duplicates could result in two seeds having the same value, or in fewer seeds being selected than requested.
- If there are fewer points than requested seeds, at most one seed is selected per point.
- Specified by: interface InitializeKMeans<P>
- Parameters:
points - (Input) Set of points to be clustered.
requestedSeeds - (Input) Number of seeds to attempt to select. See the notes above for exceptions.
selectedSeeds - (Output) Storage for the selected seeds, which are copied into it.
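The seeding procedure described for this method can be sketched end to end as follows. This is a minimal, self-contained illustration, not the BoofCV implementation: points are assumed to be double[] vectors, squared Euclidean distance stands in for PointDistance<P>, and the class name PlusPlusSeedingSketch is hypothetical. It shows one way the listed edge cases can arise: a point that duplicates an existing seed has zero weight, so the loop stops early when no point with positive distance remains, yielding at most one seed per distinct point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical k-means++ seeding sketch; not the BoofCV class. */
final class PlusPlusSeedingSketch {
    /** Squared Euclidean distance, an assumed stand-in for PointDistance<P>. */
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns indices of the selected seeds; may return fewer than requestedSeeds. */
    static List<Integer> selectSeeds(double[][] points, int requestedSeeds, Random rand) {
        List<Integer> seeds = new ArrayList<>();
        if (points.length == 0 || requestedSeeds <= 0) return seeds;

        // First seed: a uniformly random point.
        int first = rand.nextInt(points.length);
        seeds.add(first);

        // Distance from every point to its closest seed so far.
        double[] closest = new double[points.length];
        for (int i = 0; i < points.length; i++) closest[i] = distanceSq(points[i], points[first]);

        while (seeds.size() < requestedSeeds) {
            double total = 0;
            for (double d : closest) total += d;
            // Every point coincides with an existing seed: no valid candidates remain.
            if (total <= 0) break;

            // Roulette-wheel draw weighted by each point's closest-seed distance.
            double target = rand.nextDouble() * total;
            int chosen = -1;
            double cumulative = 0;
            for (int i = 0; i < points.length; i++) {
                if (closest[i] <= 0) continue;
                cumulative += closest[i];
                chosen = i;
                if (cumulative >= target) break;
            }
            seeds.add(chosen);

            // A point's closest-seed distance can only shrink when a seed is added.
            for (int i = 0; i < points.length; i++) {
                closest[i] = Math.min(closest[i], distanceSq(points[i], points[chosen]));
            }
        }
        return seeds;
    }
}
```

For example, requesting 5 seeds from the points {0}, {0}, {10} returns only 2 seeds, one per distinct value.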
newInstanceThread()
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared between instances, as long as they are read-only and thread-safe.
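The sharing rule for parallel copies can be illustrated with a small sketch. Everything here is hypothetical: SeederSketch is not a BoofCV class, a ToDoubleBiFunction stands in for PointDistance<P> and is assumed to be stateless and thread-safe, and deriving each copy's random seed from the parent is an assumed design choice. The read-only distance function is shared, while the random number generator and work buffers are duplicated so that copies never touch each other's mutable state.

```java
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of what a parallel copy shares versus duplicates. */
final class SeederSketch {
    // Shared between copies: assumed read-only and thread-safe.
    private ToDoubleBiFunction<double[], double[]> distance;
    // Per-instance mutable state: never shared across threads.
    private Random rand;
    private double[] closest = new double[0];

    void initialize(ToDoubleBiFunction<double[], double[]> distance, long randomSeed) {
        this.distance = distance;
        this.rand = new Random(randomSeed);
    }

    /** Same configuration, independent random stream and buffers. Call after initialize(), from the owning thread. */
    SeederSketch newInstanceThread() {
        SeederSketch copy = new SeederSketch();
        // Drawing the copy's seed from this instance keeps a run reproducible (assumed design choice).
        copy.initialize(distance, rand.nextLong());
        return copy;
    }

    ToDoubleBiFunction<double[], double[]> distance() { return distance; }
    Random random() { return rand; }
    double[] workspace() { return closest; }
}
```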
updateDistanceWithNewSeed(LArrayAccessor<P> points, P seed)
Called after a new seed has been added, so that the distance from each point to the seeds can be updated.
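One common way to perform this update, sketched below under stated assumptions, is incremental: each point keeps the distance to its closest seed, and only the new seed needs to be compared against, since earlier seeds are already reflected in the stored value. NearestSeedDistances is a hypothetical helper (not part of BoofCV) that works on 1-D points with squared distance; it also keeps the running total that a distance-weighted draw would normalize by.

```java
import java.util.Arrays;

/** Hypothetical helper tracking each 1-D point's squared distance to its closest seed. */
final class NearestSeedDistances {
    private final double[] points;
    private final double[] closest; // squared distance from each point to its closest seed
    private double total;           // sum of closest[], the normalizer for weighted selection

    NearestSeedDistances(double[] points) {
        this.points = points;
        this.closest = new double[points.length];
        // No seeds yet, so every point is treated as infinitely far away.
        Arrays.fill(closest, Double.POSITIVE_INFINITY);
    }

    /** O(N) per new seed instead of re-checking every seed for every point. */
    void updateWithNewSeed(double seed) {
        total = 0;
        for (int i = 0; i < points.length; i++) {
            double d = points[i] - seed;
            closest[i] = Math.min(closest[i], d * d);
            total += closest[i];
        }
    }

    double closest(int i) { return closest[i]; }
    double total() { return total; }
}
```

With points {0, 1, 5}, adding seed 0 gives distances {0, 1, 25}; adding seed 5 then reduces them to {0, 1, 0}.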
selectPointForNextSeed
protected int selectPointForNextSeed(double targetFraction)
Randomly selects the next seed. The chance of a point being selected is based on its distance from the closest existing seed; larger distances make selection more likely.
- Parameters:
targetFraction - Number from 0 to 1, inclusive.
- Returns: Index of the selected seed, or -1 if no valid seeds are left.
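A fraction-based selection of this kind is typically a roulette-wheel walk, sketched below. It assumes targetFraction acts as the random draw scaled by the total weight, that the weights are per-point distances to the closest seed, and that zero-weight points (existing seeds or their duplicates) are never chosen; WeightedPick is a hypothetical name, and the boundary handling is an assumption rather than documented behavior of this class.

```java
/** Hypothetical roulette-wheel selection; not the BoofCV implementation. */
final class WeightedPick {
    /**
     * Walks the cumulative weights until they reach targetFraction * total.
     * Returns -1 when no point has positive weight.
     */
    static int select(double[] weights, double targetFraction) {
        double total = 0;
        for (double w : weights) total += w;
        if (total <= 0) return -1;

        double target = targetFraction * total;
        double cumulative = 0;
        int lastValid = -1;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] <= 0) continue; // skip points that coincide with a seed
            cumulative += weights[i];
            lastValid = i;
            if (cumulative >= target) return i;
        }
        // Guards against round-off leaving cumulative just below target when targetFraction == 1.
        return lastValid;
    }
}
```

With weights {0, 1, 3}, a fraction of 0.25 selects index 1 and a fraction of 0.5 selects index 2, so the point with three times the weight covers three quarters of the [0, 1] range.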