Class StandardKMeans<P>
- All Implemented Interfaces:
ComputeClusters<P>
- Direct Known Subclasses:
StandardKMeans_MT
Standard implementation of k-means [1], summary is provided below:
- The initial seeds for each cluster is selected by the provided
InitializeKMeans
. - Each point is assigned to a cluster which minimizes the euclidean distance squared.
- New cluster centers are computed from the average of all points assigned to it.
This will find a locally optimal solution which minimizes the sum of the distance-squared of each point to the cluster they are assigned to.
Converged if, (D[i] - D[i-1])/D[i] <= tol
, where D is the sum of point from cluster distance at iteration 'i',
and tol is the convergence tolerance threshold.
[1] Lloyd, S. P. (1957). "Least square quantization in PCM". Bell Telephone Laboratories Paper. Published in journal much later: Lloyd., S. P. (1982)
-
Field Summary
Modifier and TypeFieldDescriptiondouble
It is considered to be converged when the change in sum score is <= than this amount.int
maximum number of iterationsint
max number of times it will re-seedint
max iterations before it will reseed -
Constructor Summary
ConstructorDescriptionStandardKMeans
(ComputeMeanClusters<P> updateMeans, InitializeKMeans<P> seedSelector, PointDistance<P> distancer, DogLambdas.NewInstance<P> factory) Configures k-means parameters -
Method Summary
Modifier and TypeMethodDescriptionprotected int
findBestMatch
(P p, DogArray<P> clusters) Searches for this cluster which is the closest to pReturns a class which is used to assign a point to a cluster.double
Computes the potential function.void
initialize
(long randomSeed) Must be called first to initializes internal data structures.protected void
matchPointsToClusters
(LArrayAccessor<P> points, DogArray<P> clusters) Finds the cluster which is the closest to each point.Creates a new instance which has the same configuration and can be run in parallel.void
process
(LArrayAccessor<P> points, int numCluster) Computes a set of clusters which segment the points into numCluster sets.void
setVerbose
(boolean verbose) If set to true then information about status will be printed to standard out.
-
Field Details
-
maxIterations
public int maxIterationsmaximum number of iterations -
reseedAfterIterations
public int reseedAfterIterationsmax iterations before it will reseed -
maxReSeed
public int maxReSeedmax number of times it will re-seed -
convergeTol
public double convergeTolIt is considered to be converged when the change in sum score is <= than this amount. -
updateMeans
-
seedSelector
-
-
Constructor Details
-
StandardKMeans
public StandardKMeans(ComputeMeanClusters<P> updateMeans, InitializeKMeans<P> seedSelector, PointDistance<P> distancer, DogLambdas.NewInstance<P> factory) Configures k-means parameters- Parameters:
seedSelector
- Used to select initial seeds for the clusters
-
-
Method Details
-
initialize
public void initialize(long randomSeed) Description copied from interface:ComputeClusters
Must be called first to initializes internal data structures. Only needs to be called once.- Specified by:
initialize
in interfaceComputeClusters<P>
- Parameters:
randomSeed
- Seed for any random number generators used internally.
-
process
Description copied from interface:ComputeClusters
Computes a set of clusters which segment the points into numCluster sets. The number of clusters and points must be 1 or more. If this is not true then the behavior is undefined.- Specified by:
process
in interfaceComputeClusters<P>
- Parameters:
points
- Set of points which are to be clustered. Not modified.numCluster
- Number of clusters it will use to split the points.
-
getAssignment
Description copied from interface:ComputeClusters
Returns a class which is used to assign a point to a cluster. Only invoked after
ComputeClusters.process(org.ddogleg.struct.LArrayAccessor<P>, int)
has been called.WARNING: The returned data structure is recycled each time compute clusters is called. Create a copy if you wish to avoid having it modified.
- Specified by:
getAssignment
in interfaceComputeClusters<P>
- Returns:
- Instance of
AssignCluster
.
-
matchPointsToClusters
Finds the cluster which is the closest to each point. The point is the added to the sum for the cluster and its member count incremented -
findBestMatch
Searches for this cluster which is the closest to p -
getDistanceMeasure
public double getDistanceMeasure()Computes the potential function. The sum of distance for each point from their cluster centers.\- Specified by:
getDistanceMeasure
in interfaceComputeClusters<P>
- Returns:
- sum of distance between each point and their respective clusters.
-
setVerbose
public void setVerbose(boolean verbose) Description copied from interface:ComputeClusters
If set to true then information about status will be printed to standard out. By default verbose is off- Specified by:
setVerbose
in interfaceComputeClusters<P>
- Parameters:
verbose
- true for versbose mode. False for quite mode.
-
newInstanceThread
Description copied from interface:ComputeClusters
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.- Specified by:
newInstanceThread
in interfaceComputeClusters<P>
-