Package org.ddogleg.clustering.gmm
Class ExpectationMaximizationGmm_F64
java.lang.Object
org.ddogleg.clustering.gmm.ExpectationMaximizationGmm_F64
- All Implemented Interfaces:
ComputeClusters<double[]>
Standard expectation maximization based approach to fitting mixture-of-Gaussian models to a set of data.
A locally optimal maximum likelihood estimate is found. The full covariance is found. Some other
variants will estimate just diagonal elements or a single covariance, but that isn't yet supported.
Converged if (D[i] - D[i-1])/D[i] <= tol,
where D[i] is the sum of point-to-cluster distances at iteration i
and tol is the convergence tolerance threshold.
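The convergence test described above can be sketched in a few lines. This is a minimal illustration, not the library's code: the class name EmConvergenceSketch and the method hasConverged are hypothetical, and comparing the magnitude of the relative change (via Math.abs) is an assumption, since the sign convention is not spelled out here.

```java
final class EmConvergenceSketch {
    /**
     * Relative-change test modeled on (D[i] - D[i-1]) / D[i] <= tol.
     * Assumption: the magnitude of the change is what matters, so absolute values are used.
     */
    static boolean hasConverged(double previous, double current, double tol) {
        if (current == 0.0) {
            // Assumption: a total distance of zero is treated as converged (avoids dividing by zero).
            return true;
        }
        return Math.abs(current - previous) / Math.abs(current) <= tol;
    }
}
```

With tol = 1e-3, a change from 100.0 to 99.9999 would count as converged, while a change from 100.0 to 50.0 would not.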
-
Nested Class Summary
-
Constructor Summary
Constructor | Description
ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial) | Configures EM parameters
-
Method Summary
Modifier and Type | Method | Description
protected double | expectation() | For each point, computes the "responsibility" of each Gaussian.
AssignCluster<double[]> | getAssignment() | Returns a class which is used to assign a point to a cluster.
double | getDistanceMeasure() | Returns the sum of all the distances between each point in the set.
void | initialize(long randomSeed) | Must be called first to initialize internal data structures.
protected void | maximization() | Uses the points' responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
ComputeClusters<double[]> | newInstanceThread() | Creates a new instance which has the same configuration and can be run in parallel.
void | process(LArrayAccessor<double[]> points, int numCluster) | Computes a set of clusters which segment the points into numCluster sets.
void | setVerbose(boolean verbose) | If set to true then information about status will be printed to standard out.
-
Constructor Details
-
ExpectationMaximizationGmm_F64
public ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial)
Configures EM parameters.
- Parameters:
maxIterations
- Maximum number of iterations
convergeTol
- If the relative change in score is less than or equal to this amount, it has converged
selectInitial
- Used to select initial seeds for the clusters
-
-
Method Details
-
initialize
public void initialize(long randomSeed)
Description copied from interface: ComputeClusters
Must be called first to initialize internal data structures. Only needs to be called once.
- Specified by:
initialize
in interfaceComputeClusters<double[]>
- Parameters:
randomSeed
- Seed for any random number generators used internally.
-
process
public void process(LArrayAccessor<double[]> points, int numCluster)
Description copied from interface: ComputeClusters
Computes a set of clusters which segment the points into numCluster sets. The number of clusters and the number of points must both be 1 or more; otherwise the behavior is undefined.
- Specified by:
process
in interfaceComputeClusters<double[]>
- Parameters:
points
- Set of points which are to be clustered. Not modified.
numCluster
- Number of clusters it will use to split the points.
-
expectation
protected double expectation()
For each point, computes the "responsibility" of each Gaussian.
- Returns:
- The sum of chi-square. Can be used to estimate the total error.
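As an illustration of what "responsibility" means, here is a minimal sketch of an expectation step for a one-dimensional mixture. It is a simplification under stated assumptions, not this class's implementation: the class documented here fits full covariance matrices, whereas the sketch uses scalar variances, and the names EStepSketch, gaussian, and responsibilities are hypothetical. It also does not compute the chi-square sum that the method returns.

```java
final class EStepSketch {
    /** Density of a 1-D Gaussian with the given mean and variance, evaluated at x. */
    static double gaussian(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /**
     * Returns r[n][k], the posterior probability that point n was generated by Gaussian k.
     * Assumption: every point has non-zero density under at least one Gaussian.
     */
    static double[][] responsibilities(double[] points, double[] weights,
                                       double[] means, double[] variances) {
        double[][] r = new double[points.length][weights.length];
        for (int n = 0; n < points.length; n++) {
            double total = 0.0;
            for (int k = 0; k < weights.length; k++) {
                // Unnormalized responsibility: mixture weight times likelihood.
                r[n][k] = weights[k] * gaussian(points[n], means[k], variances[k]);
                total += r[n][k];
            }
            for (int k = 0; k < weights.length; k++) {
                r[n][k] /= total; // normalize so each row sums to one
            }
        }
        return r;
    }
}
```

A point lying close to one Gaussian's mean receives a responsibility near 1 for that Gaussian, while a point equidistant from two equally weighted, equal-variance Gaussians is split evenly between them.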
-
maximization
protected void maximization()
Uses the points' responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
-
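The maximization step can be sketched for the same simplified one-dimensional case. Again this is an illustration under assumptions rather than the library's code: MStepSketch and maximize are hypothetical names, scalar variances stand in for the full covariance matrices this class estimates, and every cluster is assumed to have non-zero total responsibility.

```java
final class MStepSketch {
    /**
     * Recomputes the mixture from responsibilities r[n][k].
     * Returns {weights, means, variances}, each of length numClusters.
     */
    static double[][] maximize(double[] points, double[][] r) {
        int numClusters = r[0].length;
        double[] weights = new double[numClusters];
        double[] means = new double[numClusters];
        double[] variances = new double[numClusters];
        for (int k = 0; k < numClusters; k++) {
            double total = 0.0;
            double weightedSum = 0.0;
            for (int n = 0; n < points.length; n++) {
                total += r[n][k];
                weightedSum += r[n][k] * points[n];
            }
            // Responsibility-weighted mean of the points.
            means[k] = weightedSum / total;
            double weightedSquares = 0.0;
            for (int n = 0; n < points.length; n++) {
                double d = points[n] - means[k];
                weightedSquares += r[n][k] * d * d;
            }
            // Responsibility-weighted variance around the new mean.
            variances[k] = weightedSquares / total;
            // Fraction of the data explained by this Gaussian.
            weights[k] = total / points.length;
        }
        return new double[][]{weights, means, variances};
    }
}
```

Alternating this step with an expectation step until the relative change in the total distance falls below the tolerance gives the overall EM loop.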
getAssignment
public AssignCluster<double[]> getAssignment()
Description copied from interface: ComputeClusters
Returns a class which is used to assign a point to a cluster. Only invoke this after
ComputeClusters.process(org.ddogleg.struct.LArrayAccessor<P>, int)
has been called.
WARNING: The returned data structure is recycled each time clusters are computed. Create a copy if you wish to avoid having it modified.
- Specified by:
getAssignment
in interfaceComputeClusters<double[]>
- Returns:
- Instance of
AssignCluster
.
-
getDistanceMeasure
public double getDistanceMeasure()
Description copied from interface: ComputeClusters
Returns the sum of the distances between each point and its respective cluster. Can be used to evaluate the quality of fit for all the clusters. Results can only be compared when the same number of clusters is used.
NOTE: The specific distance measure is not specified and is application specific.
- Specified by:
getDistanceMeasure
in interfaceComputeClusters<double[]>
- Returns:
- Sum of the distances between each point and its respective cluster.
-
setVerbose
public void setVerbose(boolean verbose)
Description copied from interface: ComputeClusters
If set to true then information about status will be printed to standard out. By default verbose is off.
- Specified by:
setVerbose
in interfaceComputeClusters<double[]>
- Parameters:
verbose
- true for verbose mode. False for quiet mode.
-
newInstanceThread
public ComputeClusters<double[]> newInstanceThread()
Description copied from interface: ComputeClusters
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
- Specified by:
newInstanceThread
in interfaceComputeClusters<double[]>
-