Package org.ddogleg.clustering.gmm
Class ExpectationMaximizationGmm_F64
java.lang.Object
org.ddogleg.clustering.gmm.ExpectationMaximizationGmm_F64
 All Implemented Interfaces:
ComputeClusters<double[]>
public class ExpectationMaximizationGmm_F64 extends Object implements ComputeClusters<double[]>
Standard expectation maximization based approach to fitting mixture-of-Gaussian models to a set of data.
A locally optimal maximum likelihood estimate is found. The full covariance is found. Some other
variants will estimate just diagonal elements or a single covariance, but that isn't yet supported.
Converged if (D[i] - D[i-1])/D[i] <= tol,
where D[i] is the sum of point-to-cluster distances at iteration i
and tol is the convergence tolerance threshold.
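
The convergence test above can be sketched in a few lines. This is a minimal illustration, not the library's code: the class name ConvergenceSketch and the method hasConverged are hypothetical, and the sketch assumes the magnitude of the relative change is what gets compared against tol (the formula as written has no absolute value).

```java
final class ConvergenceSketch {
    /**
     * Returns true when the relative change between two successive
     * distance sums is within the tolerance.
     * Assumption: the magnitude of the change is compared against tol.
     */
    public static boolean hasConverged(double previous, double current, double tol) {
        if (current == 0.0) {
            return true; // nothing left to improve, and avoids dividing by zero
        }
        return Math.abs(current - previous) / Math.abs(current) <= tol;
    }

    public static void main(String[] args) {
        System.out.println(hasConverged(100.0, 99.5, 0.01)); // true
        System.out.println(hasConverged(100.0, 80.0, 0.01)); // false
    }
}
```

With tol = 0.01, a change from 100.0 to 99.5 is about 0.5% and counts as converged, while a change from 100.0 to 80.0 is 25% and does not.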

Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: ExpectationMaximizationGmm_F64.PointInfo

Constructor Summary
Constructors
Constructor: ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial)
Description: Configures EM parameters.

Method Summary
Each entry lists the modifier and type, then the method, then its description.

protected double expectation()
 For each point, computes the "responsibility" for each Gaussian.
AssignCluster<double[]> getAssignment()
 Returns a class which is used to assign a point to a cluster.
double getDistanceMeasure()
 Returns the sum of all the distances between each point in the set.
void initialize(long randomSeed)
 Must be called first to initialize internal data structures.
protected void maximization()
 Uses each point's responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
ComputeClusters<double[]> newInstanceThread()
 Creates a new instance which has the same configuration and can be run in parallel.
void process(LArrayAccessor<double[]> points, int numCluster)
 Computes a set of clusters which segment the points into numCluster sets.
void setVerbose(boolean verbose)
 If set to true then information about status will be printed to standard out.

Constructor Details

ExpectationMaximizationGmm_F64
public ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial)
Configures EM parameters.
 Parameters:
maxIterations - Maximum number of iterations
convergeTol - If the relative change in score is less than or equal to this amount, it has converged
selectInitial - Used to select initial seeds for the clusters


Method Details

initialize
public void initialize(long randomSeed)
Description copied from interface: ComputeClusters
Must be called first to initialize internal data structures. Only needs to be called once.
 Specified by:
initialize in interface ComputeClusters<double[]>
 Parameters:
randomSeed - Seed for any random number generators used internally.

process
public void process(LArrayAccessor<double[]> points, int numCluster)
Description copied from interface: ComputeClusters
Computes a set of clusters which segment the points into numCluster sets.
 Specified by:
process in interface ComputeClusters<double[]>
 Parameters:
points - Set of points which are to be clustered. Not modified.
numCluster - Number of clusters it will use to split the points.

expectation
protected double expectation()
For each point, computes the "responsibility" for each Gaussian.
 Returns:
 The sum of chi-square. Can be used to estimate the total error.
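
To illustrate what a "responsibility" is, here is a minimal sketch of an expectation step for a one-dimensional mixture. It is not the library's implementation, which works with full covariance matrices in pointDimension dimensions. The class EStepSketch and its methods are hypothetical names, and the sketch only computes responsibilities, not the chi-square sum that expectation() returns.

```java
final class EStepSketch {
    /** Density of a 1-D Gaussian with the given mean and variance, evaluated at x. */
    public static double gaussian(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /**
     * Computes resp[n][k], the fraction of point n attributed to Gaussian k:
     * weight[k] * N(x[n] | k) divided by the same quantity summed over all Gaussians.
     * Assumption: every point has non-zero density under at least one Gaussian.
     */
    public static double[][] responsibilities(double[] points, double[] weights,
                                              double[] means, double[] variances) {
        int numGaussians = weights.length;
        double[][] resp = new double[points.length][numGaussians];
        for (int n = 0; n < points.length; n++) {
            double total = 0.0;
            for (int k = 0; k < numGaussians; k++) {
                resp[n][k] = weights[k] * gaussian(points[n], means[k], variances[k]);
                total += resp[n][k];
            }
            // Normalize so the responsibilities of each point sum to one
            for (int k = 0; k < numGaussians; k++) {
                resp[n][k] /= total;
            }
        }
        return resp;
    }

    public static void main(String[] args) {
        double[][] r = responsibilities(
                new double[]{0.0, 1.0},   // points
                new double[]{0.5, 0.5},   // mixture weights
                new double[]{-1.0, 1.0},  // means
                new double[]{1.0, 1.0});  // variances
        // The point at 0.0 is equidistant from both means, so it is split evenly
        System.out.println(r[0][0] + " " + r[0][1]);
        // The point at 1.0 sits on the second mean, so that Gaussian dominates
        System.out.println(r[1][0] + " " + r[1][1]);
    }
}
```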

maximization
protected void maximization()
Uses each point's responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
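
The update this step performs can be sketched for the one-dimensional case as follows. This is an illustration under stated assumptions rather than the library's code: MStepSketch and maximize are hypothetical names, a scalar variance stands in for the full covariance matrix, and every Gaussian is assumed to receive a non-zero total responsibility.

```java
final class MStepSketch {
    /**
     * Recomputes mixture parameters from responsibilities.
     * resp[n][k] is the responsibility of Gaussian k for point n.
     * Returns {weights, means, variances}, each of length K.
     */
    public static double[][] maximize(double[] points, double[][] resp) {
        int numPoints = points.length;
        int numGaussians = resp[0].length;
        double[] weights = new double[numGaussians];
        double[] means = new double[numGaussians];
        double[] variances = new double[numGaussians];

        for (int k = 0; k < numGaussians; k++) {
            // Total responsibility and responsibility-weighted sum of the points
            double total = 0.0;
            double weightedSum = 0.0;
            for (int n = 0; n < numPoints; n++) {
                total += resp[n][k];
                weightedSum += resp[n][k] * points[n];
            }
            means[k] = weightedSum / total;

            // Responsibility-weighted variance around the new mean
            double weightedSq = 0.0;
            for (int n = 0; n < numPoints; n++) {
                double d = points[n] - means[k];
                weightedSq += resp[n][k] * d * d;
            }
            variances[k] = weightedSq / total;

            // The weight is the fraction of all points attributed to this Gaussian
            weights[k] = total / numPoints;
        }
        return new double[][]{weights, means, variances};
    }

    public static void main(String[] args) {
        double[] points = {0.0, 2.0, 10.0, 12.0};
        // Hard assignments: first two points to Gaussian 0, last two to Gaussian 1
        double[][] resp = {{1, 0}, {1, 0}, {0, 1}, {0, 1}};
        double[][] p = maximize(points, resp);
        System.out.println("means: " + p[1][0] + ", " + p[1][1]);
    }
}
```

With the hard assignments in main, the two Gaussians end up with equal weights of 0.5 and means at the centers of their two points.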

getAssignment
public AssignCluster<double[]> getAssignment()
Description copied from interface: ComputeClusters
Returns a class which is used to assign a point to a cluster. Only invoked after ComputeClusters.process(org.ddogleg.struct.LArrayAccessor<P>, int) has been called.
WARNING: The returned data structure is recycled each time compute clusters is called. Create a copy if you wish to avoid having it modified.
 Specified by:
getAssignment in interface ComputeClusters<double[]>
 Returns:
 Instance of AssignCluster.

getDistanceMeasure
public double getDistanceMeasure()
Description copied from interface: ComputeClusters
Returns the sum of all the distances between each point in the set. Can be used to evaluate the quality of fit for all the clusters. Can only be used to compare when the same number of clusters is used.
NOTE: The specific distance measure is not specified and is application specific.
 Specified by:
getDistanceMeasure in interface ComputeClusters<double[]>
 Returns:
 Sum of the distances between each point and its respective cluster.

setVerbose
public void setVerbose(boolean verbose)
Description copied from interface: ComputeClusters
If set to true then information about status will be printed to standard out. By default verbose is off.
 Specified by:
setVerbose in interface ComputeClusters<double[]>
 Parameters:
verbose - true for verbose mode. False for quiet mode.

newInstanceThread
public ComputeClusters<double[]> newInstanceThread()
Description copied from interface: ComputeClusters
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
 Specified by:
newInstanceThread in interface ComputeClusters<double[]>
