Package org.ddogleg.clustering.gmm
Class ExpectationMaximizationGmm_F64
java.lang.Object
org.ddogleg.clustering.gmm.ExpectationMaximizationGmm_F64
- All Implemented Interfaces:
ComputeClusters<double[]>
Standard expectation-maximization (EM) approach to fitting a mixture-of-Gaussians model to a set of data.
A locally optimal maximum likelihood estimate is found. The full covariance is estimated. Some other
variants estimate just the diagonal elements or a single covariance, but those are not yet supported.
Converged if (D[i] - D[i-1])/D[i] <= tol, where D[i] is the sum of point-to-cluster distances at iteration i
and tol is the convergence tolerance threshold.
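The stopping rule above can be sketched as a small stand-alone check. This is only an illustration, not the library's code: the class name EmConvergenceSketch and the method hasConverged are hypothetical, and taking the absolute value of the change is an assumption made so the test behaves the same whether the score rises or falls.

```java
// Hypothetical helper, not part of DDogleg: illustrates a relative-change stopping test.
final class EmConvergenceSketch {
    /**
     * Returns true when the relative change between two consecutive scores is at most tol.
     * Assumption: the absolute change is used, which the original description does not state.
     */
    static boolean hasConverged(double previous, double current, double tol) {
        if (current == 0.0) {
            // Avoid dividing by zero; treat "no change at all" as converged.
            return previous == 0.0;
        }
        return Math.abs(current - previous) / Math.abs(current) <= tol;
    }

    public static void main(String[] args) {
        System.out.println(hasConverged(100.0, 99.99, 1e-3));
        System.out.println(hasConverged(100.0, 50.0, 1e-3));
    }
}
```

A smaller tol demands a flatter score curve before stopping, so maxIterations acts as the backstop when the score keeps changing.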
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors
Constructor, Description:
ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial) - Configures EM parameters
-
Method Summary
Modifier and Type, Method - Description:
protected double expectation() - For each point, computes the "responsibility" for each Gaussian.
AssignCluster<double[]> getAssignment() - Returns a class which is used to assign a point to a cluster.
double getDistanceMeasure() - Returns the sum of all the distances between each point in the set.
void initialize(long randomSeed) - Must be called first to initialize internal data structures.
protected void maximization() - Uses the points' responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
ComputeClusters<double[]> newInstanceThread() - Creates a new instance which has the same configuration and can be run in parallel.
void process(LArrayAccessor<double[]> points, int numCluster) - Computes a set of clusters which segment the points into numCluster sets.
void setVerbose(boolean verbose) - If set to true, then information about status will be printed to standard out.
-
Constructor Details
-
ExpectationMaximizationGmm_F64
public ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial)
Configures EM parameters.
- Parameters:
maxIterations- Maximum number of iterations
convergeTol- If the relative change in score is less than or equal to this amount, it has converged
selectInitial- Used to select initial seeds for the clusters
-
-
Method Details
-
initialize
public void initialize(long randomSeed)
Description copied from interface: ComputeClusters
Must be called first to initialize internal data structures. Only needs to be called once.
- Specified by:
initialize in interface ComputeClusters<double[]>
- Parameters:
randomSeed- Seed for any random number generators used internally.
-
process
public void process(LArrayAccessor<double[]> points, int numCluster)
Description copied from interface: ComputeClusters
Computes a set of clusters which segment the points into numCluster sets. The number of clusters and the number of points must each be 1 or more; otherwise the behavior is undefined.
- Specified by:
process in interface ComputeClusters<double[]>
- Parameters:
points- Set of points which are to be clustered. Not modified.
numCluster- Number of clusters it will use to split the points.
-
expectation
protected double expectation()
For each point, computes the "responsibility" for each Gaussian.
- Returns:
- The sum of chi-square. Can be used to estimate the total error.
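To make the "responsibility" idea concrete, here is a minimal sketch of an E-step for a one-dimensional mixture. It is not the class's implementation: the class fits full covariances in pointDimension dimensions, while this sketch assumes scalar variances, and the names EStepSketch, gaussian, and responsibilities are hypothetical. It also does not model the chi-square sum that expectation() returns.

```java
// Hypothetical 1-D illustration of the E-step; not DDogleg code.
final class EStepSketch {
    /** Density of a 1-D Gaussian with the given mean and variance, evaluated at x. */
    static double gaussian(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /**
     * Responsibility of each mixture component for the point x:
     * weight[k] * density[k](x), normalized so the entries sum to one.
     * Assumption: at least one component has non-zero density at x.
     */
    static double[] responsibilities(double x, double[] weights, double[] means, double[] variances) {
        double[] r = new double[weights.length];
        double total = 0.0;
        for (int k = 0; k < r.length; k++) {
            r[k] = weights[k] * gaussian(x, means[k], variances[k]);
            total += r[k];
        }
        for (int k = 0; k < r.length; k++) {
            r[k] /= total;
        }
        return r;
    }

    public static void main(String[] args) {
        double[] r = responsibilities(0.5, new double[]{0.5, 0.5}, new double[]{0.0, 10.0}, new double[]{1.0, 1.0});
        System.out.println(r[0] + " " + r[1]);
    }
}
```

A point close to one component's mean and far from the others receives a responsibility near one for that component.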
-
maximization
protected void maximization()
Uses the points' responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
-
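The maximization step can be illustrated with a one-dimensional sketch that recomputes one component from the responsibilities. This is a simplified illustration under stated assumptions, not the class's code: it uses scalar variances rather than full covariance matrices, and MStepSketch and updateComponent are hypothetical names.

```java
// Hypothetical 1-D illustration of the M-step; not DDogleg code.
final class MStepSketch {
    /**
     * Recomputes component k from 1-D points and responsibilities r[i][k].
     * Returns {weight, mean, variance}.
     * Assumption: the responsibilities for component k do not all equal zero.
     */
    static double[] updateComponent(double[] points, double[][] r, int k) {
        double sumR = 0.0;
        double mean = 0.0;
        for (int i = 0; i < points.length; i++) {
            sumR += r[i][k];
            mean += r[i][k] * points[i];
        }
        mean /= sumR;

        // Responsibility-weighted variance around the new mean.
        double variance = 0.0;
        for (int i = 0; i < points.length; i++) {
            double d = points[i] - mean;
            variance += r[i][k] * d * d;
        }
        variance /= sumR;

        // The mixing weight is the average responsibility over all points.
        return new double[]{sumR / points.length, mean, variance};
    }

    public static void main(String[] args) {
        double[] points = {1.0, 3.0, 10.0, 12.0};
        double[][] r = {{1, 0}, {1, 0}, {0, 1}, {0, 1}};
        double[] c0 = updateComponent(points, r, 0);
        System.out.println(c0[0] + " " + c0[1] + " " + c0[2]);
    }
}
```

With hard 0/1 responsibilities, as in main, the update reduces to the ordinary per-cluster mean and variance.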
getAssignment
public AssignCluster<double[]> getAssignment()
Description copied from interface: ComputeClusters
Returns a class which is used to assign a point to a cluster. Only invoked after ComputeClusters.process(org.ddogleg.struct.LArrayAccessor<P>, int) has been called.
WARNING: The returned data structure is recycled each time compute clusters is called. Create a copy if you wish to avoid having it modified.
- Specified by:
getAssignment in interface ComputeClusters<double[]>
- Returns:
- Instance of AssignCluster.
-
getDistanceMeasure
public double getDistanceMeasure()
Description copied from interface: ComputeClusters
Returns the sum of the distances between each point in the set and its respective cluster. Can be used to evaluate the quality of fit for all the clusters. Values can only be compared when the same number of clusters is used.
NOTE: The specific distance measure is not specified and is application specific.
- Specified by:
getDistanceMeasure in interface ComputeClusters<double[]>
- Returns:
- Sum of the distances between each point and its respective cluster.
-
setVerbose
public void setVerbose(boolean verbose)
Description copied from interface: ComputeClusters
If set to true, then information about status will be printed to standard out. By default, verbose is off.
- Specified by:
setVerbose in interface ComputeClusters<double[]>
- Parameters:
verbose- true for verbose mode, false for quiet mode.
-
newInstanceThread
public ComputeClusters<double[]> newInstanceThread()
Description copied from interface: ComputeClusters
Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
- Specified by:
newInstanceThread in interface ComputeClusters<double[]>
-