Class ExpectationMaximizationGmm_F64

java.lang.Object
org.ddogleg.clustering.gmm.ExpectationMaximizationGmm_F64
All Implemented Interfaces:
ComputeClusters<double[]>

public class ExpectationMaximizationGmm_F64 extends Object implements ComputeClusters<double[]>
Standard expectation-maximization (EM) approach to fitting a mixture-of-Gaussians model to a set of data. It finds a locally optimal maximum likelihood estimate and estimates the full covariance of each Gaussian. Other variants estimate only the diagonal elements or a single covariance, but those are not yet supported.

Convergence is declared when (D[i] - D[i-1])/D[i] <= tol, where D[i] is the sum of point-to-cluster distances at iteration i and tol is the convergence tolerance threshold.
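The convergence test above can be sketched as a small standalone check. This is only an illustration of the stated expression, not the class's actual code: the class name ConvergenceSketch, the method hasConverged, and the handling of a zero score are all assumptions made for this example.

```java
// Hypothetical helper, not part of DDogleg: evaluates the documented
// convergence expression (D[i] - D[i-1]) / D[i] <= tol exactly as written.
class ConvergenceSketch {

    // previous is D[i-1], current is D[i], tol is the convergence tolerance.
    static boolean hasConverged(double previous, double current, double tol) {
        // Assumption: a score of exactly zero is treated as converged to
        // avoid dividing by zero. The real class may behave differently.
        if (current == 0.0) {
            return true;
        }
        return (current - previous) / current <= tol;
    }

    public static void main(String[] args) {
        // Small relative change between iterations.
        System.out.println(hasConverged(100.0, 100.5, 1e-2));
        // Large relative change between iterations.
        System.out.println(hasConverged(100.0, 150.0, 1e-2));
    }
}
```

Taken literally, the expression is negative whenever D decreases, so any decrease counts as converged in this sketch. Whether the real implementation applies an absolute value or a different sign convention is not stated on this page.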

  • Constructor Details

    • ExpectationMaximizationGmm_F64

      public ExpectationMaximizationGmm_F64(int maxIterations, double convergeTol, int pointDimension, InitializeGmm_F64 selectInitial)
      Configures EM parameters
      Parameters:
      maxIterations - Maximum number of iterations
      convergeTol - If the relative change in score is less than or equal to this amount, it has converged
      pointDimension - Number of elements in each point
      selectInitial - Used to select initial seeds for the clusters
  • Method Details

    • initialize

      public void initialize(long randomSeed)
      Description copied from interface: ComputeClusters
      Must be called first to initialize internal data structures. Only needs to be called once.
      Specified by:
      initialize in interface ComputeClusters<double[]>
      Parameters:
      randomSeed - Seed for any random number generators used internally.
    • process

      public void process(LArrayAccessor<double[]> points, int numCluster)
      Description copied from interface: ComputeClusters
      Computes a set of clusters which segment the points into numCluster sets. The number of clusters and points must be 1 or more. If this is not true then the behavior is undefined.
      Specified by:
      process in interface ComputeClusters<double[]>
      Parameters:
      points - Set of points which are to be clustered. Not modified.
      numCluster - Number of clusters it will use to split the points.
    • expectation

      protected double expectation()
      For each point, computes the "responsibility" of each Gaussian.
      Returns:
      The sum of chi-square. Can be used to estimate the total error.
    • maximization

      protected void maximization()
      Uses each point's responsibility information to recompute the Gaussians and their weights, maximizing the likelihood of the mixture.
    • getAssignment

      public AssignCluster<double[]> getAssignment()
      Description copied from interface: ComputeClusters

      Returns a class which is used to assign a point to a cluster. Only call this after ComputeClusters.process(LArrayAccessor, int) has been called.

      WARNING: The returned data structure is recycled each time clusters are computed. Create a copy if you wish to avoid having it modified.

      Specified by:
      getAssignment in interface ComputeClusters<double[]>
      Returns:
      Instance of AssignCluster.
    • getDistanceMeasure

      public double getDistanceMeasure()
      Description copied from interface: ComputeClusters

      Returns the sum of the distances between each point and its respective cluster. Can be used to evaluate the quality of fit for all the clusters. Values can only be compared when the same number of clusters is used.

      NOTE: The specific distance measure is not specified and is application specific.
      Specified by:
      getDistanceMeasure in interface ComputeClusters<double[]>
      Returns:
      Sum of the distances between each point and its respective cluster.
    • setVerbose

      public void setVerbose(boolean verbose)
      Description copied from interface: ComputeClusters
      If set to true, status information is printed to standard out. By default verbose is off.
      Specified by:
      setVerbose in interface ComputeClusters<double[]>
      Parameters:
      verbose - true for verbose mode. False for quiet mode.
    • newInstanceThread

      public ComputeClusters<double[]> newInstanceThread()
      Description copied from interface: ComputeClusters
      Creates a new instance which has the same configuration and can be run in parallel. Some components can be shared as long as they are read only and thread safe.
      Specified by:
      newInstanceThread in interface ComputeClusters<double[]>
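
The expectation and maximization steps described above can be illustrated with a minimal, self-contained sketch. It is not the DDogleg implementation: it works on one-dimensional points with a scalar variance instead of the full covariance this class estimates, its expectation step returns the responsibilities rather than a chi-square sum, and every name in it (EmStepSketch, density, expectation, maximization) is hypothetical.

```java
// Hypothetical 1-D illustration of one EM iteration for a Gaussian mixture.
// Not DDogleg code: the real class handles N-dimensional points and full covariance.
class EmStepSketch {

    // Gaussian probability density of x for the given mean and variance.
    static double density(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d / variance) / Math.sqrt(2.0 * Math.PI * variance);
    }

    // E-step: resp[i][k] is the responsibility of Gaussian k for point i,
    // i.e. weight[k] * density normalized so that each row sums to one.
    static double[][] expectation(double[] x, double[] weight, double[] mean, double[] var) {
        double[][] resp = new double[x.length][weight.length];
        for (int i = 0; i < x.length; i++) {
            double total = 0.0;
            for (int k = 0; k < weight.length; k++) {
                resp[i][k] = weight[k] * density(x[i], mean[k], var[k]);
                total += resp[i][k];
            }
            for (int k = 0; k < weight.length; k++) {
                resp[i][k] /= total;
            }
        }
        return resp;
    }

    // M-step: re-estimates weight, mean and variance of each Gaussian in place
    // from the responsibility-weighted points.
    static void maximization(double[] x, double[][] resp,
                             double[] weight, double[] mean, double[] var) {
        for (int k = 0; k < weight.length; k++) {
            double sum = 0.0;
            double sumX = 0.0;
            for (int i = 0; i < x.length; i++) {
                sum += resp[i][k];
                sumX += resp[i][k] * x[i];
            }
            mean[k] = sumX / sum;
            double sumSq = 0.0;
            for (int i = 0; i < x.length; i++) {
                double d = x[i] - mean[k];
                sumSq += resp[i][k] * d * d;
            }
            var[k] = sumSq / sum;
            weight[k] = sum / x.length;
        }
    }

    public static void main(String[] args) {
        // Two well-separated groups of points and deliberately rough initial guesses.
        double[] x = {0.9, 1.0, 1.1, 4.9, 5.0, 5.1};
        double[] weight = {0.5, 0.5};
        double[] mean = {0.0, 6.0};
        double[] var = {1.0, 1.0};

        for (int iter = 0; iter < 10; iter++) {
            double[][] resp = expectation(x, weight, mean, var);
            maximization(x, resp, weight, mean, var);
        }
        System.out.println("means:   " + mean[0] + ", " + mean[1]);
        System.out.println("weights: " + weight[0] + ", " + weight[1]);
    }
}
```

The sketch runs a fixed number of iterations for simplicity and omits safeguards a real implementation would need, such as guarding against a variance collapsing to zero or a point whose density underflows for every Gaussian.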