Package org.ddogleg.clustering.kmeans
Class AssignKMeans<P>
java.lang.Object
org.ddogleg.clustering.kmeans.AssignKMeans<P>
- All Implemented Interfaces:
Serializable, AssignCluster<P>
Implementation of AssignCluster for K-Means. Squared Euclidean distance is used to select the cluster that best fits a point. This distance metric works well for hard assignment but can produce undesirable results for soft assignment; see the documentation of the soft-assignment assign method below.
Method Summary
Modifier and Type | Method | Description
int | assign(point) | Assigns the point to the cluster which is the best fit.
void | assign(point, fit) | Soft assignment is done by summing the total distance of the point from each cluster.
int | getNumberOfClusters() | Total number of clusters.
Constructor Details
AssignKMeans
Method Details
assign
Description copied from interface: AssignCluster
Assigns the point to the cluster which is the best fit.
- Specified by:
assign in interface AssignCluster<P>
- Parameters:
point - Point which is to be assigned
- Returns:
Index of the cluster from 0 to N-1
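As a rough illustration of hard assignment with squared Euclidean distance, the Java sketch below returns the index of the nearest cluster center. It is not the ddogleg implementation: points are assumed to be plain double[] arrays, and the class HardAssignSketch and its methods are hypothetical. Skipping the square root is safe for hard assignment because it does not change which center is closest.

```java
import java.util.List;

/**
 * Hypothetical sketch of K-Means hard assignment (not the ddogleg API).
 * Points and cluster centers are assumed to be double[] of equal length.
 */
public class HardAssignSketch {

    /** Squared Euclidean distance; omitting sqrt preserves the ordering of distances. */
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index (0 to N-1) of the closest center, or -1 if there are none. */
    static int assign(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double d = distanceSq(point, centers.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(
                new double[]{0, 0}, new double[]{5, 5}, new double[]{10, 0});
        // (4,6) is at squared distance 52, 2 and 72 from the three centers
        System.out.println(assign(new double[]{4, 6}, centers)); // prints 1
    }
}
```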
assign
Soft assignment is done by summing the total distance of the point from each cluster. Then each cluster's value is set to the total minus that cluster's distance. Finally, the output array is normalized by dividing each element by the sum of all values.
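Written out, with k clusters and d_i the distance from the point to cluster i (presumably the same squared Euclidean distance used for hard assignment), the procedure gives the fit f_i below. Since each value is D minus one distance, the normalizer always simplifies to (k-1)D, so the result is only defined for k ≥ 2 and D > 0.

```latex
D = \sum_{j=1}^{k} d_j, \qquad
f_i = \frac{D - d_i}{\sum_{j=1}^{k} \left(D - d_j\right)}
    = \frac{D - d_i}{kD - D}
    = \frac{D - d_i}{(k-1)\,D}
```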
When all clusters are approximately the same distance, or one is clearly the closest, this produces reasonable results. When several clusters are much closer than at least one other cluster, it effectively ignores the relative differences in distance among the closest clusters. There are several obvious heuristic "fixes" for this issue, but the best solution is simply to use AssignGmm_F64 instead.
- Specified by:
assign in interface AssignCluster<P>
- Parameters:
point - Point which is to be assigned
fit - Storage for the relative fit quality of each cluster. Length must be at least the number of clusters.
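The hypothetical Java sketch below (SoftAssignSketch is not part of ddogleg) applies the soft-assignment steps to a precomputed array of per-cluster distances and reproduces the weakness described for closely spaced clusters. The fallback to equal weights when the normalizer is zero (a single cluster, or all distances zero) is an assumption added for the sketch; the documentation does not say how that case is handled.

```java
import java.util.Arrays;

/** Hypothetical sketch of the K-Means soft assignment rule (not the ddogleg API). */
public class SoftAssignSketch {

    /**
     * distances[i] is the distance from the point to cluster i.
     * fit must have length >= distances.length; on return fit sums to one.
     */
    static void softAssign(double[] distances, double[] fit) {
        int n = distances.length;

        // Step 1: total distance of the point from every cluster
        double total = 0;
        for (double d : distances) total += d;

        // Step 2: each cluster's value is the total minus its own distance
        double sum = 0;
        for (int i = 0; i < n; i++) {
            fit[i] = total - distances[i];
            sum += fit[i];
        }

        // Assumption (not from the source): use equal weights if the normalizer is zero
        if (sum == 0) {
            Arrays.fill(fit, 0, n, 1.0 / n);
            return;
        }

        // Step 3: normalize by dividing each element by the sum
        for (int i = 0; i < n; i++) {
            fit[i] /= sum;
        }
    }

    public static void main(String[] args) {
        // Two clusters near the point (distances 1 and 2), one far away (100)
        double[] fit = new double[3];
        softAssign(new double[]{1, 2, 100}, fit);
        System.out.println(Arrays.toString(fit));
    }
}
```

With distances {1, 2, 100}, the values before normalization are 102, 101 and 3, so the fit comes out at roughly 0.495, 0.490 and 0.015: the closest cluster is half as far away as the second, yet both receive almost the same weight.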
getNumberOfClusters
public int getNumberOfClusters()
Description copied from interface: AssignCluster
Total number of clusters.
- Specified by:
getNumberOfClusters in interface AssignCluster<P>
- Returns:
The total number of clusters.