Package org.ddogleg.clustering.kmeans
Class AssignKMeans<P>
java.lang.Object
org.ddogleg.clustering.kmeans.AssignKMeans<P>
All Implemented Interfaces:
Serializable, AssignCluster<P>
Implementation of AssignCluster for K-Means. The squared Euclidean distance is used to select the cluster that best fits a point. This distance metric works well for hard assignment but can produce undesirable results for soft assignment; see the documentation of assign(point, fit) below.
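The hard assignment rule can be sketched as follows. This is a minimal illustration, not the library's source: it assumes points and centroids are plain double[] vectors of equal length (AssignKMeans itself is generic in P), and the class name HardAssignSketch and its methods are hypothetical. Comparing squared distances selects the same nearest cluster as comparing true Euclidean distances, because the square root is monotonic, while skipping one square root per cluster.

```java
/**
 * Hypothetical sketch of K-Means hard assignment: a point is assigned to the
 * centroid with the smallest squared Euclidean distance. Points are assumed
 * to be double[] vectors of equal length (not the generic P of AssignKMeans).
 */
public class HardAssignSketch {

    /** Squared Euclidean distance; omitting sqrt does not change which centroid is closest. */
    static double distanceSq(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the index, from 0 to N-1, of the centroid closest to the point. */
    static int assign(double[][] centroids, double[] point) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double d = distanceSq(point, centroids[i]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {5, 5}, {10, 0}};
        // Squared distances from (4, 3) are 25, 5 and 45, so centroid 1 wins.
        System.out.println(assign(centroids, new double[]{4, 3})); // prints 1
    }
}
```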
Constructor Summary
Method Summary
int assign(point): Assigns the point to the cluster which is the best fit.
void assign(point, fit): Soft assignment is done by summing the total distance of the point from each cluster.
int getNumberOfClusters(): Total number of clusters.
Constructor Details
AssignKMeans
Method Details
assign
Description copied from interface: AssignCluster
Assigns the point to the cluster which is the best fit.
Specified by:
assign in interface AssignCluster<P>
Parameters:
point - Point which is to be assigned
Returns:
Index of the cluster, from 0 to N-1, where N is the number of clusters.
assign
Soft assignment is done by summing the distances from the point to every cluster. Each cluster's value is then set to that total minus its own distance, so closer clusters receive larger values. Finally, the output array is normalized by dividing each element by the sum of these values.
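Written out as equations, this is one reading of the procedure above, assuming $K \ge 2$ clusters, with $d_i$ the squared Euclidean distance from the point to cluster $i$ and $f_i$ the value written to the output array (the symbols are introduced here for illustration). The normalizing sum simplifies to $(K-1)S$, which shows how a single distant cluster can dominate the result:

```latex
\begin{aligned}
S &= \sum_{j=1}^{K} d_j, \qquad v_i = S - d_i, \\
\sum_{j=1}^{K} v_j &= KS - S = (K-1)\,S, \\
f_i &= \frac{v_i}{\sum_{j=1}^{K} v_j} = \frac{S - d_i}{(K-1)\,S}, \qquad \sum_{i=1}^{K} f_i = 1.
\end{aligned}
```

If one distance $d_k$ is much larger than all others, then $S \approx d_k$, so $f_k \to 0$ and $f_i \to 1/(K-1)$ for every $i \ne k$, regardless of how the remaining $d_i$ compare to each other. The formula also assumes $S > 0$, i.e. the point does not coincide with every centroid.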
This produces reasonable results when all clusters are approximately the same distance from the point, or when one cluster is clearly the closest. When several clusters are much closer than at least one other cluster, however, the distant cluster dominates the total, so the relative difference in distance among the closest clusters is effectively ignored. There are several obvious heuristic "fixes" for this issue, but the best solution is simply to use AssignGmm_F64 instead.
Specified by:
assign in interface AssignCluster<P>
Parameters:
point - Point which is to be assigned
fit - Storage for the relative fit quality of each cluster. Its length must be at least the number of clusters.
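The limitation described for soft assignment can be reproduced numerically. The sketch below follows the described steps (sum the distances, take total minus each distance, normalize), but works on an array of precomputed squared distances rather than on points and clusters; that simplification, and the names SoftAssignSketch and softAssign, are hypothetical and not part of the ddogleg API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the soft assignment described for assign(point, fit):
 * each cluster's value is (total - d_i), then the values are normalized to sum
 * to one. Input is an array of precomputed squared distances, one per cluster.
 */
public class SoftAssignSketch {

    /** Fills fit[0..d.length-1]; assumes d.length >= 2 and not all distances are zero. */
    static void softAssign(double[] d, double[] fit) {
        double total = 0;
        for (double di : d) {
            total += di;
        }
        double sum = 0;
        for (int i = 0; i < d.length; i++) {
            fit[i] = total - d[i]; // closer cluster -> larger value
            sum += fit[i];
        }
        // sum equals (K - 1) * total; if every distance is zero it is zero and this sketch yields NaN
        for (int i = 0; i < d.length; i++) {
            fit[i] /= sum; // normalize so the fits add up to one
        }
    }

    public static void main(String[] args) {
        // Cluster 0 is four times closer than cluster 1 (squared distance),
        // but cluster 2 is far from the point.
        double[] d = {1, 4, 100};
        double[] fit = new double[d.length];
        softAssign(d, fit);
        System.out.println(Arrays.toString(fit));
    }
}
```

Here the fits come out as 104/210, 101/210 and 5/210 (roughly 0.495, 0.481 and 0.024): clusters 0 and 1 receive almost the same weight even though cluster 0 is much closer, which is the behavior the text attributes to this scheme.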
getNumberOfClusters
public int getNumberOfClusters()
Description copied from interface: AssignCluster
Total number of clusters.
Specified by:
getNumberOfClusters in interface AssignCluster<P>
Returns:
The total number of clusters.