Group n points in k clusters of equal size

How can I create a clustering procedure where each cluster has an equal number of points?

n is the number of points
k is the number of clusters
m = n / k (the ideal cluster size)

The K-Means node in KNIME only lets me set "k", not the number of points per cluster...

Hey,

Classic k-means, as implemented in the K-Means node, focuses on minimizing the within-cluster distances. It doesn't take cluster sizes into account, and I can't think of a clustering algorithm that does.

As a trick, you could modify the standard k-means algorithm and cap the cluster size at n/k. If the cluster of a data point's nearest centroid has already reached n/k points, the point is assigned to the next-nearest cluster whose size is still < n/k. You could implement this with a Java Snippet node.
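To make the capped assignment concrete, here is a minimal sketch in plain Python rather than a Java Snippet. The function name `capped_kmeans`, its parameters, the random initialization and the fixed iteration count are all assumptions made for illustration; points are expected as a list of coordinate tuples.

```python
import math
import random

def capped_kmeans(points, k, iterations=20, seed=0):
    """k-means variant where no cluster may hold more than ceil(n / k) points.

    Hypothetical helper for illustration, not a KNIME API.
    """
    rng = random.Random(seed)
    cap = math.ceil(len(points) / k)
    centroids = rng.sample(points, k)  # assumption: random initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        sizes = [0] * k
        for i, p in enumerate(points):
            # Try centroids from nearest to farthest; take the first with room.
            order = sorted(range(k), key=lambda c: math.dist(p, centroids[c]))
            labels[i] = next(c for c in order if sizes[c] < cap)
            sizes[labels[i]] += 1
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids
```

When k divides n, every cluster ends up with exactly n/k points. Otherwise the cap only bounds the largest cluster. The result also depends on the order in which the points are processed, since early points get first pick of their nearest centroid.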

Another possibility would be to start with an initial k-means clustering. Afterwards, in a loop, move one data point at a time from the biggest cluster to the closest centroid of a cluster with size < n/k. Repeat until all clusters have the same size. After each iteration the cluster centers have to be recalculated, of course.
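A rough sketch of that rebalancing loop in plain Python, assuming the labels and centroids of an initial k-means run (for example from the K-Means node) are already available. The name `rebalance` and the rule for picking which point to move (the one closest to any under-filled centroid) are assumptions, not something prescribed above.

```python
import math

def mean(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(x) / len(members) for x in zip(*members))

def rebalance(points, labels, centroids):
    """Move points out of the biggest cluster until none exceeds ceil(n / k).

    Hypothetical helper; labels and centroids come from a prior k-means run.
    """
    labels, centroids = list(labels), list(centroids)
    k = len(centroids)
    target = len(points) / k
    sizes = [labels.count(c) for c in range(k)]
    while max(sizes) > math.ceil(target):
        donor = sizes.index(max(sizes))
        open_clusters = [c for c in range(k) if sizes[c] < target]
        # Assumed choice: the donor point closest to any under-filled centroid.
        _, i, dest = min(
            (math.dist(points[i], centroids[c]), i, c)
            for i in range(len(points)) if labels[i] == donor
            for c in open_clusters
        )
        labels[i] = dest
        sizes[donor] -= 1
        sizes[dest] += 1
        # Recalculate the two centers that changed.
        for c in (donor, dest):
            centroids[c] = mean([p for p, l in zip(points, labels) if l == c])
    return labels, centroids
```

If k divides n, the loop stops exactly when all clusters have n/k points. If it does not, this sketch stops once no cluster is larger than n/k rounded up, so sizes may still differ slightly.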

Keep in mind that the results of these approaches will not be as good as plain k-means results in terms of within-cluster distances. Hope that helps.

Cheers,
Marten

Hello,

I’ve implemented a component for generating same-size K-means clusters. It’s available on the public hub:

This component identifies equally-sized clusters of homogeneous items in a dataset. The K-means algorithm is used to derive the centers of the clusters, then every point is assigned to the closest cluster while forcing each cluster to be of equal size. The size of each cluster will be approximately the number of original rows divided by the number of clusters.
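To illustrate what "approximately" means when the row count is not a multiple of the cluster count, here is a small Python sketch. The function `cluster_sizes` is hypothetical, and the way it spreads the remainder over the first clusters is an assumption, not a description of the component's internals.

```python
def cluster_sizes(n, k):
    """Split n rows into k sizes that differ by at most one (assumed scheme)."""
    base, extra = divmod(n, k)
    # The first `extra` clusters take one additional row each.
    return [base + 1 if c < extra else base for c in range(k)]
```

For example, `cluster_sizes(10, 3)` gives `[4, 3, 3]`, and `cluster_sizes(12, 3)` gives `[4, 4, 4]`.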

Hope it helps!

Hi Sam,

Thanks for spotting this. It might be due to a bug in the component. If it’s not sensitive data, would you be able to share your workflow with me through KNIME Hub? I’ll be happy to have a look!

Andrea

I found that when I tried it in a fresh workflow, it did exactly what you said it would. I don’t know what might have caused the issue, but thank you for asking and for creating this useful node.

I’m glad it worked, @sjames58!