K-means minimum cluster size

Hi, is there a method to force each cluster created by k-means to have a minimum size?

Hi again @m1k

If you don’t mind ending up with fewer than K means, an easy trick to achieve this (K-means with a minimum cluster size constraint) is the following:

  1. Build your K-Means as usual.
  2. Suppress the K-Means centres whose size (number of samples) is below your minimum threshold. This generates “orphan” samples.
  3. Recluster (redistribute) the “orphan” samples by assigning each one to the nearest remaining K-Means centre.
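The three steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a KNIME node or any library's API: points are assumed to be numeric tuples, distances are Euclidean, step 1 uses a hand-written Lloyd's k-means with random initial centres, and the names `kmeans`, `prune_small_clusters` and `min_size_kmeans` are hypothetical.

```python
import random


def _sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centres):
    # Index of the centre closest to `point`.
    return min(range(len(centres)), key=lambda j: _sq_dist(point, centres[j]))


def _mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, n_iter=100, seed=0):
    """Step 1: plain Lloyd's k-means with random initial centres."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [_nearest(p, centres) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centre if a cluster became empty.
            new_centres.append(_mean(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    labels = [_nearest(p, centres) for p in points]
    return centres, labels


def prune_small_clusters(points, centres, labels, min_size):
    """Steps 2-3: drop centres with fewer than `min_size` samples and
    reassign the resulting 'orphan' samples to the nearest remaining centre."""
    sizes = [labels.count(j) for j in range(len(centres))]
    keep = [j for j, size in enumerate(sizes) if size >= min_size]
    if not keep:
        raise ValueError("no cluster reaches min_size; lower K or min_size")
    kept = [centres[j] for j in keep]
    remap = {old: new for new, old in enumerate(keep)}
    new_labels = [remap[lab] if lab in remap else _nearest(p, kept)
                  for p, lab in zip(points, labels)]
    # Kept clusters only gain samples, so each still has >= min_size members;
    # recompute their centres from the enlarged memberships.
    new_centres = [_mean([p for p, lab in zip(points, new_labels) if lab == j])
                   for j in range(len(kept))]
    return new_centres, new_labels


def min_size_kmeans(points, k, min_size, seed=0):
    # Steps 1-3 chained together.
    centres, labels = kmeans(points, k, seed=seed)
    return prune_small_clusters(points, centres, labels, min_size)
```

For example, `min_size_kmeans(points, 5, min_size=20)` returns at most 5 centres, each owning at least 20 samples. As noted above, the price is that the final number of clusters can be lower than K.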

However, if you still need to end up with exactly your initial number K of means, you could split the bigger clusters into smaller ones until you reach the desired number K of final clusters. This second operation is definitely more involved to get right, although it is possible.
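One possible way to do the splitting, assuming the clusters are available as lists of numeric points (e.g. the clusters left after step 3): repeatedly bisect the largest cluster with a small 2-means run and accept the split only if both halves keep at least the minimum size. This greedy bisection is an assumed strategy, not a standard KNIME node, and `two_means` and `split_to_k` are hypothetical names.

```python
import random


def _sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    return [sum(coords) / len(points) for coords in zip(*points)]


def _assign(points, centres):
    # Put each point in the group of its closest centre (ties go to group 0).
    groups = [[], []]
    for p in points:
        closer_to_first = _sq_dist(p, centres[0]) <= _sq_dist(p, centres[1])
        groups[0 if closer_to_first else 1].append(p)
    return groups


def two_means(points, n_iter=50, seed=0):
    """Bisect `points` with a small Lloyd's 2-means run."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, 2)]
    for _ in range(n_iter):
        groups = _assign(points, centres)
        # Keep the previous centre if a group became empty.
        new = [_mean(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return _assign(points, centres)


def split_to_k(clusters, k, min_size, seed=0):
    """Greedily bisect the largest clusters until there are `k` of them,
    accepting a split only if both halves keep >= `min_size` samples."""
    clusters = [list(c) for c in clusters]
    while len(clusters) < k:
        # Try the candidate clusters from largest to smallest.
        order = sorted(range(len(clusters)), key=lambda i: -len(clusters[i]))
        for idx in order:
            if len(clusters[idx]) < max(2, 2 * min_size):
                continue  # too small to yield two valid halves
            a, b = two_means(clusters[idx], seed=seed)
            if len(a) >= min_size and len(b) >= min_size:
                clusters[idx:idx + 1] = [a, b]
                break
        else:
            raise ValueError("cannot reach k clusters without violating min_size")
    return clusters
```

Because the splits are greedy (largest cluster first), the result will generally differ from a full K-means run; the sketch only guarantees K clusters that each keep at least the minimum size, and raises an error when no remaining cluster can be split that way.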

Hope this helps.

Best,

Ael


There is a component for same-size K-Means: Same-size k-Means – KNIME Hub

And if you want to integrate Python into KNIME, there are packages available for minimum-size k-means, e.g. k-means-constrained · PyPI


Hi @Snowy

This is a very interesting component, and thanks for bringing it to our attention. However, it solves a different problem from the one in @m1k’s question.

@m1k’s question is about generating clusters that each have at least a given minimum size, with no other constraint on their sizes, whereas the component you mention provides a “Same-size k-Means” solution, in which all clusters have the same size.

Maybe the “Same-size k-Means” component is also a solution for @m1k?
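A quick way to check this, assuming (as the name suggests, but worth confirming in the component’s documentation) that a same-size result splits n samples into K clusters whose sizes differ by at most one: the smallest cluster then has ⌊n/K⌋ samples, so the minimum-size constraint holds exactly when ⌊n/K⌋ reaches the threshold. The function names below are hypothetical.

```python
def same_size_cluster_sizes(n, k):
    """Cluster sizes when n samples are split into k clusters whose sizes
    differ by at most one (assumed behaviour of a same-size k-means)."""
    base, extra = divmod(n, k)
    return [base + 1] * extra + [base] * (k - extra)


def same_size_meets_minimum(n, k, min_size):
    # The smallest cluster has floor(n / k) samples.
    return min(same_size_cluster_sizes(n, k)) >= min_size


print(same_size_cluster_sizes(100, 7))      # [15, 15, 14, 14, 14, 14, 14]
print(same_size_meets_minimum(100, 7, 10))  # True
print(same_size_meets_minimum(100, 7, 15))  # False
```

So a same-size solution satisfies the minimum-size constraint whenever ⌊n/K⌋ is large enough, but it also forces all sizes to be (nearly) equal, which is a stronger restriction than the one asked for.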

Best

Ael

