K-means with equal cluster size

Hi, I’m currently trying to run a k-means analysis on my data points (coordinates). When I executed the node, it gave me a visualization like this.

I’m trying to make all clusters equal in size, with a total of 5 clusters. Is there something I did wrong?

Thanks in advance

Hello @andersenyunan -

The K-means node is not going to give you equal size clusters as output. Is there a particular reason you need clusters of equal size?

It turns out that generating such clusters is not trivial. You can see https://elki-project.github.io/tutorial/same-size_k_means for one method, if you don’t mind doing quite a lot of coding.
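To give a rough idea of what such coding could look like outside KNIME, here is a minimal Python sketch of a greedy, size-capped k-means. It is only loosely in the spirit of the linked tutorial, not the ELKI implementation: the function name `balanced_kmeans`, the fixed iteration count, and the random initialization are all assumptions made for illustration.

```python
import math
import random

def balanced_kmeans(points, k, iterations=10, seed=0):
    """Greedy size-capped k-means sketch; returns one cluster label per point."""
    n = len(points)
    capacity = math.ceil(n / k)          # maximum number of points per cluster
    rng = random.Random(seed)
    centroids = rng.sample(points, k)    # assumed init: k random points
    labels = [0] * n
    for _ in range(iterations):
        # Distance from every point to every centroid.
        dist = [[math.dist(p, c) for c in centroids] for p in points]
        # Process first the points with the largest gap between their
        # nearest and farthest centroid (they lose most from a bad assignment).
        order = sorted(range(n), key=lambda i: min(dist[i]) - max(dist[i]))
        sizes = [0] * k
        for i in order:
            # Pick the nearest cluster that still has room.
            open_clusters = [c for c in range(k) if sizes[c] < capacity]
            j = min(open_clusters, key=lambda c: dist[i][c])
            labels[i] = j
            sizes[j] += 1
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels
```

When the number of points is divisible by k, every cluster ends up with exactly n/k points. Otherwise the cap only guarantees that no cluster exceeds ceil(n/k), so some clusters can come out smaller. The greedy assignment is also not guaranteed to find the most compact balanced clustering.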

Apart from the equal size issue, are you seeing something else unexpected in your output? Briefly glancing at the image you posted, I don’t see anything odd. But feel free to post your workflow with additional information about what you’re trying to do and we can look at it further.

Hi @ScottF

Thanks for the solution. I’m sorry it took me a while to respond; I’m still new to KNIME.
Actually, I realized that for the image I shared earlier I only used the distance variables to build the clusters. The feedback I got from my colleague is that I also need to include a time variable for each point: each point takes around 5-10 minutes. This would require constraining each cluster to, say, a total of 8 hours max.
However, I have not found a solution for this. One method that I think might work is k-means with multiple variables, but I think the algorithm would effectively ignore the time variable if I set the same 10 minutes for every point. Any suggestions?
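As a quick check of that suspicion, here is a small Python sketch (the coordinates and variable names are made up for illustration). It shows that a feature with the same value for every point adds nothing to the Euclidean distance, and that with a fixed 10 minutes per point an 8-hour budget is really just a cap on the number of points per cluster.

```python
import math

# Two hypothetical points given as (x, y) coordinates.
a = (2.0, 3.0)
b = (5.0, 7.0)
service_min = 10.0  # assumed identical service time for every point

# Distance on coordinates only vs. coordinates plus the constant time feature.
d_xy = math.dist(a, b)
d_xyt = math.dist(a + (service_min,), b + (service_min,))
print(d_xy, d_xyt)  # the constant feature contributes 0 to the distance

# With a constant service time, the time budget becomes a size cap.
budget_min = 8 * 60
max_points = int(budget_min // service_min)
print(max_points)  # 48 points per cluster at 10 minutes each
```

So with a homogeneous time value the constraint has to be enforced as a limit on cluster size rather than as an extra k-means variable. If the times really vary between 5 and 10 minutes, the cap would instead apply to the sum of the times in each cluster.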

Thanks in advance!

Follow-up topic for @andersenyunan’s question: How to setup multiple k-means variable for distance & time
Br,
Ivan

Check out these new k-means components that address this issue:
