Clustering with restrictions

This is an open question about available clustering methods in KNIME. I could not find a clustering node or combination of nodes for the situation I describe here:

1- First problem (distance between cluster points): suppose you have a number of points P with attributes (x, y). You want to cluster on (x, y), but with the restriction that the distance between any two points P1 and P2 in the same cluster must not exceed a threshold. How do you do this?

2- Second problem (cluster weight): if each point also has a weight attribute, I would like to ensure that the total weight of each cluster does not exceed a given threshold.

Thanks in advance.

Hi @peleitor -

For the first question, I believe you can use our nodes for distance matrix calculation and hierarchical clustering. In particular, you can use the Hierarchical Clustering Assigner node to choose clusters based on a distance threshold. A toy workflow might look like this:

[Screenshot of the toy workflow in KNIME Analytics Platform, 2018-08-24]
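To make the distance-threshold idea concrete outside of KNIME, here is a minimal pure-Python sketch of agglomerative clustering with complete linkage, stopped at a distance threshold. It is not the implementation behind the KNIME nodes; the function name `complete_linkage_clusters` and the sample points are made up for illustration. The reason for assuming complete linkage is that the merge distance is then the largest pairwise distance between the two clusters, so stopping at the threshold keeps every pair of points within a cluster at most that far apart. Single or average linkage would not give that guarantee.

```python
from itertools import combinations
from math import dist


def complete_linkage_clusters(points, max_distance):
    """Greedy agglomerative clustering (hypothetical helper, not a KNIME API).

    Returns a list of clusters, each a list of indices into `points`.
    No two points in the same cluster are farther apart than max_distance.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: largest distance between a member of a and a member of b.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (ia, ib), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:
            break  # any further merge would violate the threshold
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return clusters


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(complete_linkage_clusters(points, max_distance=2.0))  # [[0, 1, 2], [3, 4]]
```

This brute-force version recomputes all linkages on every pass, so it is only meant for small toy data sets.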

For your second question: is there a particular algorithm you are looking to implement? I’m not that familiar with the approach, but that doesn’t mean that KNIME can’t do it - it’s just ignorance on my part. Or perhaps I’m just not understanding the question, in which case others might have some useful input 🙂

Thanks for your answer, I will give it a try.

The second question is about imposing restrictions on each cluster, e.g. the total cluster weight should not exceed a certain threshold.
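As a rough illustration of what such a weight restriction could look like, here is a small pure-Python heuristic: agglomerative merging that skips any merge whose combined weight would exceed the cap. This is a sketch under my own assumptions, not an existing KNIME node or a named algorithm. The function name `weight_capped_clusters`, the optional `max_distance` parameter, and the sample data are all hypothetical, and a greedy approach like this does not promise an optimal partition.

```python
from itertools import combinations
from math import dist


def weight_capped_clusters(points, weights, max_weight, max_distance=float("inf")):
    """Greedy merging under a total-weight cap (hypothetical helper).

    Two clusters are merged only if their combined weight stays within
    max_weight and their complete-linkage distance stays within max_distance.
    A single point heavier than max_weight remains a cluster on its own.
    """
    clusters = [[i] for i in range(len(points))]

    def total(cluster):
        return sum(weights[i] for i in cluster)

    def linkage(a, b):
        # Largest distance between a member of a and a member of b.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while True:
        feasible = []
        for a, b in combinations(range(len(clusters)), 2):
            if total(clusters[a]) + total(clusters[b]) > max_weight:
                continue  # merge would break the weight restriction
            d = linkage(clusters[a], clusters[b])
            if d <= max_distance:
                feasible.append((d, a, b))
        if not feasible:
            return clusters
        _, a, b = min(feasible)  # closest pair that is still allowed
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected


points = [(0, 0), (1, 0), (2, 0), (10, 0)]
weights = [2, 2, 2, 1]
print(weight_capped_clusters(points, weights, max_weight=5, max_distance=3))
# [[0, 1], [2], [3]]
```

Without the distance cutoff, the loop keeps merging until only the weight cap stops it, which can join points that are far apart. Combining both limits covers the two restrictions from the first post.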

Hi, I am currently facing a problem similar to the one addressed in this topic. In particular, I have used the Hierarchical Clustering Assigner to cluster elements based on their pairwise distances. I now want to impose some additional constraints, such as a maximum number of elements per cluster.

Any advice on how I can impose this? @peleitor were you able to solve your problem?

Thanks
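One possible way to add a maximum cluster size after the clusters have already been assigned is to post-process them: any cluster that is too large is split in two around its two most distant members, and the halves are split again if needed. The sketch below is pure Python and only an assumption about how this could be done. `split_until_small` is a hypothetical helper, not a KNIME node, and the split is a simple heuristic rather than an optimal one.

```python
from itertools import combinations
from math import dist


def split_until_small(points, members, max_size):
    """Recursively bisect a cluster until each part has at most max_size members.

    `members` is a list of indices into `points` (one already assigned cluster).
    Hypothetical post-processing helper, not part of any library.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(members) <= max_size:
        return [members]
    # Use the two members that are farthest apart as seeds for the two halves.
    a, b = max(combinations(members, 2),
               key=lambda pair: dist(points[pair[0]], points[pair[1]]))
    left, right = [a], [b]
    for m in members:
        if m == a or m == b:
            continue
        # Assign every other member to the nearer seed.
        if dist(points[m], points[a]) <= dist(points[m], points[b]):
            left.append(m)
        else:
            right.append(m)
    return (split_until_small(points, left, max_size)
            + split_until_small(points, right, max_size))


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(split_until_small(points, [0, 1, 2, 3, 4], max_size=3))
```

Because each half always contains its own seed, both halves are strictly smaller than the original cluster, so the recursion terminates even when several points coincide.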

Not exactly a solution, but maybe DBSCAN is an interesting clustering alternative, depending on your use case.
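For anyone who wants to see what DBSCAN does before trying it, here is a compact pure-Python sketch of the textbook algorithm. It is written for illustration only and is not the code behind any KNIME node; the parameter names `eps` and `min_pts` follow common usage, and the sample points are made up. One caveat that the example shows: DBSCAN limits the distance between neighbouring points, not the distance between any two points of a cluster, so a chain of close points can form one long cluster.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    A point counts as its own neighbour, so min_pts includes the point itself.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (20, 0)]
print(dbscan(points, eps=1.0, min_pts=2))  # [0, 0, 0, 0, 0, -1]
```

In this toy run the first five points end up in one cluster even though its end points are 4 units apart with `eps=1.0`, which is why DBSCAN alone does not enforce the pairwise-distance restriction from the first post.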

Thanks @peleitor, I’ll study DBSCAN and give it a try.

Hi @peleitor @Altair78

Have you solved this problem? Could you share a workflow? I am currently facing the same situation, where I want to impose a threshold on each cluster, except that instead of distance I want it to be time. In this case, I am thinking of putting a 10-minute limit on each point in my data.

I think it could be solved with an approach similar to the one you were working on earlier.

Thanks in advance.
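If the time restriction is read as "no cluster may span more than 10 minutes" (that reading is an assumption on my part), the one-dimensional case is simpler than the (x, y) case: sort the timestamps and start a new cluster whenever the current point lies more than 10 minutes after the first point of the current cluster. A minimal Python sketch follows, with a hypothetical function name and made-up sample timestamps.

```python
from datetime import datetime, timedelta


def group_by_time_window(timestamps, max_span=timedelta(minutes=10)):
    """Group timestamps so that each group spans at most max_span.

    Returns a list of clusters, each a list of indices into `timestamps`,
    ordered by time. Hypothetical helper for illustration only.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    clusters = []
    for i in order:
        # clusters[-1][0] is the earliest point of the current cluster.
        if clusters and timestamps[i] - timestamps[clusters[-1][0]] <= max_span:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


times = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 9, 4),
    datetime(2020, 1, 1, 9, 10),
    datetime(2020, 1, 1, 9, 11),
    datetime(2020, 1, 1, 9, 30),
]
print(group_by_time_window(times))  # [[0, 1, 2], [3], [4]]
```

This only handles time on its own. Combining a time limit with a spatial distance would need a combined distance measure or a two-step approach, which this sketch does not attempt.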

Topic to follow regarding @andersenyunan's question: How to setup multiple k-means variable for distance & time
Br,
Ivan