Clustering with restrictions

This is an open question about available clustering methods in KNIME. I could not find a clustering node or combination of nodes for the situation I describe here:

1- First problem (distance between cluster points): suppose you have a set of points P with attributes (x, y) and want to cluster them on (x, y), with the added restriction that the distance between any two points P1 and P2 in the same cluster must not exceed a threshold. How do you do this?

2- Second problem (cluster weight): if each point also has a weight attribute, I would like to ensure that the total weight of each cluster does not exceed a given threshold.

Hi @peleitor -

For the first question, I believe you can use our nodes for distance matrix calculation and hierarchical clustering. In particular, you can use the Hierarchical Clustering Assigner node to choose clusters based on a distance threshold. A toy workflow would chain those three steps: distance matrix calculation, hierarchical clustering, then the assigner with a distance threshold.
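For readers who want to see the idea outside KNIME, here is a minimal sketch in plain Python. It is not what the KNIME nodes do internally. It assumes complete linkage, because that is the linkage where a distance threshold actually bounds the largest pairwise distance inside each cluster (single or average linkage give no such guarantee). The function name `complete_linkage_clusters` and the parameter `max_diameter` are made up for this illustration.

```python
from itertools import combinations
from math import dist


def complete_linkage_clusters(points, max_diameter):
    """Greedy complete-linkage merging with a hard cap on cluster diameter.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def farthest_pair(a, b):
        # Complete linkage: cluster distance = largest cross-cluster distance.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters that is closest under complete linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: farthest_pair(clusters[ab[0]], clusters[ab[1]]),
        )
        if farthest_pair(clusters[a], clusters[b]) > max_diameter:
            break  # every remaining merge would exceed the threshold
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return clusters


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(complete_linkage_clusters(pts, max_diameter=2.0))  # [[0, 1, 2], [3, 4]]
```

This brute-force version recomputes distances on every pass, so it is only meant for small toy data sets.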

For your second question: is there a particular algorithm you are looking to implement? I’m not that familiar with the approach, but that doesn’t mean that KNIME can’t do it - it’s just ignorance on my part. Or perhaps I’m just not understanding the question, in which case others might have some useful input.

The second question is about imposing restrictions on each cluster, e.g. the total cluster weight should not exceed a certain threshold.
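One possible way to read this requirement is as a merge rule: two clusters may only be merged if their combined weight stays within the cap. The sketch below is a greedy heuristic in plain Python under that assumption, not a KNIME node and not an optimal solver. The names `weight_capped_clusters`, `max_weight` and `max_diameter` are hypothetical.

```python
from itertools import combinations
from math import dist


def weight_capped_clusters(points, weights, max_weight, max_diameter):
    """Greedy merging that respects a weight cap and a diameter cap.

    A single point heavier than `max_weight` stays alone in its own
    cluster; this sketch does not try to repair that case.
    """
    clusters = [[i] for i in range(len(points))]

    def total_weight(cluster):
        return sum(weights[i] for i in cluster)

    def farthest_pair(a, b):
        return max(dist(points[i], points[j]) for i in a for j in b)

    while True:
        # Collect only the merges that satisfy both restrictions.
        candidates = []
        for a, b in combinations(range(len(clusters)), 2):
            if total_weight(clusters[a]) + total_weight(clusters[b]) > max_weight:
                continue
            d = farthest_pair(clusters[a], clusters[b])
            if d <= max_diameter:
                candidates.append((d, a, b))
        if not candidates:
            break
        _, a, b = min(candidates)  # closest feasible pair first
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b
    return clusters


pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(weight_capped_clusters(pts, [4, 4, 4, 4], max_weight=8, max_diameter=10))
```

Because the merging is greedy, the result depends on merge order and may use more clusters than strictly necessary.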

Hi, I am currently facing a problem similar to the one addressed in this topic. In particular, I have used the Hierarchical Clustering Assigner to cluster elements based on their pairwise distance. I now want to impose some additional constraints, such as a maximum number of elements per cluster.

Any advice on how I can impose this? @peleitor were you able to solve your problem?

Thanks
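A maximum cluster size can also be applied as a post-processing step on labels that already exist, for example the output of a cluster assigner. The sketch below is one simple heuristic, assuming the labels and coordinates have been exported to Python lists. It splits each oversized cluster into chunks of nearby points. The function `split_oversized` and its parameters are invented for this example.

```python
from math import dist


def split_oversized(points, labels, max_size):
    """Relabel so that no cluster has more than `max_size` members."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")

    # Group point indices by their original cluster label.
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)

    new_labels = [None] * len(points)
    next_label = 0
    for members in groups.values():
        remaining = list(members)
        while remaining:
            # Seed with the point farthest from the centroid of what is left,
            # so outlying points get grouped with their neighbours first.
            coords = [points[i] for i in remaining]
            centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
            seed = max(remaining, key=lambda i: dist(points[i], centroid))
            # Take the seed plus its nearest remaining neighbours.
            remaining.sort(key=lambda i: dist(points[i], points[seed]))
            chunk, remaining = remaining[:max_size], remaining[max_size:]
            for i in chunk:
                new_labels[i] = next_label
            next_label += 1
    return new_labels


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(split_oversized(pts, [0, 0, 0, 0, 1], max_size=2))
```

Clusters that already fit within `max_size` keep their members together and only receive a new label number.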

Not exactly a solution, but maybe DBSCAN is an interesting clustering alternative, depending on your use case.

Thanks @peleitor, I’ll study DBSCAN and give it a try.

Have you solved this problem? Could you share a workflow? I am currently facing the same situation, where I want to impose a certain threshold on each cluster, except that instead of distance I want it to be time. In this case, I am thinking about setting a limit of 10 minutes for each point in my data.

I think it could be solved with an approach similar to the one for the problem you were facing earlier.
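With time as the only dimension, the problem becomes simpler than the 2D case. Here is a minimal sketch, assuming the 10-minute limit means that no two events in the same cluster may be more than 10 minutes apart. If the limit means something else (for example, a maximum gap between consecutive events), the condition needs to change. The function name `group_by_time_window` and the parameter `max_span` are made up for illustration.

```python
from datetime import datetime, timedelta


def group_by_time_window(timestamps, max_span=timedelta(minutes=10)):
    """Sweep through sorted timestamps and open a new group whenever the
    current event is more than `max_span` after the group's first event."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][0] <= max_span:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups


day = datetime(2024, 1, 1)
events = [day.replace(hour=9, minute=m) for m in (0, 4, 10, 11, 30)]
print([len(g) for g in group_by_time_window(events)])  # [3, 1, 1]
```

In one dimension this left-to-right sweep yields the smallest possible number of groups for a fixed span, which is not true of the greedy heuristics needed in two dimensions.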