Hierarchical Clustering (DistMatrix) error: bug?

Dear all,

As you can see in the attached workflow, when I apply a "Hierarchical Clustering (DistMatrix)" node after a "String distance" node, I get the following error:

Execute failed: At most 65,500 patterns can be clustered

Is this a bug? How can I solve this problem?

Thanks in advance


This is a deliberate limitation that allows for an optimization in the implementation. However, because the running time of hierarchical clustering grows cubically with the number of patterns, clustering that many patterns would take an impractically long time anyway.
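To put "cubic" into numbers, here is a rough back-of-the-envelope sketch. It assumes a naive agglomerative algorithm (about n merges, each scanning the O(n^2) distance matrix) and a full upper-triangle distance matrix stored as 8-byte doubles. The actual KNIME implementation may differ, and the function name is purely illustrative.

```python
def naive_agglomerative_cost(n: int, bytes_per_entry: int = 8) -> tuple[int, int, int]:
    """Rough cost of naive agglomerative clustering on n patterns (illustrative only).

    Returns (distance-matrix entries, bytes to store them, ~n**3 operation count).
    """
    entries = n * (n - 1) // 2           # upper triangle of a symmetric distance matrix
    memory_bytes = entries * bytes_per_entry
    operations = n ** 3                  # naive: O(n) merges, each scanning O(n^2) entries
    return entries, memory_bytes, operations


if __name__ == "__main__":
    entries, mem, ops = naive_agglomerative_cost(65_500)
    print(f"{entries:,} distances, {mem / 1e9:.1f} GB as doubles, ~{ops:.2e} operations")
```

At the 65,500 limit this works out to roughly 2.1 billion pairwise distances (about 17 GB as doubles) and on the order of 10^14 elementary steps, before any constant factors.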

Thank you very much for the answer.

1) Could this limitation be removed so that hierarchical clustering can be run on more than 65,500 patterns, even if it takes 3 days on a powerful machine?

2) Do you have any other suggestions for clustering this population by string distance without using hierarchical clustering?

Thanks in advance



You could use k-Medoids, for example.
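As an illustration of the idea, here is a minimal k-medoids sketch over a precomputed distance matrix. It uses the simple alternating ("Voronoi iteration") variant, which is not necessarily what the KNIME k-Medoids node implements, and it assumes Levenshtein edit distance as the string distance. All function names are hypothetical. Note that `k` must be chosen up front.

```python
import random
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (or match)
        prev = cur
    return prev[-1]


def _assign(dist: Sequence[Sequence[float]], medoids: list[int]) -> list[int]:
    """Assign every point to its nearest medoid; a medoid always keeps its own cluster."""
    labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
              for i in range(len(dist))]
    for c, m in enumerate(medoids):
        labels[m] = c  # guarantees no cluster is ever empty (e.g. with duplicate strings)
    return labels


def k_medoids(dist: Sequence[Sequence[float]], k: int,
              max_iter: int = 100, seed: int = 0) -> tuple[list[int], list[int]]:
    """Alternating k-medoids on a precomputed distance matrix.

    Returns (medoid indices, cluster label per point).
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)
    for _ in range(max_iter):
        labels = _assign(dist, medoids)
        # Update step: in each cluster, pick the member with the smallest total distance.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(dist, medoids)


if __name__ == "__main__":
    words = ["kitten", "sitting", "mitten", "apple", "apply", "ample"]
    dist = [[levenshtein(a, b) for b in words] for a in words]
    medoids, labels = k_medoids(dist, k=2)
    print([words[m] for m in medoids], labels)
```

Like any k-medoids variant, the result depends on the initial medoids, so trying several seeds and keeping the lowest total distance is a common precaution.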

Thank you very much.

The problem, however, is that with k-Medoids you cannot assign clusters based on a distance threshold, as you can with the "Hierarchical Cluster Assigner" node. The number of clusters has to be fixed in advance, so in my case it does not help much.
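One possible workaround, sketched here under clear assumptions and not a KNIME feature: if single linkage is acceptable, cutting a single-linkage dendrogram at a threshold gives exactly the connected components of the graph that links every pair with distance <= threshold. Those components can be found with union-find while computing distances on the fly, so no n x n matrix is stored and there is no pattern limit. It still needs O(n^2) distance evaluations, and it does not reproduce average or complete linkage. The names `threshold_clusters` and `hamming` are hypothetical; `hamming` stands in for whatever string distance you actually use.

```python
from typing import Callable, Sequence


def hamming(a: str, b: str) -> int:
    """Number of differing positions; assumes equal-length strings (illustration only)."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(items: Sequence[str],
                       dist: Callable[[str, str], float],
                       threshold: float) -> list[int]:
    """Label items so that chains of pairwise distances <= threshold share a cluster.

    Equivalent to cutting a single-linkage dendrogram at `threshold`.
    Distances are computed on the fly; memory use is O(n).
    """
    parent = list(range(len(items)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            ri, rj = find(i), find(j)
            # Skip the (possibly expensive) distance if i and j are already joined.
            if ri != rj and dist(items[i], items[j]) <= threshold:
                parent[rj] = ri
    # Relabel root indices as consecutive cluster ids 0, 1, 2, ...
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(items))]


if __name__ == "__main__":
    codes = ["AAAA", "AAAB", "ABBB", "CCCC", "CCCD"]
    print(threshold_clusters(codes, hamming, 1))  # [0, 0, 1, 2, 2]
    print(threshold_clusters(codes, hamming, 2))  # [0, 0, 0, 1, 1]
```

The second call shows the chaining effect typical of single linkage: "AAAA" and "ABBB" end up together at threshold 2 even though their direct distance is 3, because "AAAB" links them.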