Clustering chemical structures and selecting cluster centres


There have been a couple of posts related to this, but unfortunately no solutions.

I have a library of ~1000 compounds which I would like to filter down to about 12 compounds that represent as much chemical space as possible. I have been able to do this with an external tool that performs a k-means clustering, after which I can select the compound at each cluster centre.

I would like to do the same thing in KNIME. I have been able to perform the k-means clustering, but I have not been able to select the cluster centres. Any ideas how I might do this and which nodes I would need?

I am fairly new to clustering methods, so if possible please explain in layman's terms!

Many thanks


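You could use the MoSS MCSS Molecular Distance node, for example, to compute a distance matrix and then perform a k-medoids (not k-means!) clustering. Unlike k-means, whose centres are averaged points that need not correspond to any real compound, k-medoids always uses an actual compound as each cluster centre, so this will also give you the cluster centers. Alternatively, you can build the distance matrix from fingerprints, e.g. using the Tanimoto distance.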
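In case it helps to see the idea outside of KNIME, here is a minimal sketch of k-medoids on a Tanimoto distance matrix in plain Python. It is not what the KNIME nodes do internally, just a simple alternating scheme under my own assumptions: the fingerprints are toy sets of "on" bit positions that I made up, and `tanimoto_distance`, `assign` and `k_medoids` are hypothetical helper names.

```python
import random

def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union

def assign(dist, medoids):
    """Assign every item to its nearest medoid; a medoid stays in its own cluster."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters

def k_medoids(dist, k, max_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix."""
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(len(dist)), k))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid of a cluster = the member with the smallest
        # summed distance to all other members of that cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)

# Toy fingerprints (invented for illustration): two obvious groups.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2},
       {10, 11, 12}, {10, 11, 12, 13}, {10, 11}]
dist = [[tanimoto_distance(a, b) for b in fps] for a in fps]

medoids, clusters = k_medoids(dist, k=2)
print("cluster centres (row indices):", medoids)
print("cluster members:", clusters)
```

The returned medoids are row indices into the original table, so each cluster centre is a real compound you can look up directly. For ~1000 compounds and 12 clusters you would set `k=12` and use real fingerprints instead of the toy sets.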

Dear Alex,

This is exactly what the RDKit "Diversity Picker" node is intended to do. 
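For anyone wondering what a diversity picker does in layman's terms: as far as I know the RDKit picker is based on the MaxMin idea, where you start from one compound and keep adding the compound that is furthest away from everything picked so far. Note that this selects diverse compounds directly rather than cluster centres. Below is a rough plain-Python sketch of that idea, not the node's actual implementation; the toy fingerprints and the names `tanimoto_distance` and `maxmin_pick` are my own assumptions.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union

def maxmin_pick(fps, n_pick, first=0):
    """Greedy MaxMin selection: repeatedly pick the item whose distance
    to its closest already-picked item is largest."""
    picked = [first]
    # min_dist[i] = distance from item i to its nearest picked item
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        nxt = max(range(len(fps)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # only duplicates of already-picked items remain
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
    return picked

# Toy fingerprints (invented for illustration).
fps = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2},
       {10, 11, 12}, {10, 11, 12, 13}, {10, 11},
       {1, 2, 10, 11}]

picked = maxmin_pick(fps, n_pick=3)
print("picked row indices:", picked)
```

The result depends on which compound you start from (`first`), so a different starting point can give a different, equally valid diverse subset.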



I was just about to advise the same but Greg beat me to it.