Clustering of docking poses


I have a file with docking poses and their respective docking scores as well as chemical fingerprints. Now I would like to cluster them as follows:

  1. identify the pose with the best (lowest) docking score; this will be the representative for cluster #1
  2. among the rest of the poses, identify those with a fingerprint Tanimoto similarity >0.7 to this representative and assign them to cluster #1
  3. the next best scoring pose not assigned to cluster #1 opens cluster #2, and the remaining unassigned poses with a fingerprint Tanimoto similarity >0.7 to it are assigned to cluster #2
  4. repeat until all poses have been assigned to a cluster
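The procedure above is essentially a greedy leader (sphere-exclusion) clustering in which cluster centres are picked in order of docking score. Below is a minimal sketch in plain Python. It assumes each fingerprint has already been converted to a set of "on" bit indices (for example, taken from the RDKit Morgan bit vectors). The function names `tanimoto` and `cluster_poses` and the toy data are hypothetical and only illustrate the steps.

```python
from typing import Dict, FrozenSet, List, Tuple

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def cluster_poses(poses: List[Tuple[str, float, FrozenSet[int]]],
                  threshold: float = 0.7) -> Dict[str, int]:
    """Greedy leader clustering: the best-scoring unassigned pose opens a cluster,
    and every unassigned pose with similarity > threshold to it joins that cluster."""
    remaining = sorted(poses, key=lambda p: p[1])  # lowest (best) docking score first
    assignment: Dict[str, int] = {}
    cluster_id = 0
    while remaining:
        cluster_id += 1
        rep_id, _, rep_fp = remaining[0]           # representative of the new cluster
        assignment[rep_id] = cluster_id
        rest = []
        for pose_id, score, fp in remaining[1:]:
            if tanimoto(rep_fp, fp) > threshold:   # compared to the representative only
                assignment[pose_id] = cluster_id
            else:
                rest.append((pose_id, score, fp))  # stays in the pool, still score-sorted
        remaining = rest
    return assignment

# Hypothetical toy data: (pose id, docking score, fingerprint on-bits)
poses = [
    ("pose_a", -9.1, frozenset({1, 2, 3, 4})),
    ("pose_b", -8.5, frozenset({1, 2, 3, 4, 5})),
    ("pose_c", -8.0, frozenset({10, 11, 12})),
    ("pose_d", -7.2, frozenset({10, 11, 12, 13})),
]
print(cluster_poses(poses))
# → {'pose_a': 1, 'pose_b': 1, 'pose_c': 2, 'pose_d': 2}
```

Each pose is compared only with the representative of the cluster being built, never with the other members, which follows steps 2 and 3. This resembles Taylor-Butina clustering, except that centres are chosen by docking score rather than by neighbour count. When working with RDKit fingerprint objects directly, `DataStructs.BulkTanimotoSimilarity(rep_fp, other_fps)` could replace the per-pair loop. Treat that as a pointer rather than a tested drop-in.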

Attached is an example file with 100 poses, their docking scores and RDKit Morgan fingerprints.

Any efficient solutions appreciated!


Docking Pose Clustering.knwf (23.1 KB)

Well, it turns out I was very close to the solution myself already: adding a Column Filter node before the Recursive Loop End to remove the Similarity column did the trick. The cluster numbers correspond to the iterations (an option set in the Recursive Loop End node).
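For readers outside KNIME, here is a rough, hypothetical Python analogue of the recursive loop described above. It is not an export of the attached workflow. The column names `ID`, `Score`, `FP`, `Similarity` and `Cluster` and the helper functions are made up for illustration, and fingerprints are again assumed to be sets of on-bit indices. It shows the likely reason the Column Filter mattered: the rows fed back into the next iteration should carry only the original columns. Otherwise, each pass would try to add another `Similarity` column to a table that already has one.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]  # hypothetical table row: column name -> value

def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def loop_body(table: List[Row], threshold: float = 0.7) -> Tuple[List[Row], List[Row]]:
    """One pass of the loop: return (rows collected this iteration, rows fed back)."""
    table = sorted(table, key=lambda r: r["Score"])  # best (lowest) score first
    rep_fp = table[0]["FP"]
    # Analogue of the similarity step: append a temporary "Similarity" column
    scored = [{**r, "Similarity": tanimoto(rep_fp, r["FP"])} for r in table]
    cluster = [r for i, r in enumerate(scored) if i == 0 or r["Similarity"] > threshold]
    rest = [r for i, r in enumerate(scored) if i > 0 and r["Similarity"] <= threshold]
    # Analogue of the Column Filter: drop "Similarity" so the fed-back rows
    # have exactly the columns the loop started with
    rest = [{k: v for k, v in r.items() if k != "Similarity"} for r in rest]
    return cluster, rest

def recursive_loop(table: List[Row]) -> List[Row]:
    input_columns = set(table[0]) if table else set()
    collected: List[Row] = []
    iteration = 0
    while table:
        iteration += 1
        cluster, table = loop_body(table)
        # The iteration counter doubles as the cluster number
        collected.extend({**r, "Cluster": iteration} for r in cluster)
        # Sanity check: the fed-back table keeps the original column set
        assert all(set(r) == input_columns for r in table), "fed-back columns changed"
    return collected

rows = [
    {"ID": "pose_a", "Score": -9.1, "FP": frozenset({1, 2, 3, 4})},
    {"ID": "pose_c", "Score": -8.0, "FP": frozenset({10, 11, 12})},
    {"ID": "pose_b", "Score": -8.5, "FP": frozenset({1, 2, 3, 4, 5})},
]
for r in recursive_loop(rows):
    print(r["ID"], r["Cluster"])
# → pose_a 1
# → pose_b 1
# → pose_c 2
```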

I have added the actual molecules for the chemists amongst you so that you can see the clustering makes sense. Hopefully this is useful for others.

Happy Easter/Evert

Docking Pose Clustering v2.knwf (24.8 KB)
