Chemical clustering with similarity to chemical structure

Hey @reinyah94,

Welcome to the community!

How are you pre-processing the compounds? Do you clean up or normalize them before clustering?

I would suggest trying the ‘RDKit Structure Normalizer’ and possibly the ‘Salt Stripper’ to see whether they change your results.
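To show what salt stripping does, here is a crude, standard-library-only Python sketch of one common heuristic: keep the largest dot-separated fragment of a SMILES string. The regex-based atom count and the name `largest_fragment` are assumptions made for illustration, and this is not how the nodes above are implemented. On real data, a toolkit that actually parses the molecule (for example RDKit's `rdMolStandardize.LargestFragmentChooser`) is the safer choice.

```python
import re

# Approximate atom matcher for SMILES (illustrative heuristic only):
# a bracket atom such as [Na+] or [nH], a two-letter organic-subset atom
# (Br, Cl), or a one-letter aliphatic/aromatic organic-subset atom.
# Bonds, branches and ring-closure digits are not matched, so they are ignored.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles: str) -> str:
    """Keep the dot-separated fragment with the most atoms (ties: first one)."""
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: len(ATOM.findall(frag)))


# A sodium salt and a hydrochloride: the counter-ion fragment is dropped.
for smi in ["CC(=O)[O-].[Na+]", "Cl.CN1CCC(CC1)c1ccccc1"]:
    print(largest_fragment(smi))
# prints "CC(=O)[O-]" and then "CN1CCC(CC1)c1ccccc1"
```

Note that this keeps charged parents as they are (acetate stays `[O-]`). Neutralizing and normalizing the remaining structure is the job of a proper normalizer.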

Alternatively, there are some good suggestions in this thread:

For a very high-level overview, you can try t-SNE to project your compounds into two dimensions and look at the plot. It only gives a rough picture, but it can suggest how many clusters to expect. That is especially useful with k-means, where you have to specify the number of clusters up front.
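A minimal sketch of the "how many clusters?" question: run plain Lloyd's k-means for several values of k and compare the inertia (the within-cluster sum of squared distances). The coordinates below are made up and stand in for a 2D t-SNE projection of your compounds; the function names are hypothetical. A sharp drop followed by a plateau (the "elbow") should roughly agree with the number of groups you see in the plot. Treat it as a heuristic, not a definitive answer.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on 2D points. Returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Hypothetical 2D coordinates, standing in for a t-SNE projection.
coords = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (0.0, 5.0), (0.1, 5.2), (-0.2, 4.9)]

for k in range(1, 6):
    _, inertia = kmeans(coords, k)
    print(k, round(inertia, 2))
```

The farthest-point seeding is chosen only to keep the sketch deterministic. Library implementations usually use k-means++ seeding with several restarts instead.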

This workflow may also help:

Hope this helps,
TL
