Similarity search

Mindsword · February 23, 2015, 10:41pm

Hello all,

I am currently working on a project where I am examining two SDF files full of chemicals and comparing their fingerprints. I am doing this for several files, so I wasn't too surprised to see there were no exact matches in this particular data set. It happens. However, my boss wanted me to look at how close the nearest similarities were.

The top 3 hits all have a similarity of exactly 5.0. One of the hits looks to be a perfect match based on the SMILES and everything else we can see. I can only assume there is some odd bug that is capping the threshold at 5.0 for this specific test. However, other tests using the same files do not show this, nor can I find any obvious difference in the Similarity Search node.

Thoughts?

richards99 · February 24, 2015, 6:57am

That sounds odd. For the distance measure, have you made sure that Tanimoto is selected, along with the fingerprint column? The similarity should usually be between 0 and 1, so I'm confused about where the number 5 has come from, unless you haven't selected Tanimoto.
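
To illustrate why a Tanimoto score should stay between 0 and 1, here is a minimal Python sketch. It assumes each fingerprint is represented as a set of on-bit indices; the function name `tanimoto` and the example bit sets are made up for illustration and are not taken from the Similarity Search node.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Assumed convention for this sketch: two empty fingerprints count as identical.
        return 1.0
    # Shared on-bits divided by all on-bits seen in either fingerprint.
    return len(a & b) / union


# Hypothetical fingerprints, for illustration only.
fp_query = {1, 4, 7, 9}
fp_hit = {1, 4, 7, 12}

print(tanimoto(fp_query, fp_query))  # identical fingerprints -> 1.0
print(tanimoto(fp_query, fp_hit))    # 3 shared bits out of 5 distinct bits -> 0.6
```

Because the shared bits can never outnumber the union, the result cannot exceed 1, so a value such as 5 would suggest a different measure or scale is in play.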

simon.

Mindsword · March 4, 2015, 9:21pm

Sorry for the delayed response. We figured it out yesterday.

I looked at that, but everything seemed in order.

The problem was in the SMILES themselves. It turns out one of the groups had included in the SMILES the ions that you would need to keep this chemical in water, while the others had not. Once I removed the chemicals, the search worked perfectly fine.
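
For anyone hitting the same mismatch, one possible way to make such entries comparable is to drop the counter-ions before fingerprinting. The Python sketch below is only a rough illustration: it assumes the ions appear as separate dot-separated components in the SMILES and uses string length as a crude stand-in for fragment size. The helper name `largest_fragment` and the example SMILES are hypothetical and not taken from the data set discussed here.

```python
def largest_fragment(smiles):
    """Return the longest dot-separated component of a SMILES string.

    Crude heuristic: string length is used as a proxy for fragment size,
    which is an assumption of this sketch, not a chemically rigorous rule.
    """
    fragments = [f for f in smiles.split(".") if f]
    if not fragments:
        return smiles
    return max(fragments, key=len)


# Hypothetical example: sodium acetate written with its counter-ion.
with_salt = "CC(=O)[O-].[Na+]"

print(largest_fragment(with_salt))  # CC(=O)[O-]
print(largest_fragment("CC(=O)O"))  # no counter-ion present, unchanged: CC(=O)O
```

Note that the remaining fragment can still carry a charge, so it may not match a neutral form exactly; neutralising it would be a separate step, and a cheminformatics toolkit is likely to handle both steps more reliably than this string-based sketch.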