Comparing molecules by Tversky similarity


I would like to compare molecule files A and B and find all molecules in file B that have a Tversky similarity >0.8 to any molecule in file A, using structural fingerprints for comparison.

Any tips for an efficient workflow are highly appreciated!


Hi Evert,

I think the following workflow should do what you need, with two caveats.

a) The row splitter simulates File A and File B (I didn't have two SDF files with similar enough compounds).

b) The similarity calculated here is Tanimoto. If you really need Tversky, I think you'll need to use the Java Distance node to define the Tversky distance and pass its output port into the Similarity Search node.
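For reference, here is a minimal sketch of how Tversky relates to Tanimoto. It is plain Python rather than KNIME node code, and it assumes each fingerprint is given as a set of on-bit indices. The function name and the alpha/beta weights are illustrative choices, not anything taken from the workflow. With alpha = beta = 1 the formula reduces to Tanimoto, and with alpha = beta = 0.5 it gives Dice.

```python
def tversky(a, b, alpha, beta):
    """Tversky similarity of two fingerprints given as sets of on-bit indices.

    alpha weights bits found only in a (the reference),
    beta weights bits found only in b (the candidate).
    """
    a, b = set(a), set(b)
    common = len(a & b)   # bits set in both fingerprints
    only_a = len(a - b)   # bits set only in the reference
    only_b = len(b - a)   # bits set only in the candidate
    denom = common + alpha * only_a + beta * only_b
    # Two empty fingerprints give a zero denominator; treat that as no similarity.
    return common / denom if denom else 0.0


# Toy fingerprints (made up for illustration).
ref = {1, 2, 3}
query = {2, 3, 4}

tanimoto = tversky(ref, query, 1.0, 1.0)  # alpha = beta = 1 is Tanimoto
dice = tversky(ref, query, 0.5, 0.5)      # alpha = beta = 0.5 is Dice
asym = tversky(ref, query, 1.0, 0.0)      # ignores bits found only in the candidate
```

Because alpha and beta can differ, the result depends on which molecule is treated as the reference, so it is worth deciding up front whether file A or file B plays that role.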



The Indigo 2 fingerprint similarity node will directly calculate a Tversky similarity for you.

I managed to combine the two proposed solutions and made a workflow that generates a Tversky similarity column for each reference molecule by looping over them one at a time. This is still surprisingly fast.
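In case it helps anyone who wants to check the results outside KNIME, the looping idea can be sketched roughly as below. This is only an illustration in plain Python, not the workflow itself: fingerprints are assumed to be sets of on-bit indices, the helper names, the toy data, and the alpha/beta values are all made up, and the 0.8 cutoff comes from the original question.

```python
def tversky(a, b, alpha, beta):
    """Tversky similarity of two fingerprints given as sets of on-bit indices."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def similarity_columns(refs, candidates, alpha, beta):
    """Loop over the reference molecules one at a time.

    Returns one column per reference: columns[ref_name][i] is the
    similarity of candidate i to that reference.
    """
    columns = {}
    for ref_name, ref_fp in refs.items():
        columns[ref_name] = [tversky(ref_fp, fp, alpha, beta) for _, fp in candidates]
    return columns


def hits(refs, candidates, alpha, beta, threshold=0.8):
    """Names of candidates whose best similarity to any reference exceeds threshold."""
    columns = similarity_columns(refs, candidates, alpha, beta)
    keep = []
    for i, (name, _) in enumerate(candidates):
        best = max((col[i] for col in columns.values()), default=0.0)
        if best > threshold:
            keep.append(name)
    return keep


# Toy data standing in for file A (references) and file B (candidates).
file_a = {"ref1": {1, 2, 3, 4, 5}, "ref2": {10, 11, 12}}
file_b = [("b1", {1, 2, 3, 4, 5, 6}), ("b2", {10}), ("b3", {20, 21})]

selected = hits(file_a, file_b, alpha=0.5, beta=0.5)
```

Keeping one column per reference makes it easy to see which reference molecule produced each hit; if only the final list matters, taking the running maximum per candidate would avoid storing all the columns.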

Hopefully this is useful for other users.