Fingerprint Similarity Improvement Request

Hi,

I notice that the Fingerprint Similarity node has the option to use the Tversky index, which is a modification of the Tanimoto index with alpha and beta parameters.

I cannot tell what alpha and beta values the node uses for the Tversky index. Ideally I would like to be able to specify the values myself, as this can be really powerful. When Tversky is selected from the dropdown, could two boxes appear in the node for specifying alpha and beta?

If you have a molecule A that you are searching against a set of molecules B, then:

Specifying an alpha value of 0 allows you to search for molecules that are substructures of A, i.e. the highly scored molecules will consist of substructures present in A, without other substructures that are NOT present in A. This is useful for finding molecular fragments of A.

Specifying a beta value of 0 allows you to search for molecules that are superstructures of A, i.e. the highly scored molecules will share the most substructures with A, regardless of any additional functionalities that are NOT present in A. This is useful for finding molecules that contain all of A, or close to all of A, and more. For example, A may be a fragment with some low-level activity and you want to find molecules that contain A plus more chemical features.
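
To make the two cases concrete, here is a minimal Python sketch of the Tversky index on fingerprints represented as sets of on-bit positions. It assumes the usual textbook form, in which alpha weights the bits found only in A and beta weights the bits found only in B. The function name `tversky` and the bit sets are hypothetical and for illustration only; this is not the node's actual implementation.

```python
def tversky(a_bits, b_bits, alpha, beta):
    """Tversky index between two fingerprints given as sets of on-bit positions.

    Assumed convention: alpha weights bits only in A, beta weights bits only in B.
    """
    a, b = set(a_bits), set(b_bits)
    common = len(a & b)
    only_a = len(a - b)  # features of A that are missing from B
    only_b = len(b - a)  # features of B that are missing from A
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Hypothetical on-bit sets, for illustration only.
A = {1, 2, 3, 4, 5, 6}
fragment = {1, 2, 3}                       # every bit also occurs in A
superstructure = {1, 2, 3, 4, 5, 6, 7, 8}  # all of A plus extra bits

print(tversky(A, fragment, alpha=0, beta=1))        # 1.0  (substructure search)
print(tversky(A, superstructure, alpha=1, beta=0))  # 1.0  (superstructure search)
print(tversky(A, superstructure, alpha=1, beta=1))  # 0.75 (same as Tanimoto)
```

With alpha = 0 the bits that A has and the candidate lacks cost nothing, so fragments of A score 1. With beta = 0 the extra bits of the candidate cost nothing, so anything containing all of A scores 1.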

 

As an aside, what type of fingerprint is the Indigo fingerprint? Is it an Extended Connectivity/Functional Class type fingerprint (i.e. Morgan algorithm/Daylight type), which defines patterns in connectivity, or does it assign functional groups to bits and record TRUE or FALSE for the presence of each functionality, like MACCS fingerprints?

 

Thanks in advance,

Simon.

It has been noticed that a much earlier build of Indigo had these alpha and beta options in the Fingerprint Similarity node, which were set to 0.5 each by default (which is the Dice similarity measure).
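
For reference, the link between these defaults and Dice can be checked from the standard set form of the Tversky index. Here A and B denote the sets of on bits of the two fingerprints; this is the textbook definition and is only assumed to match what the node computes.

```latex
\begin{aligned}
S_{\alpha,\beta}(A,B) &= \frac{|A \cap B|}{|A \cap B| + \alpha\,|A \setminus B| + \beta\,|B \setminus A|} \\
S_{0.5,\,0.5}(A,B) &= \frac{2\,|A \cap B|}{2\,|A \cap B| + |A \setminus B| + |B \setminus A|}
  = \frac{2\,|A \cap B|}{|A| + |B|} \quad \text{(Dice)} \\
S_{1,\,1}(A,B) &= \frac{|A \cap B|}{|A \cup B|} \quad \text{(Tanimoto)}
\end{aligned}
```

The Dice step uses |A| = |A ∩ B| + |A \ B| and |B| = |A ∩ B| + |B \ A|.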

Please can these alpha and beta configuration boxes be returned to the Fingerprint Similarity node for the Tversky measure?

Thanks

Simon.

 

Hello Simon,

I have restored the options for the Tversky similarity measure.

Our fingerprints are constructed by exhaustive enumeration of subtrees and cycles. I gave some explanation here: https://groups.google.com/d/msg/indigo-general/1Z25Fz2WXRo/TdXxQZUNQj8J

Best regards,
Mikhail

Many thanks for quickly fixing the Tversky similarity measure.

And many thanks for the link to the detailed post on your Indigo fingerprints. This is most useful.

Simon.

Hi,

I wonder why I cannot select different similarity measures in my node. The output is always Tanimoto. The only settings I can change are:

Column with fingerprint
Column with reference fingerprint
Aggregation method
Return type

Is there something wrong with my KNIME version?

Just installed new updates...

Hi Mikhail,

I am currently looking at Tversky similarity searching using the Indigo Fingerprint Similarity node, but find that the results are highly dependent on the fingerprints used. I am searching for superstructures of A, so I set alpha=1 and beta=0, and then compared the results when using Indigo, RDKit, or CDK fingerprints (all default settings).

The funny thing (or maybe expected?) is that all give a similarity of 1.0 so long as there are no heteroatoms in rings, but when I do the comparison with a reference structure containing a pyridine, only the Indigo fingerprints produce a similarity of 1 (as expected):

Query         Reference    Indigo  RDKit  CDK
CC1=CC=CC=C1  C1=CC=CC=C1  1       1      1
CC1=NC=CC=C1  C1=NC=CC=C1  1       0.556  0.979

To me this is a peculiar result. I had expected the similarities in the 2nd row also to be 1.0 regardless of the fingerprints used, since the query here is also an exact superstructure of the reference, as in the top row. Is this due to the RDKit and CDK fingerprints being dependent on the asymmetric nature of the substituted pyridine ring?
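
One way to narrow this down, as a rough sketch: with alpha=1 and beta=0 the Tversky index reduces to (bits shared) / (bits set in the reference), so a score below 1 means at least one reference bit is not set in the query fingerprint. This assumes the node applies alpha to the reference-only bits. The helper names and the bit positions below are hypothetical, and the sketch does not say why a given toolkit sets those bits.

```python
def missing_reference_bits(reference_bits, query_bits):
    """Reference on-bits that are absent from the query fingerprint."""
    return sorted(set(reference_bits) - set(query_bits))


def superstructure_score(reference_bits, query_bits):
    """Tversky index with alpha=1, beta=0: shared bits / reference bits."""
    ref = set(reference_bits)
    if not ref:
        return 0.0
    return len(ref & set(query_bits)) / len(ref)


# Hypothetical on-bit positions, for illustration only.
reference = {3, 8, 15, 21}
query = {3, 8, 21, 30, 42}

print(missing_reference_bits(reference, query))  # [15]
print(superstructure_score(reference, query))    # 0.75
```

Listing the missing bits for the pyridine pair with each fingerprint type would show which reference features are not reproduced in the substituted ring.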

Thanks/Evert