RDKit Diversity Picker warning: the fingerprints in table 1 and 2 are not compatible, which may lead to wrong results

Hi RDKit lovers,

I'm trying to use the RDKit Diversity Picker node to select a diverse set of molecules (set_1) that is biased away from a second set of molecules (set_2). This second set is read from an SQL database where the fingerprint column is stored in text format. After the second set is read, the fingerprint column is recast to the BitVector type using the KNIME Create Bit Vector node.

When I use the RDKit Diversity Picker node with the set_2 molecules on the second input port the following warning is given by the node:

WARN: The fingerprints in table 1 and 2 are not compatible, which may lead to wrong results.

As far as I know, the two fingerprints are of the same type and length. Can anybody tell me why this warning message is triggered? Should I worry about it?

Thanks for any help,

Gio

Hi Gio,

When you calculate fingerprints with the RDKit Fingerprint node, information about the fingerprint type and the parameters used for the calculation is stored together with the fingerprints. It is stored as properties of the fingerprint column in KNIME and is used to perform some validity (and risk) checks when fingerprint columns are used in RDKit nodes for logic like yours. In your case you provide the second fingerprint column yourself, without these properties set. As one fingerprint column was apparently calculated by RDKit and the other came from the database, the RDKit Diversity Picker node sees a high risk that the fingerprint type and the parameters used to calculate these fingerprints were different, which would indeed lead to wrong results. If you are certain that the meaning of the fingerprints is exactly the same, you may just ignore this warning.
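
To illustrate why this matters: the picker has to compare fingerprints from table 1 with fingerprints from table 2, and such a comparison is only meaningful if the same bit means the same thing in both tables. Below is a minimal pure-Python sketch, assuming a MaxMin-style selection on Tanimoto distance. It is not the node's actual code; the names `tanimoto_distance` and `pick_diverse` and the toy on-bit sets are made up for illustration.

```python
from typing import FrozenSet, List, Sequence

# A fingerprint is modelled as the set of its "on" bit positions (toy model).
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity of two on-bit sets."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def pick_diverse(candidates: Sequence[Fingerprint],
                 avoid: Sequence[Fingerprint],
                 n_picks: int) -> List[int]:
    """Greedy MaxMin pick: return indices of candidates, biased away from `avoid`."""
    # Distance from each candidate to its nearest fingerprint in `avoid`.
    # With an empty `avoid` set every candidate starts at infinity.
    min_dist = [
        min((tanimoto_distance(c, r) for r in avoid), default=float("inf"))
        for c in candidates
    ]
    picked: List[int] = []
    for _ in range(min(n_picks, len(candidates))):
        remaining = [i for i in range(len(candidates)) if i not in picked]
        # Take the candidate that is farthest from everything chosen or avoided.
        best = max(remaining, key=lambda i: min_dist[i])
        picked.append(best)
        # The new pick now also counts as something to stay away from.
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], tanimoto_distance(c, candidates[best]))
    return picked


set_1 = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({10, 11, 12})]
set_2 = [frozenset({10, 11, 12})]

picks_biased = pick_diverse(set_1, set_2, 2)
picks_unbiased = pick_diverse(set_1, [], 2)
```

In this toy example the third candidate is identical to the molecule in `set_2`, so the biased run skips it. That only works because bit 10 means the same thing in both sets; if the two sets had been generated with different fingerprint types or parameters, the distances to `set_2` would be meaningless.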

Some logic about the warnings:

  • If both fingerprint columns are calculated by the RDKit Fingerprint node, no warning will occur if the fingerprint type and parameters found in the column properties are the same.
  • If both fingerprint columns are lacking information about the fingerprint calculation, the following warning will occur: "The fingerprints in table 1 and 2 might not be compatible, which may lead to wrong results."
  • If only one of the fingerprint columns is lacking information about the fingerprint calculation, the following warning will occur: "The fingerprints in table 1 and 2 are not compatible, which may lead to wrong results."
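
The rules above can be sketched roughly as follows. This is only an illustration in Python, not the node's source code: the function name `compatibility_warning` and the representation of column properties as an optional dictionary are assumptions, as are the property keys used in the usage example. The case where both columns carry properties that differ is not covered by the rules above, so the message returned for it here is a guess.

```python
from typing import Mapping, Optional

MIGHT_NOT_BE_COMPATIBLE = (
    "The fingerprints in table 1 and 2 might not be compatible, "
    "which may lead to wrong results."
)
NOT_COMPATIBLE = (
    "The fingerprints in table 1 and 2 are not compatible, "
    "which may lead to wrong results."
)


def compatibility_warning(
    props_1: Optional[Mapping[str, str]],
    props_2: Optional[Mapping[str, str]],
) -> Optional[str]:
    """Return the warning text for two fingerprint columns, or None for no warning.

    `None` as an argument stands for a column without fingerprint properties,
    e.g. one that was read from a database instead of being calculated.
    """
    if props_1 is None and props_2 is None:
        # Neither column says how it was calculated.
        return MIGHT_NOT_BE_COMPATIBLE
    if props_1 is None or props_2 is None:
        # Only one column carries calculation details.
        return NOT_COMPATIBLE
    if dict(props_1) == dict(props_2):
        # Same fingerprint type and parameters on both sides.
        return None
    # Assumption: differing properties are reported as "not compatible".
    return NOT_COMPATIBLE


# Hypothetical property keys, for illustration only.
calculated = {"type": "Morgan", "radius": "2", "numBits": "1024"}

warning_same = compatibility_warning(calculated, dict(calculated))
warning_one_missing = compatibility_warning(calculated, None)
warning_both_missing = compatibility_warning(None, None)
```

The `warning_one_missing` case corresponds to your workflow: one column calculated by RDKit, the other one coming from the database.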

Again, if you know exactly what you are doing, you may just ignore this warning.

Kind regards,
Manuel

Hi Manuel,

Thanks for your quick and clear answer. Now I see the fingerprint type and parameters stored as properties. The logic behind the warnings makes complete sense to me, and I think it is good that RDKit triggers these warnings.

Having said that, it is good to know that if we're sure about the origin, type and parameters used to calculate the fingerprints we can ignore these warnings.

Thanks again for your answer. I hope this forum thread will also be useful to other users.

Best regards,

Gio