I have been struggling to find a way in KNIME to compare the molecules in two columns row by row, i.e. pairwise per row (note: not a full pairwise matrix!).
Consider an input file with the SMILES of molecules in both columns:
Mol1a,Mol1b
Mol2a,Mol2b
Mol3a,Mol3b
…
Mol1000a,Mol1000b
I’d like to get 1000 Tanimoto values for the 1000 pairs only (e.g. using ECFP4), not a 1000 × 1000 matrix of 1M Tanimoto values. Is there an efficient way to do this in KNIME? Any advice would be highly appreciated, thanks!
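To make the distinction concrete: if A_i and B_i are the sets of on-bits of the fingerprints (e.g. ECFP4) of the two molecules in row i, the row-wise comparison needs only one Tanimoto value per row:

```latex
T_i = \frac{|A_i \cap B_i|}{|A_i| + |B_i| - |A_i \cap B_i|}, \qquad i = 1, \dots, N
```

For N = 1000 rows that is 1000 values, whereas comparing every A_i with every B_j gives N^2 = 10^6 values.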
Thank you for the proposal. I am actually looking for a quicker solution, as looping through each of millions of pairs will be quite time-consuming (I tested a similar solution myself). The usual scale of my daily work is >10^6 molecule pairs… Any other ideas? Thanks!
I don’t see an obvious solution with out-of-the-box KNIME nodes, and unfortunately the BitVector column isn’t yet compatible with the native types in the Java Snippet.
My next suggestion would be to use a Python snippet, calculate the similarity with RDKit in Python, and append a column with the similarity value.
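A minimal sketch of the "append a similarity column" idea, written with the standard library only so it runs without RDKit. It assumes (hypothetically) that each row already holds two fingerprints encoded as hex strings; with RDKit bit vectors you would instead call RDKit's `DataStructs.TanimotoSimilarity(fp_a, fp_b)` per row. The function names are illustrative, not part of KNIME or RDKit.

```python
def tanimoto_int(a: int, b: int) -> float:
    """Tanimoto of two fingerprints stored as Python ints (one bit per feature)."""
    both = bin(a & b).count("1")    # bits set in both fingerprints
    either = bin(a | b).count("1")  # bits set in at least one fingerprint
    # Convention chosen here (assumption): two empty fingerprints score 0.0.
    return both / either if either else 0.0


def append_similarity(rows):
    """Hypothetical helper: rows are (fp_a_hex, fp_b_hex) pairs.

    Yields each row with its Tanimoto value appended, so N input rows give
    N similarities rather than an N x N matrix.
    """
    for fp_a, fp_b in rows:
        sim = tanimoto_int(int(fp_a, 16), int(fp_b, 16))
        yield (fp_a, fp_b, sim)
```

Because Tanimoto only counts bits, the bit order inside the hex string does not matter as long as both columns use the same encoding.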
Unless @s.roughley has done something in the Vernalis contribution that will make this easier?
Instead of Python, I would do it in the Java Snippet, as you can use RDKit there as well and you don’t pay the large serialization penalty between KNIME (Java) and Python. Given the large number of molecules, in my opinion that will be a lot faster.
We have nodes that do things like list the set bits of a fingerprint. I guess you could then use a Java Snippet to calculate your own Tanimoto from those lists.
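The arithmetic such a snippet would need is small. This is a sketch in Python, assuming each row provides two lists of set-bit positions (the kind of output a set-bits-list node would give); the function name is hypothetical, and the same count-and-divide logic could be ported to a Java Snippet.

```python
def tanimoto_from_set_bits(bits_a, bits_b) -> float:
    """Tanimoto from two lists of set-bit positions (hypothetical helper)."""
    a, b = set(bits_a), set(bits_b)
    common = len(a & b)               # |A ∩ B|
    union = len(a) + len(b) - common  # |A| + |B| - |A ∩ B|
    # Convention chosen here (assumption): no bits set at all scores 0.0.
    return common / union if union else 0.0


# Usage: one value per row, e.g. bits {1, 9, 42} are shared out of 5 distinct bits.
print(tanimoto_from_set_bits([1, 5, 9, 42], [1, 9, 42, 100]))
```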
Hi Swebb,
I am using KNIME 4.5.2, but in this version the Fingerprint Similarity and Fingerprint to Set Bits List nodes are not available. How can I install these nodes into the KNIME version I am currently using? Please help me with this.