Tanimoto similarity for pairs of compounds

#1

Hi,

I have been struggling to find a way in KNIME to compare the molecules in two columns row by row, i.e. each molecule only with its partner in the same row (note: not a full pairwise matrix!).

Consider an input file with the SMILES of one molecule in each column:
Mol1a,Mol1b
Mol2a,Mol2b
Mol3a,Mol3b

Mol1000a,Mol1000b

I’d like to get 1000 Tanimoto values, one per pair (e.g. using ECFP4), rather than a 1000 × 1000 pairwise matrix of 1M values. Is there an efficient way to do this in KNIME? Any advice would be highly appreciated, thanks!
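For reference, the row-wise comparison asked for here can be sketched in plain Java. This is a minimal sketch under assumptions that are not in the thread: each SMILES is taken to have already been converted to a fingerprint stored as a java.util.BitSet (generating ECFP4 fingerprints, e.g. with RDKit, is not shown), and the class and method names are hypothetical. Row i of the first column is compared only with row i of the second, so N rows give N values rather than an N × N matrix.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class RowwiseTanimoto {

    /** Tanimoto coefficient of two bit-vector fingerprints: |A AND B| / |A OR B|. */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                                   // bits set in both fingerprints
        int c = common.cardinality();
        int union = a.cardinality() + b.cardinality() - c;
        // Assumption: two empty fingerprints are reported as 0.0 rather than NaN.
        return union == 0 ? 0.0 : (double) c / union;
    }

    /** One similarity per row: left.get(i) is compared only with right.get(i). */
    public static double[] rowwise(List<BitSet> left, List<BitSet> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("both columns must have the same number of rows");
        }
        double[] out = new double[left.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = tanimoto(left.get(i), right.get(i));
        }
        return out;
    }

    /** Hypothetical helper for the example: builds a fingerprint from set-bit positions. */
    public static BitSet bits(int... on) {
        BitSet bs = new BitSet();
        for (int i : on) {
            bs.set(i);
        }
        return bs;
    }

    public static void main(String[] args) {
        List<BitSet> colA = List.of(bits(1, 5, 9, 12), bits(2, 3));
        List<BitSet> colB = List.of(bits(5, 9, 20), bits(2, 3));
        System.out.println(Arrays.toString(rowwise(colA, colB))); // [0.4, 1.0]
    }
}
```

The work grows linearly with the number of rows, which is the point of comparing pairs instead of building the full matrix. Treating two empty fingerprints as 0.0 is an arbitrary choice here; the plain formula would give NaN.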

#2

It’s not going to be a particularly quick solution, but you could loop over the rows and calculate the similarity one pair at a time.

Cheers

Sam

#3

Hi Sam,

Thank you for the proposal. I am actually looking for a quicker solution, as looping through each of millions of pairs will be very time-consuming (I tested a similar solution myself), and the usual scale of my daily work is >10^6 molecule pairs… Any other ideas? Thanks!

Kind regards,
/Alex

#4

I don’t see an obvious solution with out-of-the-box KNIME nodes, and unfortunately the BitVector column type isn’t yet supported as a native type in the Java Snippet.

My next suggestion would be to use a Python Snippet: calculate the similarity with RDKit in Python and append a column with the similarity value.

Unless @s.roughley has done something in the Vernalis contribution that will make this easier?

Cheers

Sam

#5

Thank you, Sam! I will consider this idea. It’s a pity that such a simple comparison isn’t available in KNIME yet. :frowning:

#6

Instead of Python, I would do it in the Java Snippet: you can use RDKit there as well, and you don’t have to pay the large serialization penalty of passing data between KNIME (Java) and Python. Given the large number of molecules, I think that will be a lot faster.

#7

We have nodes which can, for example, list the set bits of a fingerprint. You could then use a Java Snippet to calculate the Tanimoto similarity yourself, e.g.:

Snippet:

```java
// Your custom imports:
import java.util.Set;
import java.util.LinkedHashSet;
import java.util.Arrays;

// system variables
public class JSnippet extends AbstractJSnippet {
  // Fields for input columns
  /** Input column: "fp1 (Set bits)" */
  public Long[] c_fp1Setbits;
  /** Input column: "fp2 (Set bits)" */
  public Long[] c_fp2Setbits;

  // Fields for output columns
  /** Output column: "Tanimoto" */
  public Double out_Tanimoto;

// Your custom variables:

// expression start
    public void snippet() throws TypeException, ColumnException, Abort {
// Enter your code here:
// Put the set-bit indices of each fingerprint into a set
Set<Long> fp1 = new LinkedHashSet<>(Arrays.asList(c_fp1Setbits));
Set<Long> fp2 = new LinkedHashSet<>(Arrays.asList(c_fp2Setbits));

int a = fp1.size();   // number of bits set in fp1
int b = fp2.size();   // number of bits set in fp2
fp1.retainAll(fp2);   // fp1 now keeps only the bits set in both
int c = fp1.size();   // number of shared bits

// Tanimoto = shared / (a + b - shared); NaN if both fingerprints have no set bits
out_Tanimoto = 1.0 * c / (a + b - c);

// expression end
    }
}
```
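The snippet computes the Tanimoto coefficient as T = c / (a + b - c), where a and b are the numbers of set bits in the two fingerprints and c is the number they share. For example, with set bits {1, 5, 9, 12} and {5, 9, 20}, a = 4, b = 3 and c = 2, so T = 2 / (4 + 3 - 2) = 0.4.

At >10^6 pairs, building two sets of boxed Longs per row may add noticeable overhead. One possible variant, sketched below and neither taken from the thread nor benchmarked, copies each Long[] into a sorted long[] and counts the shared indices with a two-pointer merge. It assumes the arrays contain no nulls and no duplicate indices; SortedBitsTanimoto and its methods are hypothetical names.

```java
import java.util.Arrays;

public class SortedBitsTanimoto {

    /**
     * Tanimoto from two arrays of set-bit indices (as delivered by an "Array of Long" column).
     * Sorts copies of both arrays, then counts common indices with a two-pointer merge.
     * Assumes no null entries and no duplicate indices within an array.
     */
    public static double tanimoto(Long[] fp1, Long[] fp2) {
        long[] x = toSortedPrimitive(fp1);
        long[] y = toSortedPrimitive(fp2);
        int i = 0, j = 0, common = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {          // index present in both: count it, advance both
                common++;
                i++;
                j++;
            } else if (x[i] < y[j]) {    // advance whichever side holds the smaller index
                i++;
            } else {
                j++;
            }
        }
        int union = x.length + y.length - common;
        // Assumption: two empty fingerprints give 0.0 instead of NaN.
        return union == 0 ? 0.0 : (double) common / union;
    }

    /** Unboxes into a primitive array and sorts it ascending (throws on null entries). */
    private static long[] toSortedPrimitive(Long[] bits) {
        long[] out = new long[bits.length];
        for (int k = 0; k < bits.length; k++) {
            out[k] = bits[k];
        }
        Arrays.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Long[] fp1 = {12L, 1L, 9L, 5L};
        Long[] fp2 = {20L, 5L, 9L};
        System.out.println(tanimoto(fp1, fp2)); // 0.4
    }
}
```

To try this inside the Java Snippet, the two static methods could presumably be copied into the "Your custom variables" section (which sits inside the generated class) and called as out_Tanimoto = tanimoto(c_fp1Setbits, c_fp2Setbits); whether it is actually faster than the set-based version would need measuring on real data.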

Hope that helps?

Steve

#8

Thanks! Could you provide example code for the Java Snippet, please?

#9

I hadn’t seen Steve’s reply when I sent my previous message. I will try the code and let you know. Thank you all for the fast responses! :wink:

#10

Hi Steve,

I have tried your suggestion; however, the Java Snippet complains:

Could you please check what is wrong? Maybe you could share a workflow so I can load and test it? Thanks!

Kind regards,
/Alex.
P.S. I am using KNIME version 3.5.2 (just in case it makes a difference).

#11

I had exactly the same problem: you need to change the Java Type of the input columns to ‘Array of Long’.
Example attached:

Tanimoto_Example.knwf (8.8 KB)

In case you can’t open the example:

Steve

#12

Nice! That is the version I started on :smiley:

Why not update?

Br,
Ivan

#13

Oh dear, now I feel KNIME-old. I think I started around KNIME 2.2!

https://www.knime.com/changelog-v220

Steve

#14

Dear all,

Thank you for your help. Everything works fine now!

Best,
/Alex.

#15

Wow! Now I feel like a fresh KNIME user!

Some nice features there :smiley:

#16

Yes, and the community has changed a lot too. No Vernalis, no RDKit back then, and possibly no Indigo, amongst many other things…

Steve
