Optimizing calculation time for the Indigo Fingerprint Similarity node

Hello everyone,

I have a large data set of molecules (~500,000). I convert the structures into fingerprints (ECFP4) using the RDKit Fingerprint node or the CDK Fingerprint node. Then I want to calculate the all-against-all Tversky similarity using the Indigo Fingerprint Similarity node.
The problem I encounter is the calculation time. I have tried many workflow changes to reduce it, but none of them made a real difference.
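
To give an idea of the scale, here is a rough back-of-the-envelope sketch in Python. The throughput figure is purely an assumption for illustration, not something I measured on the Indigo node; the point is only that the number of pairs grows quadratically. Since Tversky is asymmetric when its two weights differ, I count ordered pairs.

```python
def ordered_pairs(n: int) -> int:
    """Number of ordered (query, target) pairs, excluding self-comparisons."""
    return n * (n - 1)


def estimated_hours(n: int, pairs_per_second: float) -> float:
    """Rough wall-clock estimate for an all-against-all run."""
    return ordered_pairs(n) / pairs_per_second / 3600


n_molecules = 500_000
# Assumption: a hypothetical rate of one million comparisons per second.
# The real rate of any given node or library may be very different.
assumed_rate = 1_000_000

print("pairs to compare:", ordered_pairs(n_molecules))
print("estimated hours :", round(estimated_hours(n_molecules, assumed_rate), 1))
```

Even at that optimistic assumed rate, the full matrix is several hundred billion comparisons, so storing every result is probably also an issue.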

What I have tried:

  • Connecting the output of the Fingerprint node to both inputs of the similarity node
  • Increasing the heap size for KNIME to 6 GB
  • Building a loop that takes one row of the set as the reference for the similarity node and compares the whole set to it
  • Some other small changes that I can't enumerate and that clearly did not work.
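
To make the loop attempt concrete, this is roughly the computation I am after, written as a minimal pure-Python sketch. It assumes each fingerprint is stored as an integer bit set; the function names and the alpha/beta values are hypothetical choices of mine for illustration, not the Indigo node's settings or its implementation.

```python
def tversky(a: int, b: int, alpha: float, beta: float) -> float:
    """Tversky similarity of two fingerprints stored as integer bit sets."""
    common = bin(a & b).count("1")   # bits set in both
    only_a = bin(a & ~b).count("1")  # bits set only in a
    only_b = bin(b & ~a).count("1")  # bits set only in b
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


def one_vs_all(reference: int, fingerprints: list, alpha: float, beta: float) -> list:
    """One loop iteration: compare a single reference to the whole set."""
    return [tversky(reference, fp, alpha, beta) for fp in fingerprints]


def all_vs_all(fingerprints: list, alpha: float, beta: float):
    """Yield one row of the similarity matrix at a time instead of storing it all."""
    for reference in fingerprints:
        yield one_vs_all(reference, fingerprints, alpha, beta)


# Tiny toy fingerprints (4 bits), only to show the asymmetry when alpha != beta.
toy = [0b1110, 0b0110, 0b0001]
for row in all_vs_all(toy, alpha=1.0, beta=0.0):
    print(row)
```

With alpha = beta = 1 this reduces to Tanimoto and with alpha = beta = 0.5 to Dice, so only for unequal weights do both directions of each pair actually need to be computed.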

So maybe someone has an idea? Or is it simply impossible to do this in a reasonable time with this node or with KNIME?

Thank you in advance for any answers.

Baptiste