fast scalable fingerprints


#1

I need to compute PubChem fingerprints for at least 50 million molecules. On a normal desktop (4-8 GB RAM) with the KNIME CDK nodes, the workflow has completed only 30% in 24 hours and appears to be stalling.

How can I make it faster or more scalable? I have already used the parallel chunk nodes. Is there another PubChem fingerprinter I can call?


#2

Hi InsilicoConsulting,

You could try streaming execution and limit the number of rows processed at a time to a reasonable number (depending on your machine).
Here is more information on streaming in KNIME Analytics Platform:
https://www.knime.com/blog/streaming-data-in-knime
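As a rough illustration of the idea behind streaming (only a limited number of rows in memory at a time), here is a minimal Python sketch. It is not KNIME or CDK code: `batched`, `stream_fingerprints` and `toy_fingerprint` are hypothetical names, and `toy_fingerprint` is only a stand-in for a real fingerprinter.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(rows: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_fingerprints(
    smiles: Iterable[str],
    fingerprint: Callable[[str], int],
    batch_size: int = 10_000,
) -> Iterator[Tuple[str, int]]:
    """Lazily yield (smiles, fingerprint) pairs, one batch in memory at a time."""
    for batch in batched(smiles, batch_size):
        for s in batch:
            yield s, fingerprint(s)


def toy_fingerprint(smiles: str) -> int:
    # Hypothetical stand-in, NOT a PubChem fingerprint: it only returns the
    # string length so that the sketch runs without a chemistry toolkit.
    return len(smiles)


if __name__ == "__main__":
    for s, fp in stream_fingerprints(["CCO", "c1ccccc1", "C"], toy_fingerprint, batch_size=2):
        print(s, fp)
```

Because the input is consumed lazily, `smiles` could just as well be an open file with one SMILES per line, so the full set of 50 million rows never has to be loaded at once.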

Are you using the Fingerprints node from the CDK community extension? On my machine it took 9 min to generate circular fingerprints (radius 2) for 2 million compounds (8 GB allocated to KNIME Analytics Platform).
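To put the two timings in this thread side by side, here is a small back-of-the-envelope calculation in Python. It assumes throughput scales linearly with the number of molecules, and it compares circular fingerprints with PubChem fingerprints, which are different algorithms and may not run at the same speed, so treat the result as a rough reference only.

```python
def rate_per_second(molecules: float, seconds: float) -> float:
    """Average throughput in molecules per second."""
    return molecules / seconds


TOTAL_MOLECULES = 50_000_000

# Reported in this post: 2 million circular fingerprints in 9 minutes.
reported_rate = rate_per_second(2_000_000, 9 * 60)

# Reported in the original question: 30% of 50 million (15 million)
# PubChem fingerprints in 24 hours.
observed_rate = rate_per_second(15_000_000, 24 * 3600)

# Time for all 50 million molecules at the reported rate (linear assumption).
hours_at_reported_rate = TOTAL_MOLECULES / reported_rate / 3600

if __name__ == "__main__":
    print(f"reported rate: {reported_rate:.0f} molecules/s")
    print(f"observed rate: {observed_rate:.0f} molecules/s")
    print(f"50 million at reported rate: {hours_at_reported_rate:.2f} h")
```

Under these assumptions the reported rate is about 3,700 molecules per second against about 170 in the original question, and 50 million molecules at the reported rate would take roughly 3.75 hours.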

Best,
Daria


#3

The normal CDK PubChem fingerprint works, and so do looping and parallel looping, although it is nowhere near as fast as in your experience. I am using SMILES input.

I tried streaming after encapsulating/wrapping the nodes in a metanode. On both Linux and Windows, with KNIME version 3.6, it threw the following error:

Execution failed: Incorrect implementation; the execute method in FingerprintNodeModelreturned a null data table at port: 0
ERROR Fingerprints 3:1:11:0:2 (IllegalStateException): Invalid result. Execution failed, reason: data at output 0 is null.
ERROR Cardinality 3:1:11:0:4 (DataContainerException): Adding rows to table was interrupted


#4

Another error
:11:0:2 Execution failed: Incorrect implementation; the execute method in FingerprintNodeModelreturned a null data table at port: 0
ERROR Fingerprints 3:1:11:0:2 (IllegalStateException): Invalid result. Execution failed, reason: data at output 0 is null.
ERROR Cardinality 3:1:11:0:4 (RuntimeException): java.lang.InterruptedException
ERROR WrappedNode Output 3:1:11:0:6 (DataContainerException): Adding rows to buffer was interrupted