distributed computation?

Susanne.Dupre · November 30, 2012, 12:25pm

Hello *,

I am new to KNIME and computational chemistry, so I may be missing some obvious alternatives.
I am currently running a workflow that includes generating CDK fingerprints for a large number of molecules (> 1 million).
As it will be necessary to repeat the workflow for different queries, I am thinking about improvements.
I am working on a Linux machine with 4 CPUs; the fingerprint node uses only one of them. How can I make use of more than one CPU?

(Another issue: I am thinking about reusing previously calculated fingerprints, which I would need to store. I have not managed that yet: they are bit vectors, which I can type cast to string, but I did not find a feasible way to read them back in and use them afterwards.)

Best regards and thanx for the support,
Susanne

Frederic_Dedieu · November 30, 2012, 1:05pm

Hello Susanne,

I am not an expert, so I will only address the second part of your post: write the table with the Table Writer node and retrieve your FPs with the Table Reader node. This is what I do and it works well.
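
For anyone who still wants the string route mentioned in the question, here is a minimal sketch of a lossless round trip between a bit vector and text in plain Java. It assumes a simple '0'/'1' text layout and uses `java.util.BitSet`; the class name `FingerprintCodec` and both method names are made up for illustration, and this is not necessarily the format KNIME produces when a bit vector cell is cast to a string.

```java
import java.util.BitSet;

// Hypothetical helper, not part of KNIME or CDK.
final class FingerprintCodec {

    // Writes the first `length` bits as a string of '0' and '1' characters.
    // The fixed length keeps trailing zero bits, which BitSet does not track.
    static String toBitString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    // Parses a string produced by toBitString back into a BitSet.
    static BitSet fromBitString(String text) {
        BitSet bits = new BitSet(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '1') {
                bits.set(i);
            } else if (c != '0') {
                throw new IllegalArgumentException("Unexpected character at index " + i + ": " + c);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet fingerprint = new BitSet(8);
        fingerprint.set(0);
        fingerprint.set(5);

        String stored = toBitString(fingerprint, 8);
        BitSet restored = fromBitString(stored);

        System.out.println(stored);
        System.out.println(restored.equals(fingerprint));
    }
}
```

Storing the native table, as described above, avoids the need for any such conversion.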

HTH,

Fred

s.roughley · November 30, 2012, 6:11pm

The Parallel Chunk Start and Parallel Chunk End nodes might work for parallelising your fingerprint calculation.
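
As a rough picture of the general idea behind chunked parallelisation (split the rows into chunks, process each chunk on its own thread, then concatenate the results in the original order), here is a plain Java sketch. It is not how the KNIME nodes are implemented; `ParallelChunks`, `mapInChunks` and `toyFingerprint` are hypothetical names, and the "fingerprint" is a toy stand-in rather than a CDK call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of split / process / merge.
final class ParallelChunks {

    // Applies `work` to every row, using up to `chunkCount` chunks that run concurrently.
    static <T, R> List<R> mapInChunks(List<T> rows, int chunkCount, Function<T, R> work) {
        if (chunkCount < 1) {
            throw new IllegalArgumentException("chunkCount must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(chunkCount);
        try {
            // Ceiling division so that every row lands in some chunk.
            int chunkSize = Math.max(1, (rows.size() + chunkCount - 1) / chunkCount);
            List<Future<List<R>>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<T> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                pending.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>(chunk.size());
                    for (T row : chunk) {
                        out.add(work.apply(row));
                    }
                    return out;
                }));
            }
            // Collect chunk results in submission order, which restores the row order.
            List<R> merged = new ArrayList<>(rows.size());
            for (Future<List<R>> future : pending) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Toy stand-in for a fingerprint: hashes character pairs into 64 bits.
    static BitSet toyFingerprint(String smiles) {
        BitSet bits = new BitSet(64);
        for (int i = 0; i + 1 < smiles.length(); i++) {
            bits.set(Math.floorMod(smiles.substring(i, i + 2).hashCode(), 64));
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> molecules = Arrays.asList("CCO", "c1ccccc1", "CC(=O)O", "CCN", "C1CCCCC1");
        List<BitSet> fingerprints = mapInChunks(molecules, 4, ParallelChunks::toyFingerprint);
        for (int i = 0; i < molecules.size(); i++) {
            System.out.println(molecules.get(i) + " -> " + fingerprints.get(i));
        }
    }
}
```

With 4 CPUs, a chunk count of 4 is a natural starting point, since each row is processed independently of the others.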

Steve

Stephan · November 30, 2012, 6:32pm

Hello Susanne,

To address your second issue first: I agree with Fred's solution. Alternatively, you can write your table output to a database using the database nodes provided.

Regarding your first issue: the fingerprint node uses only one core in the current version because it is not multithreaded (in Java, multiple threads can run on multiple cores). I will upload a new multithreaded version of the KNIME-CDK plug-in to the nightly repository later today. The new nodes are a lot faster than the current ones, and I would strongly recommend switching to the nightly build starting tomorrow (the repo takes a night to update).
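
To make the single-threaded versus multithreaded difference concrete, here is a small generic Java sketch (not the KNIME-CDK code). It runs the same per-row work either on one thread or on a parallel stream, which spreads independent rows over the cores the JVM can see. `MultiCoreSketch`, `expensiveHash` and `hashAll` are hypothetical names, and the hash is only a CPU-bound stand-in for a fingerprint calculation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration, not taken from the plug-in.
final class MultiCoreSketch {

    // CPU-bound stand-in for one fingerprint calculation.
    static long expensiveHash(String smiles) {
        long h = 1125899906842597L;
        for (int round = 0; round < 10_000; round++) {
            for (int i = 0; i < smiles.length(); i++) {
                h = 31 * h + smiles.charAt(i);
            }
        }
        return h;
    }

    // Same work either way; only the number of threads doing it differs.
    static List<Long> hashAll(List<String> smiles, boolean parallel) {
        return (parallel ? smiles.parallelStream() : smiles.stream())
                .map(MultiCoreSketch::expensiveHash)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Cores visible to the JVM: " + Runtime.getRuntime().availableProcessors());
        List<String> molecules = Arrays.asList("CCO", "c1ccccc1", "CC(=O)O", "CCN");
        System.out.println(hashAll(molecules, true).equals(hashAll(molecules, false)));
    }
}
```

The results are identical in both modes because each row is independent; only the wall-clock time changes once there are enough rows to keep several cores busy.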

I just ran a test with the new version and when doing calculations all my CPUs light up. :)

Let me know how you get on.

Best regards,

Stephan

InsilicoConsulting · December 3, 2012, 10:48am

Great news, Stephan. Looking forward to the multithreaded CDK nodes! I too have a similar 4-CPU Linux machine and use CDK PubChem fingerprints, which have hitherto been a bottleneck when processing a million or more molecules.

cheers

Susanne.Dupre · December 3, 2012, 12:04pm

Thanks to you all.
Using Table Writer / Table Reader solves the problem of saving fingerprint calculations just fine in testing. Will be using it on bigger numbers soon :-)

Thanx Stephan for the multithreaded CDK nodes.
(Just one dumb question on how to update: do I need to uninstall the old nodes first? If I do so, what will happen to saved workflows that use those nodes?)
[I just tried to update via Help -> Check For Updates but nothing was found. Does using the nightly update require some special configuration?]

Thanx again,
Susanne

P.S. Using current version: 1.3.0.201209261219

s.roughley · December 3, 2012, 9:11pm

If you have not already done so, you need to add the nightly builds update site; see http://tech.knime.org/community for the link. To add the site, go to Help -> Install New Software... and click the 'Add' button, then give it a name and paste in the URL from the link.

Steve
