Multithreading

Dear all

I would like to know whether KNIME makes use of the multiple cores a PC has when running a workflow, and whether there are ways to reduce the processing time.

 

Thanks in advance

KNIME can execute nodes in parallel if they are not dependent on each other (independent branches in the flow). There are also some nodes that process rows independently and in parallel. So yes, KNIME does use multiple cores where possible.
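As a rough illustration of those two kinds of parallelism, here is a plain Java sketch. It does not use any KNIME API and is not how KNIME is implemented internally; the class and method names (`ParallelBranchesSketch`, `sumOfSquares`, `countEven`, `doubleEachRow`) are made up for the example. Two tasks that do not read each other's output are submitted to a thread pool, and a row-wise operation with no dependencies between rows is spread across cores.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

class ParallelBranchesSketch {

    // Hypothetical stand-in for one independent branch of a workflow.
    static long sumOfSquares(List<Integer> rows) {
        long total = 0;
        for (int r : rows) {
            total += (long) r * r;
        }
        return total;
    }

    // Hypothetical stand-in for a second branch that does not depend on the first.
    static long countEven(List<Integer> rows) {
        return rows.stream().filter(r -> r % 2 == 0).count();
    }

    // Runs both branches concurrently and returns {sumOfSquares, countEven}.
    static long[] runBranches(List<Integer> rows) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            Callable<Long> branchA = () -> sumOfSquares(rows);
            Callable<Long> branchB = () -> countEven(rows);
            Future<Long> a = pool.submit(branchA);
            Future<Long> b = pool.submit(branchB);
            return new long[] { a.get(), b.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Row-wise work without dependencies between rows can be split across cores.
    // The collected list keeps the original row order.
    static List<Integer> doubleEachRow(List<Integer> rows) {
        return rows.parallelStream().map(r -> r * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4);
        long[] results = runBranches(rows);
        System.out.println(results[0] + " " + results[1]); // prints "30 2"
        System.out.println(doubleEachRow(rows));           // prints "[2, 4, 6, 8]"
    }
}
```

If one branch consumed the result of the other, the second task would have to wait for the first, and no extra core would help.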

I have a further question: I am using a Variables Loop node with 6 different variable sets as input. Inside this loop I have a Cross Validation node, and inside that I have an SVM learner. I would expect each variable set and each cross-validation iteration to be able to run in parallel, even if the SVM learner itself does not support multithreading. But KNIME uses only 3 of my 8 available cores.

If I replace the SVM learner with a Fuzzy c-Means node, only one core is busy. So it seems that the cross-validation and variables loop nodes do not run concurrently? But http://www.ub.uni-konstanz.de/kops/volltexte/2008/6485/pdf/Parallel_and_Distributed_Data_Pipelining_with_KNIME.pdf says something else.

Is there something I missed?

The paper you mention describes a prototype that has not yet been included in the official release. Therefore all loop iterations are still executed sequentially. Only certain nodes can process their input data in parallel (if there are no interdependencies between rows).

Thanks for your reply. Do you think it's possible for me to get and use this prototype? Otherwise I will have to create my own parallel cross-validation node.
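Just to make clear what I mean by running the folds in parallel, here is a minimal plain Java sketch. It is not KNIME node code and uses no KNIME API; `ParallelFoldsSketch`, `trainAndScore` and `crossValidate` are hypothetical names, and the "model" is only a placeholder that predicts the mean of the training rows. The point is just that each fold is an independent task that can be handed to a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFoldsSketch {

    // Placeholder for "train on all rows outside the fold, score on the fold".
    // Rows are assigned to folds by index modulo k. The stand-in model predicts
    // the training mean; the score is the mean absolute error on the test rows.
    static double trainAndScore(double[] data, int fold, int k) {
        double sum = 0;
        int trainCount = 0;
        for (int i = 0; i < data.length; i++) {
            if (i % k != fold) {
                sum += data[i];
                trainCount++;
            }
        }
        double mean = trainCount == 0 ? 0 : sum / trainCount;

        double error = 0;
        int testCount = 0;
        for (int i = 0; i < data.length; i++) {
            if (i % k == fold) {
                error += Math.abs(data[i] - mean);
                testCount++;
            }
        }
        return testCount == 0 ? 0 : error / testCount;
    }

    // Submits one task per fold and collects the scores in fold order.
    static double[] crossValidate(double[] data, int k, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (int fold = 0; fold < k; fold++) {
                final int f = fold;
                Callable<Double> task = () -> trainAndScore(data, f, k);
                pending.add(pool.submit(task));
            }
            double[] scores = new double[k];
            for (int i = 0; i < k; i++) {
                scores[i] = pending.get(i).get(); // blocks until fold i is done
            }
            return scores;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 6 variable sets and, say, 10 folds each, that would be 60 independent tasks, which should be enough to keep 8 cores busy even with a single-threaded learner.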

Hi, could you please also provide me with the parallel cross-validation nodes?

thanks in advance,

cheers

chris

Pervasive has created a plugin for KNIME with their DataRush engine (http://www.pervasivedatarush.com/Products/DataRushforKNIME.aspx) that uses dataflow networks to create pipelined parallelism. In my experience it does a great job of taking full advantage of the available cores, and it comes with profiling tools so you can squeeze every bit out of your multicore machine. If the nodes you are looking for don't come with it out of the box, there is a Java API that lets you create your own KNIME nodes using their pipelining engine. You can request a trial of the product at their site.
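For anyone unfamiliar with the term, pipelined parallelism means that one stage can already work on later rows while the next stage is still processing earlier ones. The plain Java sketch below shows only that general idea with a bounded queue between two threads. It does not use the DataRush or KNIME APIs, and `PipelineSketch`, `run` and the `END` marker are names invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PipelineSketch {

    // Sentinel value that tells the second stage the stream has ended.
    private static final int END = Integer.MIN_VALUE;

    // Stage 1 squares each input value; stage 2 sums the squares as they arrive.
    static long run(int[] input) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        long[] total = new long[1];

        Thread stage1 = new Thread(() -> {
            try {
                for (int v : input) {
                    queue.put(v * v); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread stage2 = new Thread(() -> {
            try {
                for (int v = queue.take(); v != END; v = queue.take()) {
                    total[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        stage1.start();
        stage2.start();
        try {
            stage1.join();
            stage2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return total[0];
    }
}
```

The bounded queue keeps the fast stage from running far ahead of the slow one, so memory use stays small even for large inputs.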

 

Quick update: Pervasive has worked with KNIME to improve the KNIME API so that new nodes can more easily be enabled for parallel processing and added to their KNIME analytics acceleration stack. Here’s the new link for more info: http://bigdata.pervasive.com/Partners/KNIME.aspx

Can multithreading be used within the same workflow to process a 4 GB file in different ways at the same time? Please provide a node reference.