I am using the Parallel Chunk Start node and then some DB nodes to insert data into a database (locally) in parallel. I have 8 threads on my laptop and I am wondering if there is an optimal split for my data so that the insert is faster. Is the number of chunks basically the number of potential threads that can be used, if they are free?
This depends on your settings in the KNIME preferences. The setting "Maximum working threads for all nodes" specifies how many threads KNIME uses. By default, this is twice the number of logical cores in your machine, so if you have 4 cores with hyperthreading, you have 8 logical cores and KNIME will use 16 threads. Those threads are managed in a thread pool, so any time a node is executed, a thread from that pool is tasked with doing the work. The Parallel Chunk Start node does nothing more than generate multiple workflow branches containing the content of the loop and execute them in parallel using the thread pool. My guess would be that splitting the table into more partitions than you have (logical) cores will not improve your runtime, since the cores will just be switching back and forth between the threads.
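The idea above can be illustrated outside of KNIME. This is a minimal Python sketch (not KNIME's actual implementation): rows are split into partitions and handed to a fixed-size thread pool, the way Parallel Chunk Start fans loop branches out over KNIME's pool. The function names and the dummy `insert_chunk` are hypothetical placeholders for a real DB insert.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Divide rows into n_chunks roughly equal partitions (hypothetical helper)."""
    size, rem = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

def insert_chunk(chunk):
    # Placeholder for a database insert; returns the number of rows "inserted".
    return len(chunk)

rows = list(range(100))
n_threads = 8  # plays the role of "Maximum working threads for all nodes"

# A pool of n_threads workers: submitting more chunks than workers just
# queues them up; only n_threads chunks ever run at the same time.
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    inserted = sum(pool.map(insert_chunk, split_into_chunks(rows, n_threads)))

print(inserted)  # 100
```

This is why creating far more chunks than pool threads adds scheduling overhead without adding concurrency, though for I/O-bound work like DB inserts the sweet spot can differ from the core count.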
The "Maximum working threads" setting I have is 8, so I guess splitting the data into more than 8 partitions/chunks does not make any sense! Thank you very much, Alexander!