Incorrect automatic chunk count in Parallel Chunk Start node

gcincilla · April 5, 2016, 6:12pm

Hi guys,

I use the Parallel Chunk Start and Parallel Chunk End nodes to parallelize some parts of my workflows. In general these nodes work well and it's always a pleasure to see how all the worker threads of your machine are burning!

Nevertheless, if I use the “Automatic chunk count” option in the Parallel Chunk Start node, I always get 3 more chunks than the number of threads available on my machine. I also checked that the “number of working threads for all nodes” is correctly set in: Preferences → KNIME.

Do you know what causes this behavior?

wiswedel · April 6, 2016, 10:54am

"Automatic" is slightly overestimating. It's

(int)Math.ceil(1.5 * Runtime.getRuntime().availableProcessors())

So if the system has two processors the chunk count will be 3, and if it has four processors it will be 6.
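
As a quick illustration, here is a minimal sketch that tabulates the expression above for a few processor counts. It is not the actual KNIME source; the class name AutoChunkCount and the method chunkCount are made up for this example, and only the formula itself comes from the line quoted above.

```java
public class AutoChunkCount {

    /**
     * Hypothetical helper: applies the quoted heuristic,
     * ceil(1.5 * processors), to an arbitrary processor count.
     */
    static int chunkCount(int processors) {
        return (int) Math.ceil(1.5 * processors);
    }

    public static void main(String[] args) {
        // Tabulate the heuristic for a few illustrative processor counts.
        for (int p : new int[] {1, 2, 4, 6, 8}) {
            System.out.println(p + " processors -> " + chunkCount(p) + " chunks");
        }

        // Apply it to whatever the JVM reports on this machine.
        int reported = Runtime.getRuntime().availableProcessors();
        System.out.println("This machine: " + reported + " processors -> "
                + chunkCount(reported) + " chunks");
    }
}
```

Under this formula the surplus grows with the processor count. A surplus of exactly 3 chunks is what you would get if the JVM reports 6 processors (9 chunks), though that is only an inference from the arithmetic, not something confirmed for your machine.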

We haven't done a thorough analysis of this heuristic, though. I think the motivation is that you want to have slightly more chunks than CPUs because then more (smaller) jobs are being executed -- and hence you still have parallelism after the first job(s) have completed -- but you do not want to have too many jobs, as this would cause a lot of job swapping.

gcincilla · April 7, 2016, 11:18am

OK Wiswedel, thank you for your reply! Now it makes sense and it's clear.

Cheers