The Parallel Chunk Node introduces some overhead in splitting the input table into the specified number of chunks. As is, this splitting step is carried out even if the chunk count is set to 1.
My request is to skip the initial splitting when the chunk count is one. That way I could avoid that overhead for small data tables by setting the chunk count through a flow variable based on the row count of the input table.
Dear KNIME team,
I love the KNIME Labs nodes for parallel execution. While searching the forum for an idea I had, I found karlson's post above and fully agree with it. Let me try to put the reasoning behind it in my own words:
It would be great if, at a parallelization degree of 1 (i.e. only one chunk), the node behaved just like a regular Chunk Loop Start node and did not create an (empty) metanode. I admit that parallel execution of degree 1 sounds strange at first, but there is a purpose behind it:
I have just tweaked a bigger workflow so that it can run both on a PC (for debugging and development) and on a capable virtual machine in the cloud. Depending on the number of available cores and the amount of RAM, I determine the desired degree of parallelization automatically (in a different way than KNIME's 'automatic chunk count' feature). The Parallel Chunk Start node's chunkCount setting is therefore controlled by a flow variable.
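Roughly, my heuristic looks like the sketch below (a minimal illustration, not my actual code; the function name and the 10,000-rows-per-chunk threshold are made up for this example):

```python
import os

def chunk_count(row_count, cores=None, min_rows_per_chunk=10_000):
    """Pick a chunk count: one chunk per available core, but never
    fewer than min_rows_per_chunk rows per chunk, and at least 1 chunk.
    (Hypothetical heuristic for illustration only.)"""
    cores = cores or os.cpu_count() or 1
    # small tables collapse to a single chunk, i.e. no parallelization
    by_rows = max(1, row_count // min_rows_per_chunk)
    return max(1, min(cores, by_rows))
```

On my PC with few cores, or with a small input table, this yields 1, which is exactly the case where the extra metanode brings no benefit.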
In my local PC environment, this variable is set to 1, meaning no parallelization. However, when the node executes, the workflow view gets visually cluttered by all the connections to the KNIME-created but empty metanode that handles the parallelization. I am fully aware that this is functionally correct and not a bug. It would be great, though, if with a chunk count of 1 the automatic metanode were not created at all, as it is unneeded and distracting. I believe this is also what karlson suggests when he writes about "skipping the initial splitting".
I have attached a screenshot and a minimal example workflow. I am using KNIME 3.2.0.
Thanks for reading -- and maybe also considering this change request! :-)