Parallel Chunk Loop - when is it worth it?

(Aside from the possibly "obvious" answer that it helps if one has lots of time-consuming stuff...)

When I run a "simple" test, say one node doing some math manipulation, it goes so fast in non-parallel mode that it doesn't seem worth building in a parallel loop.

On the other hand, when I have larger tables with more complex manipulations or searches, it would seem obvious to use the parallel version. But I noticed that this implies much higher memory usage (not surprising, since it seems to copy part of your workflow once per chunk). All that extra memory has to be shuffled around, which costs resources as well, so in the end I don't see much gain.
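To make the trade-off concrete outside of KNIME, here is a minimal Python sketch (purely illustrative, the function `transform` and the numbers are made up) of why chunked parallelism can lose to a plain loop when the per-row work is cheap: each chunk has to be copied into a worker process, and that start-up and copying overhead can dominate.

```python
# Illustrative only, not KNIME code: chunked parallel vs. plain loop
# for a cheap per-row operation, where process start-up and the
# per-chunk data copies can cost more than the work itself.
import math
import time
from multiprocessing import Pool

def transform(x):
    # stand-in for a "simple" per-row math manipulation
    return math.sqrt(x) * 2.0

def run_serial(data):
    return [transform(x) for x in data]

def run_chunked_parallel(data, n_chunks=4):
    size = math.ceil(len(data) / n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_chunks) as pool:
        # each chunk is pickled and copied into a worker process,
        # loosely analogous to the loop body being duplicated per chunk
        results = pool.map(run_serial, chunks)
    return [y for chunk in results for y in chunk]

if __name__ == "__main__":
    data = list(range(500_000))
    for label, fn in [("serial", run_serial), ("parallel", run_chunked_parallel)]:
        t0 = time.time()
        fn(data)
        print(label, round(time.time() - t0, 2), "s")
```

For a cheap `transform` the serial loop typically wins; only when the per-chunk work is long compared to the copying does the parallel variant pay off.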

As an example, I was testing the "RDKit Most Common Substructure" node in a loop. I trimmed it down to a pre-prepared table containing only the molecules and nothing else. The parallel version is 10-30% slower... and, for my list of ca. 40k molecules, it uses nearly all 8 GB of memory versus ca. 2 GB in the linear version.

For reference, this is on an i7 Win7 machine with 16 GB memory (8 GB dedicated to KNIME). On a friend's i5 machine the linear version is (obviously) slower, which is one of the reasons for considering the parallel one. But on the i5 the parallel version is even slower, perhaps due to the memory management?

Thus, back to the original question - when does it make sense, or rather, what is other users' experience with parallel chunked workflows?


I also tried it, but for me it only made things slower. I run a workflow that does some heavy image modifications on large images, and it turned out that 8 GB of memory was not sufficient, nor was the speed of my SSD, to read/write everything simultaneously. I run this on a laptop, so it is not really supposed to be efficient; a standalone machine with 30 GB of working memory would be better, I think, and perhaps then it could make sense for me to run in parallel. However, it added 5 copies by default, which I think is based on the architecture of your processor. It might make more sense to have some sort of estimate of the memory requirement of the workflow and adjust the number of copies to that as well. I could imagine that running 2 parallels might be better than 6 in my case.
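A rough sketch of that idea, in Python rather than KNIME (the per-copy memory figure is something you would have to estimate yourself, and psutil is assumed to be available): cap the number of parallel copies by available memory as well as by core count.

```python
# Sketch: size the number of parallel copies to the memory actually
# available, instead of always using one copy per core.
# 'per_copy_gb' is your own estimate of what one copy of the branch needs.
import os
import psutil

def suggested_copies(per_copy_gb, headroom_gb=2.0):
    free_gb = psutil.virtual_memory().available / 1024**3
    by_memory = int((free_gb - headroom_gb) // per_copy_gb)
    by_cores = os.cpu_count() or 1
    return max(1, min(by_memory, by_cores))

# e.g. an image-processing branch estimated at ~3 GB per copy
print(suggested_copies(per_copy_gb=3.0))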

Also, a downside of the parallel chunk loop is that you can only set the number of parallels, not the chunk size. This could easily be added, I assume.
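In the meantime, a possible workaround (just a sketch; the resulting count would have to be passed to the loop start as a flow variable) is to derive the chunk count from the chunk size you actually want:

```python
# Workaround sketch: turn a desired chunk size into a chunk count.
import math

def chunk_count(n_rows, desired_chunk_size):
    return max(1, math.ceil(n_rows / desired_chunk_size))

print(chunk_count(40_000, 5_000))  # -> 8 chunks
```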

Another improvement might be to have some sort of delay in the start of the copies. If I remember correctly, all duplicate workflows started simultaneously, so they all reach the memory-heavy nodes at once. Spreading this out, by letting the second copy start only after the first one or two nodes of the first copy have completed, might also boost performance.
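Again only as an illustration outside KNIME (the function and delay values are invented), the staggered-start idea looks roughly like this: give each copy a small offset so they do not all hit the memory-heavy step at the same moment.

```python
# Sketch of a staggered start: each worker begins a little after the
# previous one, so the memory-heavy part is not reached by all at once.
import time
from concurrent.futures import ProcessPoolExecutor

def heavy_branch(chunk_id, delay_s):
    time.sleep(delay_s)  # offset this copy's start
    # ... memory-heavy processing of this chunk would go here ...
    return chunk_id

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(heavy_branch, i, i * 2.0) for i in range(4)]
        print([f.result() for f in futures])
```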

So, I do notice an improvement by manually selecting the number of parallel chunks; I use 4-6 (conservatively) and can squeeze out an additional 20-40% speed.

Still, I do at times get various error messages that I can't fully interpret, but they can usually be resolved by restarting KNIME (a memory issue, perhaps?).

Additionally, there is still the duplicate RowID problem I read about somewhere. I have to resolve it by creating new row IDs with the uniqueness option selected, otherwise the whole thing stops before the Parallel Chunk End node can concatenate everything.
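Conceptually (this is just a pandas sketch of the idea, not what KNIME does internally), the fix amounts to making every chunk's row IDs unique across chunks before they are concatenated:

```python
# Sketch: give each chunk's rows IDs that are unique across chunks,
# so the final concatenation does not produce duplicate row IDs.
import pandas as pd

chunks = [pd.DataFrame({"value": [1, 2]}), pd.DataFrame({"value": [3, 4]})]

renamed = [
    df.set_axis([f"Row{i}_{j}" for j in range(len(df))], axis=0)
    for i, df in enumerate(chunks)
]
combined = pd.concat(renamed)  # no duplicate index -> concatenation succeeds
print(combined)
```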