Any ideas when Parallel Chunk Loop End performance will be resolved?

Thinking about the issue, I am starting to get a good picture of what might be happening. The issue can be described as follows:

  • Slow save or data collection operation, e.g. at Parallel Chunk End
  • No significant CPU, RAM, or disk utilization
  • Moving the workspace to a separate disk shows no improvement
  • Disk “C” utilization lingers during the save / data collection operation
  • Issue emerged (as far as I can tell) with late 4.x versions, or at the latest with version 5.x of KNIME

I currently have a crawl running and noticed, while also reporting some issues with the temp folder, that KNIME creates “gazillions” of tiny files.

And just a “few minutes later” there were more than 60k additional files.
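For anyone who wants to check their own temp folder, here is a minimal sketch that counts the files below a directory and how many of them are tiny. The function name `summarize_tree` and the 4 KiB threshold for “tiny” are my own assumptions for illustration, and you need to pass in the temp directory KNIME actually uses on your machine.

```python
import os

# Assumption: files below 4 KiB count as "tiny"; adjust as needed.
TINY_THRESHOLD = 4096


def summarize_tree(root, tiny_threshold=TINY_THRESHOLD):
    """Return (file_count, total_bytes, tiny_count) for all files under root."""
    file_count = 0
    total_bytes = 0
    tiny_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file may have been deleted between listing and stat.
                continue
            file_count += 1
            total_bytes += size
            if size < tiny_threshold:
                tiny_count += 1
    return file_count, total_bytes, tiny_count
```

Calling `summarize_tree(path_to_temp_dir)` twice, a few minutes apart, while a workflow is running should show whether the file count grows the way I observed.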

Now, ask yourself: if you have that many files and the KNIME data collection operation kicks in, no SSD, regardless of how performant it is, will be able to deliver its full read throughput. I believe that what I have highlighted here is, so far, the best candidate for a plausible explanation.

To test this hypothesis, we’d need a workflow (which I started to create some time ago) that reliably reproduces a scenario where KNIME creates a lot of temp files. What do you think?
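Outside of KNIME, the general effect can be roughly sanity-checked with a small script like the one below. It is only a sketch under my own assumptions: the function names, file counts and sizes are made up for illustration, and it says nothing about how KNIME actually writes or reads its temp data. It writes the same amount of data once as many tiny files and once as a single file, then times reading each back. Keep in mind that freshly written files are usually served from the OS cache, so this mostly shows the per-file overhead (open, read, close) rather than raw disk throughput.

```python
import os
import tempfile
import time


def write_files(directory, n_files, bytes_per_file):
    """Write n_files files of bytes_per_file random bytes each into directory."""
    payload = os.urandom(bytes_per_file)
    for i in range(n_files):
        with open(os.path.join(directory, f"chunk_{i:06d}.bin"), "wb") as fh:
            fh.write(payload)


def read_all(directory):
    """Read every file in directory; return (total_bytes, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    for entry in os.scandir(directory):
        if entry.is_file():
            with open(entry.path, "rb") as fh:
                total += len(fh.read())
    return total, time.perf_counter() - start


def compare(n_files=20000, bytes_per_file=1024):
    """Time reading the same data volume as many tiny files vs. one file."""
    with tempfile.TemporaryDirectory() as many, \
            tempfile.TemporaryDirectory() as single:
        write_files(many, n_files, bytes_per_file)
        write_files(single, 1, n_files * bytes_per_file)
        many_bytes, many_seconds = read_all(many)
        single_bytes, single_seconds = read_all(single)
    return {
        "many_bytes": many_bytes,
        "many_seconds": many_seconds,
        "single_bytes": single_bytes,
        "single_seconds": single_seconds,
    }
```

Running `compare()` and comparing `many_seconds` with `single_seconds` gives a first impression; a proper test would still need the KNIME workflow described above.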

Related post

Best
Mike