Any ideas when the Parallel Chunk Loop End performance issue will be resolved?

Hi @knime

Is there a view as to when the Parallel Chunk Loop End performance issue will be resolved? As soon as I hit large data (>1M rows), the chunk end takes absolutely ages to finish, and I note that disk usage from KNIME is very low the whole time.

much love

Gavin

Hi @Gavin_Attard,

thanks for raising this again. I recalled and dug up our previous conversations, and it seems that, given how much time has passed, a fix is unlikely to happen. I suppose the issue is considered too specific, though.

However, I happen to have done some more in-depth investigations in the past and found that KNIME performs painfully slowly when it comes to certain processes related to reading and writing data. The Parallel Chunk nodes, as well as saving a workflow, fall right into that category.

CPU and RAM are almost never going to become a bottleneck, but the disk certainly is. Even if I use a high-performance SSD, hooked up directly to the CPU's PCIe lanes as a separate disk, KNIME can become painfully slow.

E.g. I noticed, when extracting the system properties, that the Java temp dir (java.io.tmpdir) is on the primary disk even if the workspace is not.
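In case it helps anyone digging into this: the temp dir can presumably be redirected by adding a java.io.tmpdir property below the -vmargs line of knime.ini. The path below is just a hypothetical directory on a fast secondary disk, and if I remember correctly there is also a temp-folder setting in the KNIME preferences that achieves the same thing. Roughly:

```
-vmargs
-Xmx16g
-Djava.io.tmpdir=D:/knime-temp
```

After a restart, extracting the system properties again should show the new java.io.tmpdir, which makes it easy to verify the change actually took effect.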

Anyhow, when saving a workflow or when data is being collected, like with the Parallel Chunk End node, SSD throughput idles at 6 % or less. To my dismay, no one felt this was worth having a look at.

Best
Mike


Agreed, I also think the primary issue here is disk I/O. It's not hardware related, definitely software.

Fully agree. We could certainly rule out hardware limitations, and I believe we concluded, backed up by the experiences of others too, that this regression started a few KNIME versions ago.

Regardless of the hardware, I also managed to rule out the backend type (row-based vs. columnar), compression of data during save (absolutely no CPU impact), and other things too.

The only common factor I see is a large amount of data, which I believe KNIME is trying to sort.

While reading the highly recommended article from @mlauber71, some args caught my attention, like -XX:+UseStringDeduplication or -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log. I wonder if processes like string deduplication or garbage collection are at fault, causing the performance regression?

On second thought, that would make sense, as I assume neither can be parallelized across multiple CPU cores, and the issue I experienced scaled, gut feeling wise, linearly with the amount of data. On top of that, it can sort of be replicated by manually triggering a garbage collection.

Eventually, the flags in the article could help to increase log verbosity, but without some support from the @knime team we would just be doing guesswork.
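Purely as a sketch of what that could look like: the flags would again go below -vmargs in knime.ini. Note that -XX:+PrintGCDetails / -XX:+PrintGCDateStamps / -Xloggc are the old Java 8 style options; since recent KNIME releases ship a newer JDK, the unified logging flag -Xlog:gc* should be the rough equivalent. The log path is just a placeholder:

```
-vmargs
-XX:+UseStringDeduplication
-Xlog:gc*:file=/path/to/gc.log:time,uptime,level,tags
```

With that in place, one could save a large workflow or let a Parallel Chunk End collect a big table and check whether long GC pauses line up with the phases where disk throughput drops, instead of guessing.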


Holy crap.
It's been months that I've been yelling to prioritize performance over new features.
Loops in general, QOL improvements (like a search field in every node).
KNIME is a wonderful Swiss Army knife, but it is struggling to be a really good ETL platform for reasons like these.

Agreed, I had a client in the market for it, but I could not recommend it (they needed Server) due to the write-to-disk times that slow everything down.

@knime, this really needs addressing.