Table Row to Variable Loop Start: starts fast, but speed drops quickly without RAM or CPU constraints

Dear all,

I have a workflow that tests which of 8 million combinations of parameters produces the best results. I do this using a Table Row to Variable Loop Start that passes each combination into the workflow. However, the speed always decreases significantly: in the first minute it does about 1,360 rows per minute, and a few hours later only 200 or so per minute. This problem made me spend a lot of time on optimizing performance, which is good, but the speed decrease is always there. Importantly, the CPU seems to work hard in the beginning (80% on an i7 2700K quad-core), but later on drops to 30-40%. RAM usage increases from maybe 4 to 9 GB, but stays far from the 32 GB that I have installed.

Outside of the row loop (which is set to keep data in memory) I have a chunk loop (which stores on disk, as the table is larger) that splits the data into 10 equal parts so I can save intermediate results. Everything else is also set to keep data in memory. Even though I set the outer chunk loop to store data on disk, data seems to accumulate in RAM, although far below the maximum.
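To make the structure easier to picture, in Python terms the setup is roughly as follows. This is only my own sketch with placeholder functions, not actual KNIME code: an outer chunk loop over 10 batches with intermediate saves, and an inner loop that evaluates one parameter row at a time.

```python
# Rough analogy of the workflow structure, not actual KNIME code:
# outer chunk loop (10 batches, intermediate results saved to disk),
# inner "row to variable" loop (one parameter combination per iteration).

def evaluate(params, data):
    """Placeholder for the scoring branch of the workflow."""
    return sum(params.values())

def save_intermediate(index, results):
    """Placeholder for writing one batch of results to disk."""
    with open(f"results_{index}.csv", "w") as f:
        f.writelines(f"{r}\n" for r in results)

def grid_search(param_rows, data, n_chunks=10):
    chunk_size = max(1, len(param_rows) // n_chunks)
    for i in range(0, len(param_rows), chunk_size):
        chunk = param_rows[i:i + chunk_size]              # outer chunk loop
        results = [evaluate(row, data) for row in chunk]  # inner row loop
        save_intermediate(i, results)
```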

The only way I can perform this task in reasonable time is if I can maintain the initial speed. What can I do to achieve this?

Many thanks in advance,

Eduard

I forgot to mention: only restarting KNIME can ‘reset’ the loop to its original performance.

@skallagrimson, it is not clear from your description how much memory you allocate to KNIME. See the link below:
https://www.knime.com/blog/optimizing-knime-workflows-for-performance
Also, to better understand the whole logic, could you share your workflow?

Thanks for your reply! I indeed forgot to specify that; I allocated 25 GB to KNIME.
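For reference, that allocation corresponds to the -Xmx entry in knime.ini (everything after -vmargs is passed to the JVM):

```
-vmargs
-Xmx25g
```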

I have attached the workflow, with tiny versions of the original tables.

REFIT_share.knwf (125.3 KB) refits2.xlsx (9.7 KB) Refit_share(combinations).table.txt (2.3 KB) Refit_share(testdata).table.txt (9.6 KB)

On the surface, it looks like a strange solution to include an Excel read inside a loop with the same values, as well as the column renaming. Also, a number of the components could be streamed.

Thanks for your suggestions!

*The Excel read does not hurt much, as it is only read once (I checked this with the Timer node).
*The column renames are necessary for e.g. the Rank node (so that it always operates on a column with a fixed name); I tried to solve this with flow variables (my preference), but this did not work for Rank.
*I tried streaming with different chunk sizes, but it did not make things quicker (checked with the Timer node). Also, the same slowdown over time occurs when things are streamed.

My solutions may not be optimal, but I don't think they explain the slowdown over time (though I may be wrong). If anything, my CPU is “leaning back” more and more, and the RAM usage also does not seem to increase anymore (now at 9 GB of 24 allocated, 32 total).


I definitely recommend reviewing your algorithm. It seems you do not need to split one row.
KNIME will use the same data for both branches.
As the leading IBM programmer Judd said, in PL/1, as in any other language, you can write very inefficient programs.

As I said above, “I have attached the workflow, with tiny versions of the original tables”, and I adjusted the splitter accordingly (so that there would be something to split).

With the real dataset there would be 8 million rows coming in, of which I manually filter the first million or so; this million then goes into the chunk loop, which runs batches of 100,000 rows. Because I wanted parallel processing but felt the Parallel Chunk Loop performed less well than a ‘manual’ parallel process, I split the flow into two branches (50,000 rows each) and concatenate them at the end.
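In Python terms, the ‘manual’ two-branch parallelism amounts to something like the sketch below; this is just an analogy for the node layout with placeholder functions, not how KNIME runs it.

```python
# Analogy for the manual two-branch parallel processing of one batch:
# split a 100,000-row batch in half, process both halves in parallel,
# then concatenate the results.
from concurrent.futures import ProcessPoolExecutor

def process_half(rows):
    """Placeholder for the per-row evaluation branch of the workflow."""
    return [sum(row) for row in rows]

def process_batch(batch):
    mid = len(batch) // 2
    halves = [batch[:mid], batch[mid:]]             # the manual "split"
    with ProcessPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(process_half, halves)
    return left + right                             # the "concatenate"
```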

I am especially looking for an explanation for the decreasing performance, rather than the performance in general; I could have something that runs 10 times quicker, but if the fall-off is still there, everything will still grind to a halt at some point.

Hi @skallagrimson,

Thanks for bringing this to our attention. We are aware of an issue that can lead to performance degradation over time when (1) many (millions) small tables are being generated and (2) heap space allocation stays well below its configured maximum. It sounds like both criteria are met in your case, so I think the speed drops you are experiencing could be explained by this. The good news is that we have a fix for it in the works.

I’ll try and reproduce the issue you’re facing with the workflow you kindly provided. If I can reproduce it, I’ll also check if our internal fix resolves the issue. I’ll get back to you as soon as I know more.

In the meantime, can you confirm that you are working on KNIME Analytics Platform 4.1.0 or later?

Regards,

Marc


Dear @marc-bux,

Thanks for your reply! As mentioned by izaychik63, my workflow is probably not the best (I am an archaeologist, not in IT!), but I still have the feeling that there is more going on than a suboptimal workflow. I studied it a bit more today, and what I noticed is that even though I am well below the memory limit, the more heap space is used, the slower the loop runs. Let's say it first fluctuates within a total heap space of about 500 MB (at 80% CPU); hours later it is at 2 GB and the loop runs noticeably slower (at 30-40% CPU). At some point I lose patience, but I guess the RAM usage would keep increasing and the performance decreasing.

Thanks for looking into it and for looking at my workflow! I have to warn you that the flow is not completely functional because (I later realised) the provided test data do not contain the IDs that a Reference Row Filter/Joiner requests later on (see ‘refit reference data’). If anything is unclear, please let me know.

I am running KNIME 4.1.2.


Based on your comment about “many small tables” I thought I had found a workaround: rather than feeding each of the 8 million rows individually into the Row to Variable Loop, I used a recursive loop followed by a Row Splitter that sends the top row into the loop and the rest to the lower input of the Recursive Loop End. With a test of 10k rows I see fluctuating rather than consistently decreasing performance, which is good. However, when applying this to a chunk of 100k rows, the Row Splitter starts to become a bottleneck, because it rebuilds a table of 100k-1 rows each iteration (though increasingly faster over time, I guess, as the table shrinks). I then tested a recursive setup where a chunk loop feeds 1k sets into the recursive loop. Now the 1k batches start fast, but take increasingly longer to process. I feel that consistent performance can only be achieved when data does not accumulate in memory, even when it is very little and far below the maximum RAM.
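In rough Python terms (my own sketch, not KNIME internals), the recursive-loop workaround and its bottleneck look like this: each iteration peels off the top row and rebuilds the remaining table, so a 100k chunk copies on the order of 100k + 99,999 + … rows in total, which is quadratic.

```python
# Sketch of the recursive-loop pattern and why the row splitter hurts:
# the "rest" table is rebuilt every iteration, so processing n rows
# copies roughly n*(n+1)/2 rows overall.
def recursive_loop(rows, process):
    results = []
    while rows:
        top, rest = rows[0], rows[1:]   # row splitter: copies the remainder
        results.append(process(top))
        rows = rest                     # fed back into the recursive loop end
    return results
```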

By the way, I tried keeping the loops in memory, on disk, and in various combinations of the two. I also tried a moving-window loop, and not collecting data at the variable loop ends. The problem always seems to be there.
