Chunk Loop + Joiner takes longer for later chunks of data

Hi,

I am trying to join two huge tables (550M and 810M records) using the Joiner (Labs) node in KNIME v4.3.
Since it’s a left join, I am using a Chunk Loop to split the left table into 4 equally sized chunks, join each chunk with the right table, and then concatenate all rows.
I’m using a fairly powerful PC with 64 GB of memory and a 2.9 GHz Intel Core i9 processor. The join took around 2-3 hours for the first chunk but around 5-6 hours for the second. The third chunk has now been running for more than 10 hours. Since every chunk has the same number of records, I don’t understand why later chunks take so much longer to process. Is there an option I should enable or disable to speed up the process?
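Why splitting only the left table is safe for a left join: each left row’s output depends only on that row and the full right table, so joining each chunk against the whole right table and concatenating the results yields the same rows as one big join. The following is a minimal pure-Python sketch of that idea, assuming in-memory lists of dicts and no column-name clashes between the two tables; `left_join` and `chunked_left_join` are hypothetical helpers for illustration, not how KNIME’s Joiner (Labs) node or Chunk Loop is implemented.

```python
from collections import defaultdict


def left_join(left, right, key):
    """Left join two lists of dicts on `key` (hypothetical helper).

    Unmatched left rows get None for every right-side column.
    Assumes left and right share no column names other than `key`.
    """
    # Build a hash index over the right table: key value -> matching rows.
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    right_cols = sorted({c for r in right for c in r if c != key})

    out = []
    for l in left:
        matches = index.get(l[key])  # .get does not insert into a defaultdict
        if matches:
            for r in matches:
                out.append({**l, **{c: r.get(c) for c in right_cols}})
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out


def chunked_left_join(left, right, key, n_chunks=4):
    """Split only the LEFT table into chunks, join each chunk, concatenate."""
    size = max(1, -(-len(left) // n_chunks))  # ceiling division, at least 1
    result = []
    for start in range(0, len(left), size):
        chunk = left[start:start + size]
        # Each chunk is joined against the full right table; in this sketch
        # the right-side index is rebuilt on every iteration.
        result.extend(left_join(chunk, right, key))
    return result


left = [{"id": i, "a": i * 10} for i in range(10)]
right = [{"id": 0, "b": "p"}, {"id": 2, "b": "q"}, {"id": 2, "b": "r"}]
assert chunked_left_join(left, right, "id") == left_join(left, right, "id")
```

Because left-row order is preserved and matches are appended in right-table order, the chunked and unchunked results are equal as lists, not just as sets. Note that in this sketch every chunk pays the full cost of indexing the right table, so the per-chunk work should stay roughly constant; a slowdown across iterations points to something outside the join logic itself.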

Hey @behrooz,

did you try increasing the -Xmx value in your knime.ini file? Maybe not enough memory is allocated to KNIME. You can monitor memory usage while KNIME is running by enabling File -> Preferences -> General -> Show heap status.

Best,
Julian


Hi Julian,

Thanks for your reply.
Yes, I had allocated 50 GB of memory to KNIME.
As a workaround, I simply split the left table into 4 parts and ran the Joiner on each split. Unlike with the Chunk Loop, each split took almost the same time. Maybe it has to do with large temporary tables accumulating during the loop, which causes the system to slow down.

Best,
Behrooz

Hi Behrooz,

yes, that’s what I think as well. Once the garbage collector is called more often, overall performance decreases, so that could be the cause.

Did you try adjusting the Performance options of the Joiner (Labs) node? You can try turning off sorting (Arbitrary output order) and increasing the number of temporary files opened for the operation. This might help a bit as well, but good to hear that your workaround already helped you out!

Best
Julian


@behrooz12 that’s interesting, and worth investigating. Did you try the new columnar backend?


I didn’t know about that. I’ll give it a try. Thanks for sharing!
