Memory handling

Hi Guys,

I would like to ask for advice on memory handling while executing a workflow. I am currently importing 50,000 XLS files, but it seems KNIME cannot handle that amount of data. I have already made some adjustments: I increased the RAM available to KNIME in knime.ini (currently 12 GB), enabled the option to write tables to disc, and added some garbage collector nodes. Can anybody explain what happens when I enable writing tables to disc? Does it literally write the data table outside the KNIME platform and erase it as soon as execution finishes, or does it only use the PC's memory in addition to what is configured in knime.ini? Also, for manipulating data of this size, which is better: Python code in Python nodes, or the standard nodes?
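
For reference, the knime.ini change I made is the -Xmx line below -vmargs, which sets KNIME's maximum heap size; the relevant excerpt looks roughly like this (the exact value depends on the machine):

    -vmargs
    -Xmx12g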

Thank you in advance guys.

Regards,
Gambit

Hi @Gambit,
How many rows does each XLS file have? Generally, KNIME should not fail even when the data does not fit into memory. For general info about performance and scalability, I can recommend this blog post and the related whitepaper: https://www.knime.com/blog/tuning-the-performance-and-scalability-of-knime-workflows.

When you enable the write-to-disk option, the output data of a node is not stored in memory but in a file on your hard disk instead. If this option is not enabled, KNIME tries to keep the data in memory if it fits and only writes it (partly) to disk if it does not.

Using Python might not be very helpful if the data does not fit into memory, because it does not have the disk backing that KNIME itself employs. Do you read in the XLS files in a loop or from a folder with a single Excel Reader node?
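
To illustrate the difference, here is a rough Python sketch (outside of KNIME; the folder and file names are only examples) contrasting an all-in-memory approach with streaming intermediate results to a file on disk, which is roughly what KNIME's disk backing gives you automatically:

    import glob
    import pandas as pd  # reading .xls also needs an Excel engine such as xlrd

    files = sorted(glob.glob("data/*.xls"))  # example folder with the 50,000 files

    # All in memory: every file is read and concatenated into one DataFrame,
    # so the whole dataset must fit into RAM at once.
    # all_data = pd.concat(pd.read_excel(f) for f in files)

    # Streaming to disk: only one file is held in memory at a time,
    # the combined result grows in a CSV file on disk instead.
    out_path = "combined.csv"
    for i, f in enumerate(files):
        df = pd.read_excel(f)
        df.to_csv(out_path, mode="w" if i == 0 else "a", header=(i == 0), index=False)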
Kind regards,
Alexander

