Hi all,
I really need some help please - i’m pulling my hair out.
I have a complex workflow that I have built in v4.4.2. To summarise, the workflow loops around up to 45 tasks (a mix of downloading from SAP R3, SAP S4, SharePoint) - each task is run individually (never two at the same time) and at the end of each loop iteration, I am appending a sheet to the same xlsx workbook.
The workflow is really quick and efficient for each of the different downloads but as the loop continues through each task more and more Heap Space is being used up. The maximum number of rows in any of the sheets is 45,000 (with 8 columns) which uses up to 663m (not sizeable) but, in most instances, the number of lines is maximum 200 rows.
My issue is that for subsequent tasks the heap space being used increases exponentially (as it opens, saves and closes the file each time).
I’ve done plenty of research in the forum and so far I’ve introduced:
Garbage Collectors & Cache node before Excel Writers,
changed the memory policy of most nodes to ‘Save to Disk’
Changed the .imi file to 4G (as per screenshot below).
However, I still face issues with heap space error. I have a few questions:
Despite having increased the RAM usage in the IMI file to 4GB, the maximum I ever see available in KNIME is 2048m (as below). Please can someone advise why this is? Is there another setting that I need to change to make full use of the 4G I’ve made available?
What aspects of a workflow take up the most RAM? The reason I ask is that when running my workflow for 24 tasks I got the heap space error on the final upload. I reasoned that if i ran it for half that amount of tasks it would work - WRONG! this time I got the heap space error on the 13th upload.
I’ve seen some people in the forum suggest uploading more than one file with each excel writer which I like the idea of. However, all of my uploads have variables being built to define the sheet name and most of these are within loops. Does anyone have any advice on whether it’s possible to store the data for each task and pass it out of the loop to attach with one excel writer?
In short, I am so close to finishing my project (this was the final big test) and i am desperate to get it finished. haha
if nothing else does work you could save your intermediate results as KNIME table and then do a seperate job with exports to Excel (might be a meassure of last resort)
Thanks very much for your reply - I really appreciate you taking the time to pull it together.
Turns out i was changing the wrong INI file (doh!). Anyways, I’ve upped RAM to 4GB and it worked But just to give more breathing space in case of more data I will be upping it to 8GB.
The workflow is creating a new workbook each time it is run so it is always empty and the garbage collection is run every iteration of the loop (right before the excel writer).
Despite it working now, I think I will look into building a table to store the paths and variables so that I can write the information later (should hopefully save some time on opening/closing the file multiple times.