Java Heap Space error on Excel Writer - please help?

Hi all,
I really need some help please - i’m pulling my hair out. :frowning:

I have a complex workflow that I have built in v4.4.2. To summarise, the workflow loops around up to 45 tasks (a mix of downloading from SAP R3, SAP S4, SharePoint) - each task is run individually (never two at the same time) and at the end of each loop iteration, I am appending a sheet to the same xlsx workbook.

The workflow is really quick and efficient for each of the different downloads but as the loop continues through each task more and more Heap Space is being used up. The maximum number of rows in any of the sheets is 45,000 (with 8 columns) which uses up to 663m (not sizeable) but, in most instances, the number of lines is maximum 200 rows.

My issue is that for subsequent tasks the heap space being used increases exponentially (as it opens, saves and closes the file each time).

I’ve done plenty of research in the forum and so far I’ve introduced:

  • Garbage Collectors & Cache node before Excel Writers,
  • changed the memory policy of most nodes to ‘Save to Disk’
  • Changed the .imi file to 4G (as per screenshot below).

However, I still face issues with heap space error. :frowning: I have a few questions:

  1. Despite having increased the RAM usage in the IMI file to 4GB, the maximum I ever see available in KNIME is 2048m (as below). Please can someone advise why this is? Is there another setting that I need to change to make full use of the 4G I’ve made available?

image

  1. What aspects of a workflow take up the most RAM? The reason I ask is that when running my workflow for 24 tasks I got the heap space error on the final upload. I reasoned that if i ran it for half that amount of tasks it would work - WRONG! this time I got the heap space error on the 13th upload.

  2. I’ve seen some people in the forum suggest uploading more than one file with each excel writer which I like the idea of. However, all of my uploads have variables being built to define the sheet name and most of these are within loops. Does anyone have any advice on whether it’s possible to store the data for each task and pass it out of the loop to attach with one excel writer?

In short, I am so close to finishing my project (this was the final big test) and i am desperate to get it finished. haha

Please help if you can, i’d really appreciate it.

Thanks.
Pete

UPDATE.

I’ve checked in the KNIME analytics config and, as per the screenshot below, the system properties are still being picked up as 2048m…

Does anyone know how to update this please? From the config i can’t see exactly which .imi file is being pulled?

Thanks.

Pete

@taylorpeter55 just to be sure: you restarted the KNIME platform after you made the changes to the knime.ini?

https://docs.knime.com/latest/analytics_platform_installation_guide/index.html#_configuration_settings_and_knime_ini_file

Then:

  • 4 GB is not that much (sorry). I would suggest you give KNIME 2/3 of you memory and stopp using all other programs for the time being
  • I have a collection of hints about KNIME and performance, maybe you want to explore: A meta collection about KNIME and performance and performance tuning and some problems – KNIME Hub
  • of course you can reconfigure your workflow, save the paths and variables in a table and do the export later
  • maybe also make sure the Excel files are initially empty or not there (if your use case allows that)
  • then: maybe try a later version of KNIME (4.6.1) and see if that does work and also the latest Excel Writer
  • then: maybe you can tell us a little more about what you are doing
  • do you use the garbage collection after each iteration?
  • then: you could think about not storing some intermediate data in your workflow just the result (Save Workflow without data - #2 by qqilihq)
  • if nothing else does work you could save your intermediate results as KNIME table and then do a seperate job with exports to Excel (might be a meassure of last resort)
6 Likes

Hi @mlauber71,

Thanks very much for your reply - I really appreciate you taking the time to pull it together.

Turns out i was changing the wrong INI file (doh!). Anyways, I’ve upped RAM to 4GB and it worked :slight_smile: But just to give more breathing space in case of more data I will be upping it to 8GB.

The workflow is creating a new workbook each time it is run so it is always empty and the garbage collection is run every iteration of the loop (right before the excel writer).

Despite it working now, I think I will look into building a table to store the paths and variables so that I can write the information later (should hopefully save some time on opening/closing the file multiple times.

Thanks again for your advice.
Pete

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.