java heap error when using write to excel template node

Dear all,

Multiple data sources are processed and saved into multiple sheets in a single Exceltable
If executing the Write to Excel Template node multiple times, Java Heap Error appears

Circumstances:
Win10 Enterprise 64bit, 48 GB RAM, Office365 Enterprise, Knime. Latest updates installed.

How to reproduce:
copy all attached file to c:\temp
1.xlsx (19.8 KB)
java_heap_error.knwf (42.8 KB)
outbound_template.xlsx (19.1 KB)
outbound_template.xlsx (19.1 KB)

Open 1.xlsx in Excel 1st sheet
Copy row 2 100.000 times; so 100.000 rows are filled
Save 1.xlsx
(Step neccessary to go around upload limit)

load Workflow java_heap_error
Remark: For demo I have only one source reading this multiple times

execute node Write to Excel template Write Table Outbound sheet 1-> Operation is executed ok; Java Heap increases
execute node Write to Excel template Write Table Outbound sheet 2-> Operation is executed ok; time increases, Java Heap increases
execute node Write to Excel template Write Table Outbound sheet 3-> Operation is executed ok; time increases, Java Heap increases
execute node Write to Excel template Write Table Outbound sheet 4-> Operation is executed ok; time increases, Java Heap increases
execute node Write to Excel template Write Table Outbound sheet 5-> Operation fails

Questions:
Is it a bug?
Is there a way to relase the java heap memory?
Is there a better way to write to excel (Excel sheet will grow to 210 MB on disk at the end of the application)

Thx for help!

BR, Heinz

1 Like

@Heinz thanks for the workflow and example files. Let me see if I can find someone to reproduce and we will circle back. Thanks for your patience!

There is the Garbage Collection. More performance hints:

Not sure if it is better but you can use OpenPyxl to append data to an existing excel sheet.

And I assume it is necessary to write to a template to preserve the formats? Or would it be possible to write to a blank sheet and add the formats later?

3 Likes

I tried out the workflow albeit I accidentally set it up with 200000 rows and got the same results as @Heinz described. The problem is that it is running out of heap during the execution of the Write to Excel Template node so there is no scope for assisting with garbage collection before or after. The first call I made completed ok but it ran out on the second one. For the one that completed, it had released most if not all of the heap after execution and an added call to garbage collection made no difference.

Looping in @AnotherFraudUser as maybe there is something that can be done inside the node, but it may be something buried in the Apache POI libraries, that I think that node is using, perhaps having problems with large datasets. I’ve not generally tried writing that many rows to a formatted xlsx.

2 Likes

Dear all,

thanks for good support.
I have understand that I run into a bug inside Knime node.

Questions:
How do you go on?
Will there be a fix?
When?

Thx,

Heinz

Hi @Heinz,

in general the library behind my Write to Excel template is quite memory intensive.
Even if you write into a new sheet the process loads the whole file in an umcompressed format including all formats etc.
So I think here is not much I can do there - from what I can see garbabe collection cleans up the files as intended :confused:

Regarding the final file size I i will try to investigate - pretty sure this looks like that the shared strings are not properly set (normally if you have 5 times the string “abbaaa” excel would put that into the shared strings and save 5 times the shared string id instead :thinking:)
Will check if I have to trigger that somewhat manually to properly compress the file.

To have a longterm solution I will check if I can setup a node which interacts directly with the unterlaying xmls :thinking: most likely with a bit of reduced features…
but as I am not sure if I will actually manage to set this up will not be able to provide a timeline for that :smile:

Questions:
How do you go on? - see above
Will there be a fix? - no at least do not see how i could teak the existing node to handle that. But if anyone knows more happy to add fixes to the node :+1:
When? - cannot say

in the meantime i can only second what e. g @mlauber71 said. Never tested the python excel library myself but it always looked like a good way to do it as well :slight_smile:

3 Likes

Thank you very much for your answer.

Is there - maybe - a better way; e.g. saving the 5 Excelsheets into Knime DB tablses and then write them in one step in an exceltable but to different sheets?

How can I format existing excel sheets?

BR,

Heinz

Hi @AnotherFraudUser,

We are experiencing similar issues as Heinz. For most of the scenarios the Write to Excel Template will work for us, but in specific cases where we have 20 sheets to write data into it could happen the Java Heap issues pop up.

In our case the issue is so big it crashed our KNIME Server Executer frequently but didn’t leave enough information in the log file to determine actually which workflow caused the issue. Only after some time we found out it was the Write to Excel Template node.

We are still on 4.7, soon to be updating to 5.1. I noticed the 4.7 is now deprecated in 5.X, but could you let me know if there are any performance improvements applied in the new version? If not, we will have to build our own node with Python to workaround this issue in specific cases. As we did in the past, I’m willing to help you in any testing :grin:

Thanks!

1 Like

I’ve build a Python script which can simply copy data into an existing sheet from a specific cell and on. This is my (temporary) work around for the workflows which could lead to java heap issues.

It would be great if the performance fixes could be applied on the Write to Excel Template node, as this node is much more powerful than the Python script I built.

My KNIME 5.1 installation with (for the sake of testing) 2GB of memory assigned crashed while executing all 10 “Write to Excel Template” nodes, but will be successful when executing the 10 “Write To Excel Template With Python” components.

Write to Excel Template - Java Heap issue and fix in Python (KNIME Forum version).knar (305.0 KB)

1 Like