I recently updated KNIME to 4.3.1. I started replacing the legacy nodes with the new ones, beginning with the Excel Writer nodes. I really liked the new nodes.
I am collecting information from multiple places and combining it all in one Excel file, using the "append" option under File and "overwrite" under Sheet name. The final file has 4 sheets:
sheet 1 has up to 10 rows
sheet 2 has up to 15 rows
sheet 3 has up to 50 rows
sheet 4 has 200k rows
Previously, when I ran my flow, it produced this Excel file at a size of 38 MB. Now the size has increased to 49 MB, nearly a 30% increase. However, if I open the Excel file and save it again (Ctrl+S) without changing anything, the file size drops back to 38 MB.
Also, I have started getting "Execution failed in Try-Catch block: Java heap space" error messages. The -Xmx value in knime.ini was already set to 8g. I worked around the issue by increasing it to 12g, but to be honest that doesn't make sense when I am working with only 200k rows.
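In case it helps anyone reading along, the heap limit is a standard JVM argument that goes below the -vmargs line in knime.ini (the exact value depends on your machine's RAM):

```
-Xmx12g
```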
This was working fine with the previous nodes. Is anybody else having the same issue? Maybe someone can give me some advice.
Thanks for your feedback; we'll look into it.
Does the "Execution failed: Java heap space" exception you're experiencing occur in the workflows where you exchanged the Excel Writer node? If that's the case, can you confirm that it only happens when you append to an existing Excel file? Are you writing xls or xlsx files?
Regarding the size difference: would it be possible to share the files with me? If so, I'll send you a private message. That would make it easier for me to understand the underlying problem.
I'll update my other flows, which don't contain PII data, later this week and keep an eye on the file sizes. If I observe the same issue, I can share those files with you for further analysis.
It's just a wild guess, but the previous writer nodes had a "Memory Policy" tab, which I set to "Write tables to disc". The new writer nodes don't have that option. Maybe that's why they use more memory?
This morning I started getting "Execution failed in Try-Catch block: Java heap space" again on the last Excel Writer node (the last step in the flow). I have already assigned 12 GB of RAM, and nothing else is running in the background while the flow executes. I can't increase the memory any further.
I tried running the node many times with no luck. The node tries to append a table of ~200k rows and ~50 columns to an existing Excel file. From what I can see, the failure happens after the node reaches 75%.
However, I was able to produce the Excel file when I set it to write a new file. But the file size went through the roof, reaching 82.5 MB. And the size dropped to 37.5 MB after I opened the file, saved it without making any changes, and closed it again. These are the nodes I added to the flow recently:
5 rule engine nodes
1 joiner
1 concatenate
1 group by
1 row filter
I combined all these nodes into a metanode.
All their memory policies are set to "Write tables to disc".
It seems I had configured the Joiner with the wrong settings, resulting in extra columns. I've fixed that, and the file size dropped to 54.7 MB. That is still higher than normal, though: when I open and save the file, the size drops to 37 MB again.
Here are the column types and counts. It turns out the table has 65 columns, not ~50.
Column type       Count
String            51
Date and Time     7
Number (integer)  4
Number (long)     2
Number (double)   1
The new nodes changed the previous Double types to Number (long).
Regarding the file size, I'm still doing some digging, but it seems that Excel writes the files slightly differently from the library we are using and from other tools such as LibreOffice Calc. I'll figure out whether there is a way to produce an identical setup.
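To illustrate one way the same data can end up with different sizes (this is a simplified illustration, not our actual writer code): an .xlsx file is a ZIP archive of XML parts, and a writer may store repeated cell text either inline in every cell or once in a shared-strings table that cells reference by index. Both layouts are valid, which is why re-saving a file in Excel can shrink it without changing any data. A stdlib-only sketch of the size difference:

```python
import io
import zipfile

# Illustration only: an .xlsx file is a ZIP of XML parts. Repeated cell
# text can be stored inline in every cell, or once in a shared-strings
# part that cells reference by index. Same data, different file sizes.

ROWS = 100_000
VALUE = "status=OK"  # one repeated value, like a status column

def xlsx_like_size(use_shared_strings: bool) -> int:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        if use_shared_strings:
            # cells hold only an index into the shared-strings part
            cells = '<c t="s"><v>0</v></c>' * ROWS
            zf.writestr("xl/sharedStrings.xml",
                        f"<sst><si><t>{VALUE}</t></si></sst>")
        else:
            # inline layout: every cell repeats the full text
            cells = f'<c t="inlineStr"><is><t>{VALUE}</t></is></c>' * ROWS
        zf.writestr("xl/worksheets/sheet1.xml",
                    f"<sheetData>{cells}</sheetData>")
    return buf.getbuffer().nbytes

inline_size = xlsx_like_size(False)
shared_size = xlsx_like_size(True)
print(inline_size, shared_size)  # the inline layout comes out larger
```

Even after ZIP compression, the inline layout carries more raw XML per cell, so the archive ends up larger for identical content.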
Regarding your memory issue: I have found a way to decrease the memory pressure. However, if you then append to the file you just wrote in your final step, you would see a memory problem again. I'm currently investigating whether we have control over that problem as well, but I have my doubts. I'll keep you posted, and thanks for your awesome support!
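For background (a simplified sketch, not the actual node implementation): streaming spreadsheet writers such as Apache POI's SXSSF keep only a small window of recent rows in memory and flush older rows straight to disk, so writing a fresh file can run in bounded memory regardless of row count. Appending, by contrast, first has to parse the whole existing workbook into memory, which is where the heap pressure comes from. The row-window idea, sketched with a CSV file standing in for sheet XML:

```python
import csv
import os
import tempfile
from collections import deque

# Sketch of the bounded-memory "row window" used by streaming writers
# (e.g. Apache POI's SXSSF): only the last `window` rows stay in RAM;
# older rows are flushed straight to the output file.

class WindowedRowWriter:
    def __init__(self, path, window=100):
        self._file = open(path, "w", newline="")
        self._csv = csv.writer(self._file)  # CSV stands in for sheet XML
        self._buffer = deque()
        self._window = window

    def write_row(self, row):
        self._buffer.append(row)
        if len(self._buffer) > self._window:
            # flush the oldest row; memory stays bounded by `window`
            self._csv.writerow(self._buffer.popleft())

    def close(self):
        while self._buffer:
            self._csv.writerow(self._buffer.popleft())
        self._file.close()

path = os.path.join(tempfile.mkdtemp(), "sheet.csv")
writer = WindowedRowWriter(path, window=100)
for i in range(200_000):  # same order of magnitude as the flow above
    writer.write_row([i, f"row-{i}"])
writer.close()

with open(path, newline="") as f:
    n_rows = sum(1 for _ in f)
print(n_rows)  # 200000
```

Appending can't use this trick because the existing rows aren't in the writer's window; the whole workbook has to be loaded and rewritten.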