I’ve built a workflow that outputs 8 different tables. I’d like them to all be joined up as different worksheets in the same Excel file. Right now, I’ve just created a chain of Excel Sheet Appender nodes. As it progresses, each successive node takes longer to finish, even if the data tables for that specific node are not very big (sometimes taking 10 minutes per node). It seems that reading the Excel file is causing major memory issues. I have even started to get a “java heap space” error now as well.
The previous solutions I’ve seen on the forum only talk about removing the “autosize columns” feature and increasing the usage of ram and cells in memory. Are there any other solutions to creating an Excel file with multiple sheets that are not so resource intensive?
is any of your sheets really big?
Not quite sure but i remember that the sheet appender loads the whole file before appending the sheet (maybe i remember incorrectly…)
But maybe you could try to append sheets from large -> small and check if that makes it better?
Do you have an example workflow to try solution with?
Yes, it does seem to go a little faster if I append the largest sheets last. I just put together some test data and smallest first would write in 17 seconds, whereas smallest last would take more than a minute.
Any other ideas for speeding up write time? What about something like writing to a CSV and then joining them in an Excel file? I guess that means it would have to be done outside of KNIME right?
you’re completetly right. Appending sheets to an excel file inside a loop takes more and more time. The reason for this is that when you append to a file we need to open and read that file before we can add data to it. This explains the time + memory footprint increase.
One way to decrease the time and memory footprint is to process the files in non-increasing table size order, as pointed out by you.
We will ship with 4.3 a new version of the Excel Writer that allows you to write several data tables / sheets without the need to append them. The node will have dynamic ports for this. So if it is an option for you to write everything in one go rather than to append all the files you’ll see a major speedup + way less memory pressure.
well there are solution with Java POI (the library used by the KNIME nodes as far as I know)
But I guess the best solution would be to wait for @Mark_Ortmann new nodes …hope they come soooon
*another solution would be to just write separate excel files - and to the appending in the target file in the zip/xml structure of the xlsx files.
->unzip all excels
->copy the \xl\worksheets into the target excel (the sheet1.xml contains the sheet info)
->rename uniquely
->change xl\workbook.xml and xl/_rels/workbook.xml to give the correct references and ids to the newly added sheets in your target xslx
->zip again and rename to .xlsx
Then you have added all sheets to the target excel (more or less)
But thats a lot of work - most likely faster to do a small Java/Python program with a Excel lib