Memory error with XLS writer/reader

Hello

I have an issue with an XLS Writer node: I am trying to write a file, as I have done many times before, but it fails with a memory error, even though I am fairly sure memory is not actually the problem.

The table is only 3,600 rows by 1,400 columns, and even if I filter out most of the rows/columns, the issue persists. I also have 50 GB of RAM allocated to KNIME, and the CSV Writer works fine.

When I write the CSV it is 3 GB, and I do not understand why it is that heavy for such a small table. A similar XLS file is 410 MB and works perfectly.

Any ideas?

Cheers

Hi @gbaudry,

A few questions:

  • What KNIME version are you using?
  • Are you writing xls or xlsx?
  • Are you writing a new file, or are you appending to an existing file?
  • Could you share the workflow, or at least the logs?

Best,
Simon


Hi

Thanks. Here are my answers:

What KNIME version are you using? 4.1.3.
Are you writing xls or xlsx? xlsx (I use an older XLS Writer because the new ones are less convenient for flow variables).
Are you writing a new file or appending to an existing file? I tried both.
Could you share the workflow or at least the logs? The workflow is a bit big, so I cannot, but here is the error from the log: ERROR Excel Sheet Appender (XLS) (deprecated) 4:12370:0:5112 Execute failed: ("OutOfMemoryError"): null

Writing works once, but then I cannot read the file back with the XLS Reader, whereas I can open it in Excel:
WARN Excel Reader (XLS) (deprecated) 0:10157:0:7247 Specified file doesn’t exist (C:\Program Files (x86)\knime_4.1.3\missing)
WARN Variable to Table Row 0:10157:0:7736 No variables selected
ERROR Excel Reader (XLS) (deprecated) 0:10157:0:7247 Execute failed: java.lang.OutOfMemoryError

Thanks in advance for your help

Edit: I tried the XLS writer/appender from 4.1.3, same issue: ERROR Excel Sheet Appender (XLS) 4:12370:0:5131 Execute failed: ("OutOfMemoryError"): null

Thanks for the answers, @gbaudry.
As you have 50 GB allocated to KNIME, I assume the file you are writing is very large. Are you able/allowed to share it?
Also, make sure that the format actually written by the Excel Writer is xlsx (give the file a name like xyz.xlsx); the old writer can write xlsx as well, even though it is named "XLS Writer". This could reduce the memory required.

What exactly is less convenient? Happy to hear feedback.

Best,
Simon

Hi

I do not think this is a matter of size: when I add a column and row filter and only try to write a single cell, the failure is still a memory error. My guess is that this is a memory-management problem I have experienced before. It may be unrelated, but here is the thing: I have metanodes computing data in Java, and these nodes make the PC slower and slower, node after node. But if I put a Python node between two Java ones, it smooths everything out, and the Java nodes are back to instant computation again, until they slow down once more, and so on. My feeling is that memory management fails, because when execution is slow, even basic math can take ages to compute, while the exact same node moved upstream calculates instantly.

The input file is 3 GB in CSV. I now use the 4.1.3 XLS nodes, but the problem remains, and yes, I already name the file X.xls. I can share the file, but I am not sure it would help, and it is a bit massive to upload.
I have a similar file with the exact same dimensions (rows/columns/data types) that is only 8 MB, but its data is mostly integers, while the problematic one is mostly long doubles. Could that explain such a gap in file size?

Have you considered employing a garbage collector or a Cache node before writing the Excel file? Another option might be to save the data to a KNIME table or Parquet file and do the Excel writing in a separate workflow, to save on Java heap space (if that is indeed the problem).
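If you wanted to try that hand-off from a Python node instead, a minimal sketch could look like this (assuming pandas and pyarrow are available in your Python environment; the paths are purely hypothetical):

```python
import pandas as pd

# Hypothetical paths, for illustration only.
# Parquet is columnar and compressed and preserves numeric types,
# so the intermediate file stays much smaller than a 3 GB CSV.
df = pd.read_csv("C:/data/big_table.csv")
df.to_parquet("C:/data/intermediate.parquet", engine="pyarrow")

# A second workflow (or a later node) can read it back cheaply:
df2 = pd.read_parquet("C:/data/intermediate.parquet")
```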

Also, you could try the new columnar storage and force some nodes to write their results to disk.

Then there are other ways to write an Excel file, namely from within Python. It is of course not an elegant solution, but it could be worth checking whether it brings any improvement.
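As a rough sketch of that idea (assuming pandas with openpyxl installed; the paths are hypothetical):

```python
import pandas as pd

# Hypothetical paths, for illustration only.
df = pd.read_parquet("C:/data/intermediate.parquet")

# pandas delegates the .xlsx writing to openpyxl (or xlsxwriter),
# which runs in the Python process rather than on the KNIME Java heap,
# so the Java OutOfMemoryError from the Excel nodes would not apply here.
df.to_excel("C:/data/output.xlsx", index=False, engine="openpyxl")
```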

Hi @gbaudry,

That sounds weird and unexpected! Can you try the following:

  1. Re-start KNIME
  2. Execute only the Excel Reader node that was failing, without executing any other nodes first. (Best would be to put that reader node into a new, empty workflow before re-starting KNIME.)

This should ensure the cache is empty. If the node still fails, the general memory management shouldn’t be the issue.

Best,
Simon

Hi Simon,

I did as you said, but I still get the same error.

Hi mlauber71,

Thanks for the tips. I will not be able to test your suggestions today, but I will try them as soon as I can and report back with the results.

Thanks to both


Are you able to share the file @gbaudry so that I can take a look at it?

Best,
Simon
