Good Day Knimers,
I am trying to read excel file sheets using Read Excel Sheet Names which is 82358KB ( 10 sheets there)
I am having an error Execute failed: Java heap space
Tried to increase Java Heap Space as it was written in here, but I ve the same error
What kind I do in order to solve this issue?
Good day @Karlygash !
On top of increasing the heap space using the entry “-Xmx” in the knime.ini file, the following actions may be taken too to improve the use of the Java heap space during execution:
- Set the Memory Policy of the nodes that are memory consuming to “Write Tables to Disc”:
- Use -Run Garbage Collector- or -Run Heavy Garbage Collector- node by @Vernalis before and/or after running the nodes that are memory consuming:
For instance, in the above example, I’m flushing within a loop the heap space at the end of every iteration within a loop, which forces to clean up the heap space before executing the next loop iteration.
I guess in your case you are reading the different sheets within a loop. You could then flush the memory after reading each Excel sheet and before reading next by setting the Garbage Collector node as shown in the snapshot above.
- KNIME is capable of running nodes in parallel, which means that they are clogging the heap space at the same time too. If you have several nodes running at the same time in your workflow, I would recommend to force them to run sequentially by connecting them with the red variable connections, for instances as follows to write several Excel files:
As shown in the snapshot above, the red variable connections force the nodes to write the excel sheets sequentially. Without these connections, they would have been run in parallel.
- Complementary information to these hints is available here too:
- Finally you can monitor “in real time” the use of your computer resources with the Windows task monitor (as far as I remember you use Windows), if you keep it open on a corner of your monitor while executing your workflow. It should give precious information about how the workflow is using the memory in real time as it goes.
Hope these hints help
Good luck @Karlygash !
thanks a lot , @aworker
Nice to see your support here=)
I have tried, seems like didnt follow ur hints properly)
Is that right way of following your instructions?))
I m thinking, mb I just read them one by one each sheet and then concatenate them?)
You did right but I didn’t expect the -Sheet Names- node to fail . The workflow, i.e. just one node, is failing before garbage collection so upstream it. I do not know how KNIME implements the -Sheet Names- node but if it needs to read the whole Excel file just to grab the sheet names, then this is bad news. I hope not.
Several possible solutions to this:
Do you know beforehand the name of the 10 Excel Sheets? Are they always the same? If so, can you replace the -Sheet Names- node by a table creator with already the names of the sheets and use it instead of the -Sheet Names- node?
Could you please do the following test for me ?: Try to read just one of the sheets using the -Excel Reader- node. Does it work? Does it work for all the sheets when read separately ?
For the time being, I’m I bit puzzled with the -Sheet Names- node but I’ll try to help based on the answer to these two questions.
Good luck @Karlygash !
Please save your workflows, close KNIME and open a new fresh KNIME. Most probably the crashing has produced a java heap memory corruption
Then try again the 2nd question test and let’s see …
it is reading, but after 7 th file my computer and knime are freezing))
@aworker , finally read all of them)
is that right?)
Yes, great @Karlygash, you are reading them manually one after each other
How much memory do you have in your computer ?
Do you have the Java Heap monitor enabled in your KNIME installation? If so, it should appear at the bottom of the KNIME window as follows:
If not, please go to File → Preferences → General and get it activated so that you can see how the Java Heap evolves at every step. You can even manually flush the Java Heap Memory manually by pressing on the dust bin.
Once it is installed, please close and open KNIME again to make sure it has a clean Java Heap again and do the following:
On the last workflow example you has posted, please add and connect a -Heavy Garbage Colector- node between every two Excel readers. Make sure too that the “Write Tables to disc” is checked in all those Excel readers:
save the workflow and finally please execute the last Excel Reader. What happens then ? Does it work too ?
This problem is reported in the documentation of the -Sheet Names- node:
which explicitly says:
The performance of the reader node is limited (due to the underlying library of the Apache POI project). Reading large files takes a very long time and uses a lot of memory (especially files in xlsx format).
It is a pitty … but let’s see if we can find together a solution to circumvent this drawback. I’ll have a look to alternatives while you test the last suggestion
@aworker , is that right?)
Yes it is messy but right
So this proves you can read all the Excel sheets if in between the readings you flush the Java Heap. If we come back to your first solution:
Could you please “for the time being” replace the -Sheet Names- node by a Table Creator with the names of your Excel file sheets and run it ? Again you may need to close and open KNIME to start with a fresh Java Heap. This will be just for testing. If this works, then I’ll try to find a way to replace -Sheet Names- node by another solution. Let’s see …
I am afraid that I did not properly understand,
can you look at this?
It sounds good to me. Please fill all the sheet names, one per row. Once it’s done and the node executed, you should get a table with the list of sheet names. When filling the table content, please double clic on the Column title (column 1 name) too to give the right name to the column (the same as the loop is expecting ;-))
@aworker dont know whether i m on right way)
Loop variable node
Table creator node
I believe everything is right in your settings but unfortunately you are facing again the Java Heap problem as shown in the snapshot you just sent me (highlighted in yellow)
Could you please save again your workflow, close and open Knime and just run the -Table Creator- node, the -Loop Start- node and the -Excel Reader- node (please do not run yet the -End Loop- node) to check that at least till there your configuration works ?
@aworker still have the same problem,
mb I m doing smt incorrectly?
I don’t know. How much RAM memory do you have in your computer?
@Karlygash, is this urgent or can it wait until tomorrow ?
If not urgent, I could implement a generic solution on my side and share it with you tomorrow. In the meanwhile I have found a very efficient way of extracting the sheet names without extra installations (just pure KNIME ) that I could integrate in my solution. It just needs polishing before I can share it here.
Is this ok with you ?
@aworker no, it is not urgent)) thank you so much
sure it is ok for me
thanks a lot!
My pleasure ! Good night then. Talk to you tomorrow !