Windows 10, 64-bit
I’m suffering from Java heap exhaustion, which appears to occur after a period of time when performing a series of operations. I can’t pin it down exactly, as it sometimes occurs when performing Joins, Splits, Filters, and other table operations. The only common theme appears to be that the workflow has several branches which may execute at the same time, though I have also had heap exhaustion when these branches were concatenated and followed by a row splitting operation.
It does not present as a total freeze (though the UI freezes intermittently), but as a lack of resources leading to very slow execution (the attached thread dump shows the worker thread still running). You can see the heap exhaustion and the lack of garbage collection in the attached monitor charts. After an extended period, KNIME tends to crash once the GC can no longer free up memory.
Any help would be appreciated. I have tried to put together a workflow that demonstrates this, but I’m short on time at the moment due to deadline pressure.
threaddump-1566563314274.txt (43.1 KB)
I don’t know about your specific situation, but as the heap exhaustion seems to appear after a period of time (and is therefore not tied to a specific node or set of nodes), I’d suggest it might be worth simplifying/optimizing your workflow. Here’s an interesting read on that, and I would also suggest streaming execution (here’s a video explaining how to do it).
Streaming execution doesn’t help with Joiner nodes, as they cannot be streamed. I have used streaming execution in other parts of the workflow; however, it doesn’t resolve the problem. The workflow is optimised (I’ve been doing this for 5+ years).
The workflow succeeds provided no more than one branch is executing at any one time. It fails when multiple branches are executing with nodes that take a long time to execute. This appears to be preventing effective garbage collection and memory release.
I also have this problem: the execution stops with a heap space error, I reset the node, save the workflow, quit and restart KNIME, and presto, the workflow happily continues. It would be nice if the garbage collector could have the same effect as a KNIME restart…
Now, somewhere in one of the KNIME extensions there is a node that forces the GC to run. I haven’t tried that one yet; I’m curious whether it will make a difference.
Edit: I meant this node: https://nodepit.com/node/com.vernalis.knime.misc.gc.node.heavygc.RunHeavyGCNodeFactory
If you want to monitor the KNIME installation and trigger the garbage collector manually, you can also use VisualVM. This allows you to monitor the JVM running KNIME.
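The same two numbers VisualVM shows (used vs. maximum heap) and the same "run the GC" request are also available from inside a JVM through the standard `java.lang.management` API. A minimal sketch, assuming only Java 8 APIs; the class and method names are hypothetical:

```java
import java.lang.management.ManagementFactory;

final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Heap currently in use, in MiB, as reported by the platform MemoryMXBean. */
    static long usedHeapMiB() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() / MIB;
    }

    /** Maximum heap (roughly the -Xmx value) in MiB, or -1 if the JVM reports no limit. */
    static long maxHeapMiB() {
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        return max < 0 ? -1 : max / MIB;
    }

    /** Requests a GC; documented as equivalent to System.gc(), so the JVM may ignore it (e.g. with -XX:+DisableExplicitGC). */
    static void requestGc() {
        ManagementFactory.getMemoryMXBean().gc();
    }

    public static void main(String[] args) {
        System.out.printf("before GC: %d of %d MiB used%n", usedHeapMiB(), maxHeapMiB());
        requestGc();
        System.out.printf("after GC:  %d of %d MiB used%n", usedHeapMiB(), maxHeapMiB());
    }
}
```

If the "after GC" figure stays close to the maximum, the heap is mostly filled with objects that are still reachable, and triggering the GC manually cannot free them.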
In my case manually triggering the garbage collector does not appear to resolve the situation once the heap has filled, and has limited (no) impact as a pre-emptive measure.
This topic somewhat rings a bell.
I had some nasty issues with leaks in the “GET Request” node a while ago (they are fixed by now, but back then simply running the GC would not have helped, as there were references which just did not get cleaned up).
I’ve also observed with other nodes that sometimes the memory consumption with long-running workflows steadily grows without being released, but I didn’t have time or energy to investigate this further.
This old thread here suggests that there might be an issue with the Java Snippet node. Are you by any chance using this node? Did you check what kind of objects consume the heap (it’s in one of VisualVM’s tabs)?
Just my “you’re not alone”, without being able to present a solution. Sorry.
@qqilihq interesting! I use Java Snippets (simple) nodes a lot. I will try to replace them with something else and see if it makes a difference.
As an alternative to using a node that forces the GC to run or invoking the GC externally, you can also go to File -> Preferences -> General and check the box to Show heap status. You should then get to see a heap status panel at the bottom right of your KNIME Analytics Platform. Clicking the trashcan icon will then perform a full GC sweep. It should look like this:
Regarding the reported heap space exhaustion / memory leak, I can give some background:
KNIME Analytics Platform 4.0 makes a lot more use of its assigned memory for caching recently accessed data in memory. However, data cached this way are only softly or weakly referenced, i.e., they will be made available for garbage collection well before memory becomes critical.
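To illustrate why softly referenced cache entries should not by themselves exhaust the heap: the `java.lang.ref.SoftReference` documentation guarantees that all softly reachable objects are cleared before the JVM throws an `OutOfMemoryError`. A minimal sketch of such a cache follows; it is not KNIME’s implementation, and `SoftCache` and its loader function are hypothetical names:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical cache that holds its values only through SoftReferences, so the GC is free to
 * clear them under memory pressure. Not KNIME's implementation, just the general technique.
 */
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // recreates a value (e.g. re-reads it from disk) after it was cleared

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) { // never cached, or cleared by the GC in the meantime
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The `SoftReference` wrappers and keys in the map stay strongly reachable, so a real cache would also prune cleared entries (for example via a `ReferenceQueue`); that is left out here for brevity. A weakly referenced variant would use `WeakReference`, whose referents can be cleared at the next GC.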
@DiaAzul, it sounds like what you are experiencing is some proper heap space exhaustion. KNIME Analytics Platform becomes less and less responsive because more and more time is being spent unsuccessfully attempting to collect garbage and less and less time is being spent doing actual work. Eventually, Analytics Platform runs out of memory. What I can say is that this should not be related to in-memory table caching for reasons outlined above. What I can also say is that the Joiner node can use up a lot of memory but will flush intermediate data to disk when some memory threshold is reached.
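One way to check whether a JVM has entered the state described here (more and more time spent collecting garbage, less and less doing real work) is to sample the accumulated collection time that each garbage collector reports. A minimal sketch using the standard `GarbageCollectorMXBean`; `GcOverhead` and the 10-second window are my own assumptions:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Accumulated GC time in ms since JVM start, summed over all collectors; -1 ("undefined") entries are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /**
     * Approximate fraction of wall-clock time the JVM spent in GC during a window of the given length.
     * Summing several collectors can in principle exceed the window, so the ratio is clamped to 1.
     */
    static double gcFraction(long windowMillis) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt status; measure whatever window elapsed
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        long gcMillis = totalGcMillis() - gcBefore;
        return elapsedMillis <= 0 ? 0.0 : Math.min(1.0, (double) gcMillis / elapsedMillis);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.printf("GC share over last 10 s: %.1f%%%n", 100 * gcFraction(10_000));
        }
    }
}
```

A GC share that keeps climbing toward 100% while used heap stays near its maximum matches the slowdown-then-crash pattern described above.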
Since @qqilihq brought up this forum post that discusses a potential memory leak in the Java Snippet node, I attempted to reproduce the issue on my machine. However, I did not find a memory leak in the Java Snippet node. I merely found memory to be blocked by log messages in the Console View if I set the Console View Log Level to INFO or lower. See my latest reply in that other forum post for more details.
Some suggestions you could try:
- You can try setting the Console View Log Level to WARN in File -> Preferences -> KNIME -> KNIME GUI and see if that helps.
- You can configure the memory-intensive nodes (I’m looking at you, Joiner node) to Write tables to disk in the Memory policy tab.
- From your knime.ini, you could remove the line -XX:+UseG1GC and insert the lines -Dknime.table.cache=SMALL, -Dorg.knime.container.cellsinmemory=100000, -Dknime.synchronous.io=true, and -Dknime.compress.io=GZIP. Your KNIME Analytics Platform 4.0.1 will then behave a lot more like your KNIME Analytics Platform 3.7.2. Consequently, it will be a lot slower, yet also a bit less liberal in terms of resource consumption.
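After editing knime.ini, it may be worth confirming that the new `-D` lines actually reached the JVM, since a typo or a misplaced line might not take effect. A minimal sketch, assuming only Java 8 APIs (so it should also fit older KNIME JVMs); the property names are copied from the suggestion above, while `KnimeIniCheck` is a hypothetical name:

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KnimeIniCheck {
    // Property names taken from the suggested knime.ini lines.
    static final List<String> KEYS = Arrays.asList(
            "knime.table.cache",
            "org.knime.container.cellsinmemory",
            "knime.synchronous.io",
            "knime.compress.io");

    /** Maps each key to its value in this JVM, or "<unset>" if the -D line did not take effect. */
    static Map<String, String> effectiveSettings() {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : KEYS) {
            result.put(key, System.getProperty(key, "<unset>"));
        }
        return result;
    }

    /** True if -XX:+UseG1GC is still among the JVM's startup arguments. */
    static boolean g1FlagStillPresent() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().contains("-XX:+UseG1GC");
    }

    public static void main(String[] args) {
        effectiveSettings().forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("-XX:+UseG1GC present: " + g1FlagStillPresent());
    }
}
```

Note that the absence of `-XX:+UseG1GC` only says the flag was removed; on Java 9 and later, G1 is the default collector anyway.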
If all of these steps do not help, could you provide me with a minimal workflow with which I can reproduce the issue? Alternatively, you can use tools such as VisualVM to generate heap dumps and compare heap dumps with one another. This way, you can pin down what exactly is clogging up your memory.
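Besides VisualVM, a heap dump can also be written from within the JVM through the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`, which should work on Oracle/OpenJDK-based JVMs. A minimal sketch; `HeapDumper` and the file naming scheme are hypothetical:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HeapDumper {
    /**
     * Writes an .hprof heap dump of the current JVM into dir. With live=true only reachable objects
     * are dumped, which is what matters when looking for whatever keeps the heap full.
     * dumpHeap fails if the target file already exists, hence the timestamp in the name.
     */
    static Path dump(Path dir, String baseName, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Path file = dir.resolve(baseName + "-" + System.currentTimeMillis() + ".hprof");
        bean.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("wrote " + dump(Paths.get("."), "heap", true));
    }
}
```

Taking two such dumps some minutes apart and comparing them in VisualVM shows which classes keep growing between the two snapshots.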
Thank you to everyone that has commented. It’s helpful to get insight and views from you all.
@marc-bux - Thanks for the configuration options, I may try them if the problem recurs. However, for no obvious reason, everything is now behaving as it should. I’ll keep monitoring/experimenting and if I can pin down the cause of the problem with heap exhaustion then I will post a workflow to demonstrate it.
Does anybody know what happened to the garbage collector nodes listed above? There is no longer a download link available from the nodepit (https://nodepit.com/node/com.vernalis.knime.misc.gc.node.heavygc.RunHeavyGCNodeFactory).
You can find it here on the KNIME Hub.
Thanks for bringing this to our attention, this is a bug and we’ll look into it.
@cybrkup I had a deep look into NodePit’s code and finally found the issue. You will see download links for @Vernalis nodes for all supported KNIME versions, again! As Philipp already mentioned, thank you very much for pointing us to this!
Thank you for the quick turn-around on the fix.
@qqilihq I stumbled upon this trying to use the GET request node to retrieve a large export for download. Do you know what controls the max memory the GET node can access?
For both the GET Request node and Palladian’s HTTP Retriever, the data is kept in memory, so neither is really suited for downloading big binary files. For this, you could use the “Download” node, which (I’d at least expect) streams the results directly to disk:
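The difference between the two approaches can be sketched in plain Java. This is not how the GET Request or Download nodes are implemented; it only illustrates why holding a response in memory scales with the file size while streaming to disk does not. `DownloadSketch` and its methods are hypothetical, and only Java 8 APIs are used:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadSketch {
    /** In-memory style: the whole body ends up in one byte[], so heap use grows with the file size. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Streaming style: bytes go to disk through a small buffer, so heap use stays roughly constant. */
    static long streamToFile(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Downloads a URL straight to a file without holding the body in memory. */
    static long download(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            return streamToFile(in, target);
        }
    }
}
```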
Hope this helps!
@qqilihq Thanks so much for the tip! I will give it a try!
Have a great day!
@qqilihq OK, apologies in advance. I’ve never seen the light blue port. Is there a way I can convert my GET request to something this node can use?
Worry not; finding the “compatible” nodes can be quite tedious.
For your specific case, you’d most likely go for an “HTTP Connection” or “HTTPS Connection” node.
General ProTip™: NodePit allows you to look up other nodes which have the matching output port. To do so, open the node’s description on NodePit (or just follow the link above), scroll to the “Input Ports” or “Output Ports” section, and click on the “Plug” icon.
This will then take you to a page which lists all nodes with an output port which can be connected.
@qqilihq Philipp! Thanks so much for the additional assistance and tip on how to find matching nodes. Hope you have great weekend!