I think what you are observing is not a memory leak. Instead, it is a consequence of a new table caching strategy introduced in KNIME Analytics Platform 4.0.0. This strategy attempts to keep the k most recently used tables in memory until some critical heap space allocation threshold is reached. By default, k is 32 and the critical memory threshold is 90% of the heap space available to KNIME minus 128 MB. Note that tables held in memory this way are asynchronously written to disk in the background, so that the memory they occupy can be released once said threshold is reached.
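To make the strategy more tangible, here is a simplified sketch of a "keep the k most recently used tables, evict the least recently used one" cache. This is my own illustration under assumed names (TableCacheSketch, dummy byte arrays standing in for tables), not KNIME's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch (not KNIME's actual code) of the caching strategy
 * described above: keep up to K recently used tables in memory and evict
 * the least recently used one when the cache grows beyond K. The byte
 * arrays are hypothetical stand-ins for KNIME's buffered data tables.
 */
public class TableCacheSketch {

    static final int K = 32; // default number of tables kept in memory

    // An access-ordered LinkedHashMap gives us LRU eviction for free.
    static final Map<String, byte[]> CACHE =
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > K; // evict the least recently used table
            }
        };

    /** The critical threshold described above: 90% of max heap minus 128 MB. */
    static boolean memoryLow() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long threshold = (long) (rt.maxMemory() * 0.9) - (128L << 20);
        return used > threshold; // in KNIME, reaching this triggers a release
    }

    public static void main(String[] args) {
        for (int i = 0; i < 40; i++) {
            CACHE.put("table-" + i, new byte[1024]); // cache a small dummy table
        }
        // Only the K most recently used tables remain; the rest were evicted.
        System.out.println(CACHE.size()); // prints 32
    }
}
```

Note that eviction here is purely size-based; in the real strategy, as described above, the memory threshold additionally triggers a release of tables that have already been written to disk.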
What I think happens when you run out of memory is the following:
- The workflow runs smoothly for some time, tables are created and cached in memory. Memory consumption rises and tables are asynchronously written to disk in the background.
- Some memory-intensive node (the Deeplearning4j learner, maybe?) attempts to allocate a large amount of memory, which KNIME nodes generally shouldn’t (and won’t) do without providing some kind of fallback for low-memory conditions. If this happens at a point in time when cached tables cannot be released from memory, for instance because the asynchronous background writers are lagging behind, you can, sadly, run into an OutOfMemoryError.
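The fallback behavior mentioned above, i.e., degrading gracefully instead of failing with an OutOfMemoryError, can be sketched roughly like this. The class and method names are made up for illustration and are not KNIME API:

```java
/**
 * Rough illustration (hypothetical names, not KNIME API) of the fallback
 * pattern described above: check the available heap before a large
 * allocation and fall back to a slower, disk-backed path instead of
 * risking an OutOfMemoryError.
 */
public class FallbackSketch {

    /** Estimate of heap still available: max heap minus currently used. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** Process in memory if the request safely fits, otherwise fall back. */
    static String process(long requiredBytes) {
        if (requiredBytes < availableBytes() / 2) { // keep a safety margin
            return "in-memory";
        }
        return "disk-backed"; // slower, but avoids an OutOfMemoryError
    }

    public static void main(String[] args) {
        System.out.println(process(1024));           // prints "in-memory"
        System.out.println(process(Long.MAX_VALUE)); // prints "disk-backed"
    }
}
```

A node that allocates large buffers without such a check is exactly the kind of node that fails when the table cache happens to be holding on to memory at that moment.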
To resolve the issue, you can switch to a less memory-consuming table caching strategy by putting the line -Dknime.table.cache=SMALL into your knime.ini. This way, only very small tables will be held in memory. It will make the average KNIME workflow somewhat slower, but also less memory-hungry.
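For reference, since knime.ini is an Eclipse-style launcher configuration, the -D flag has to go below the -vmargs line. The -Xmx value here is just an example; keep whatever heap size you already have configured:

```
-vmargs
-Xmx2048m
-Dknime.table.cache=SMALL
```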
In an attempt to verify my assumption, I’ve run the workflow you kindly provided. Here’s what I observed:
- After starting up KNIME Analytics Platform 4.0.2 and opening the workflow, I ran a full-sweep garbage collection, after which 122 MB of heap space are blocked.
- I ran the workflow. It executed until the Word2Vec Learner Node, which crashed with these two not-so-helpful error messages:
Execute failed: java.lang.ExceptionInInitializerError
Execute failed: The Deeplearning4J Library could not be initialized. Maybe there is not enough memory available for DL4J. Please consider increasing the ‘Off Heap Memory Limit’ in the DL4J Prefernce Page.
Unfortunately, the error messages persisted and did not become more verbose even after increasing the off-heap memory limit and checking the “Enable verbose logging” option.
- Anyway, at this point I’m pretty deep into the workflow and 7.4 GB of my heap space are occupied. I ran another full-sweep garbage collection, after which 6.8 GB of heap space are still blocked. This is due to recently used tables being cached in memory and only released upon a memory alert, even though they have probably already been written to disk in the background. Obviously, if I save the workflow at this point and restart KNIME Analytics Platform, I start out fresh with 122 MB of heap space consumption.
- However, instead of restarting KNIME Analytics Platform, I added 32 Data Generator nodes that generate 5400 rows of data each. I executed and then reset these nodes to flush KNIME’s table cache. I then did another full-sweep garbage collection and, voilà, heap space is back down to 122 MB, even though the relevant parts of the workflow are still executed up until the Word2Vec Learner node.
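The heap measurements above can be reproduced with a small probe like the following. Keep in mind that System.gc() is only a request to the JVM, not a guarantee, so treat the numbers as approximate:

```java
/**
 * Small helper to reproduce the measurements above: request a full
 * garbage collection, then report the heap space still in use.
 */
public class HeapProbe {

    static long usedAfterGc() {
        System.gc(); // request a full-sweep collection (a hint, not a guarantee)
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedAfterGc();
        byte[] blocked = new byte[64 << 20]; // hold on to 64 MB
        long during = usedAfterGc();         // the 64 MB cannot be collected
        blocked = null;                      // release the reference...
        long after = usedAfterGc();          // ...and the memory is reclaimed
        System.out.printf("before=%dMB during=%dMB after=%dMB%n",
            before >> 20, during >> 20, after >> 20);
    }
}
```

This mirrors what happens when the table cache is flushed: as soon as nothing references the cached tables anymore, a full-sweep collection brings heap usage back down.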
I hope this helps to understand what’s happening. I’ll update this post if anything changes with regard to table caching strategies in KNIME Analytics Platform.