Memory/Heap space issues in version 4+

#1

KNIME V4.0.1
Windows 10, 64 bit
16G RAM
-Xmx10240m

Hello folks,

Not sure if this is specific to me - or to the particular dataset I'm processing - but I'm experiencing severe heap/memory related issues in some nodes, like Column Appender, Column List Loop and even Round Double. It feels like a memory leak: memory is not released, and after some time / a couple of iterations it results in an out-of-memory exception. I had to revert back to 3.7 and it works without issues now.
This is just FYI - but I would appreciate it if you could advise on workarounds as well.

Thanks and BR,

0 Likes

#2

Whoa. Sorry to hear that. Can you create a simple workflow to replicate the problem? Do you see the memory usage slowly growing up to the limit (using KNIME’s memory bar) or does it happen spontaneously?

Is the node in question the only node executing at that time, or are there parallel branches executing as well? If you think you can replicate the problem: does it help to pause the execution before that node for a brief amount of time? (Not that I suggest doing that as a general practice, but it might help diagnose the problem further.)

Any further insights appreciated!

Thanks,
Bernd

1 Like

#3

Hey, thank you for the quick response! I'll try my best to create a reproducible workflow that doesn't require all the data involved. Will send it over to you if I manage.

Cheers!

0 Likes

#4

And on your questions - no other processes are running, and the memory usage is slowly building up. The only thing that helped was processing in small pieces and restarting KNIME after each batch. The same thing works just fine in 3.7, with the same ini settings and in a single chunk.

0 Likes

#5

Any news here?

Thanks,
Bernd

PS: At the risk of keeping you from reproducing the problem and thus from helping us solve it: there is an FAQ that describes how to gradually revert the KNIME memory policy to what it was in 3.7.x. (Then again, if you can give details on how to reproduce this, it will be appreciated a lot!!!)

0 Likes

#6

Hey Bernd,

Yes, I saw this - and I will still try to reproduce the issue to help you folks :slight_smile: What you are doing is great and I really want to support it. I'll be able to spend some time on this over the weekend. If I don't manage with a dummy workflow, I'll be able to share the original one (that caused the issue) with you after my Kaggle competition finishes (private code sharing is forbidden). That will be next week.

Cheers :slight_smile:

6 Likes

#7

Hey Bernd,

Hope this finds you well.
Managed to reproduce it - and prepared a workflow to simulate it.
Basically, what I noticed is that in the initial steps memory is consumed and not released (will send you a screenshot) - then I have a component (Deeplearning4j learner) that uses more memory, and it crashes there.
If the editor is restarted to release memory, the learner finishes successfully. BUT later, in the prediction phase (a column loop that embeds word-vector predictions for each column), memory utilization builds up again and results in a memory error after a couple of iterations. The only way to release memory is to restart the GUI. The same functionality works in a single pass without any issues in 3.7. Please let me know the best way to send you the exported workflow, the ini file and the screenshots.

Thanks and BR,
Georgi Pamukov

2 Likes

#8

@gpamukov

Great that you were able to reproduce the problem, and I hope your Kaggle competition went well!

You can either just post your workflow etc. here, or send a private message to Bernd or me - whatever suits you best (I sent you a PM).

Looking forward to hearing from you
Mark

P.S. Could you share your knime.ini settings with us?

1 Like

#9

Hey - sent as PM.

Thanks and BR,
G.Pamukov

2 Likes

#10

Hi Georgi,

I think what you are observing is not a memory leak. Instead, it is a consequence of a new table caching strategy introduced in KNIME Analytics Platform 4.0.0. This strategy attempts to keep up to k recently used tables in memory, evicting the least recently used ones, until some critical heap space allocation threshold is reached. By default, k is 32 and the critical memory threshold is 90% of the heap space available to KNIME minus 128 MB. Note that tables held in memory this way are asynchronously written to disk in the background, so that the memory they occupy can be released once said threshold is reached.
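
For intuition, here is a minimal, hypothetical Java sketch of that general idea - it is not KNIME's actual implementation, and all names in it are made up. An access-ordered LinkedHashMap evicts its least recently used entry whenever the number of cached tables exceeds k or the heap usage crosses a threshold:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: an LRU cache that also evicts entries when heap usage gets critical.
    final class TableCacheSketch<K, V> extends LinkedHashMap<K, V> {

        private final int maxEntries;        // "k", e.g. 32 by default in 4.0
        private final double heapThreshold;  // e.g. 0.9 of the heap available to KNIME

        TableCacheSketch(final int maxEntries, final double heapThreshold) {
            super(16, 0.75f, true);          // access order gives LRU behaviour
            this.maxEntries = maxEntries;
            this.heapThreshold = heapThreshold;
        }

        @Override
        protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
            // Drop the least recently used table if we hold too many of them
            // or if the JVM heap is getting close to full.
            return size() > maxEntries || heapUsageRatio() > heapThreshold;
        }

        private static double heapUsageRatio() {
            final Runtime rt = Runtime.getRuntime();
            final long used = rt.totalMemory() - rt.freeMemory();
            return (double) used / rt.maxMemory();
        }
    }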

What I think happens when you run out of memory is the following:

  1. The workflow runs smoothly for some time, tables are created and cached in memory. Memory consumption rises and tables are asynchronously written to disk in the background.
  2. Some memory-intensive node (the Deeplearning4j learner, maybe?) attempts to allocate a large amount of memory, which, generally, KNIME nodes shouldn't / won't do without providing some kind of fallback for low-memory conditions. If this happens at a point in time when cached tables cannot be released from memory, for instance because the asynchronous background writers are lagging behind, you can, sadly, run into an OutOfMemoryError.

To resolve the issue, you can switch to a less memory-consuming table caching strategy by putting the line -Dknime.table.cache=SMALL into your knime.ini. This way, only very small tables will be held in memory. It will make your average KNIME workflow slower, but it’ll be less memory-consuming.
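
For reference - and assuming the heap size from the first post - your knime.ini could then end with lines like these (the exact -Xmx value depends on your machine; -Dknime.table.cache=SMALL is the line to add):

    -Xmx10240m
    -Dknime.table.cache=SMALL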

In an attempt to verify my assumption, I’ve run the workflow you kindly provided. Here’s what I observed:

  1. After starting up KNIME Analytics Platform 4.0.2 and opening the workflow, I ran a full-sweep garbage collection, upon which 122 MB of heap space are blocked.
  2. I ran the workflow. It executed until the Word2Vec Learner Node, which crashed with these two not-so-helpful error messages:
    Execute failed: java.lang.ExceptionInInitializerError
    Execute failed: The Deeplearning4J Library could not be initialized. Maybe there is not enough memory available for DL4J. Please consider increasing the ‘Off Heap Memory Limit’ in the DL4J Prefernce Page.
    Unfortunately, the error messages persisted and did not get more verbose even after increasing the off-heap memory and checking the option to “Enable verbose logging”.
  3. Anyway, at this point I'm pretty deep into the workflow and 7.4 GB of my heap space are occupied. I ran another full-sweep garbage collection, upon which 6.8 GB of heap space are still blocked. This is due to recently used tables being kept in the cache and only released upon memory alert, even though they have probably already been written to disk in the background. Obviously, if I save the workflow at this point and re-open KNIME Analytics Platform, I start out fresh with 122 MB of heap space consumption.
  4. However, instead of restarting KNIME Analytics Platform, I added 32 Data Generator nodes that generate 5400 rows of data each. I executed and then reset these nodes to flush KNIME's table cache. I then did another full-sweep garbage collection and, voilà, heap space is at 122 MB again, even though the relevant parts of the workflow are still executed up until the Word2Vec Learner Node. (One way to take such a before/after-GC measurement outside of KNIME is sketched right after this list.)
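
As a side note, the hypothetical stand-alone helper below only illustrates what a before/after-GC heap measurement like the ones above amounts to in plain Java; it is not part of KNIME:

    // Hypothetical helper: print the used heap before and after requesting
    // a full garbage collection, similar in spirit to watching KNIME's memory bar.
    public final class HeapCheck {

        public static void main(final String[] args) {
            printUsedHeap("before GC");
            System.gc();  // request a full-sweep garbage collection
            printUsedHeap("after GC");
        }

        private static void printUsedHeap(final String label) {
            final Runtime rt = Runtime.getRuntime();
            final long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            System.out.println(label + ": " + usedMb + " MB used");
        }
    }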

I hope this helps to understand what’s happening. I’ll update this post if anything changes with regard to table caching strategies in KNIME Analytics Platform.

Best,

Marc

6 Likes