heap space problem - joiner


I am running into Java heap space problems using the Joiner node:
The first table consists of 5,199,173 rows with strings (length ~30) as RowIDs. The RowIDs are all unique.
I am trying to join it with a table of similar length (exactly the same, to be precise) with the same IDs (also unique).
I created the RowIDs for the first table using the RowID node. I had to uncheck “Ensure uniqueness” because of memory issues. The Value Counter node is also not working because of memory issues, so I am using my own version of the value counter that expects a previously sorted table.
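A value counter over a pre-sorted table only has to compare each row with the previous one, so it needs constant memory instead of a hash map over all distinct values. A minimal sketch of that idea (an illustration of the approach, not the actual node code; class and method names are made up):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SortedValueCounter {

    // Counts runs of equal values in an already-sorted stream.
    // Emits one "value<TAB>count" entry per distinct value, using O(1)
    // working memory regardless of table size.
    public static List<String> countRuns(Iterator<String> sorted) {
        List<String> out = new ArrayList<>();
        String current = null;
        long count = 0;
        while (sorted.hasNext()) {
            String v = sorted.next();
            if (current != null && current.equals(v)) {
                count++; // still inside the same run
            } else {
                if (current != null) {
                    out.add(current + "\t" + count); // close previous run
                }
                current = v;
                count = 1;
            }
        }
        if (current != null) {
            out.add(current + "\t" + count); // close final run
        }
        return out;
    }
}
```

The trade-off is that the input must be sorted first; if the sorter can work out-of-core, the whole pipeline stays within a small heap.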

By the way, the sorting is working… :wink:

I get the following error message on the console:
ERROR Joiner Execute failed: GC overhead limit exceeded

Maybe you could take care of this in one of the next versions?
Please let me know if you need further information…



I also get the following error message:
ERROR BufferFromFileIteratorVersion20 Errors while reading row 3767124 from file “knime_container_20100330_6945491841330114268.bin.gz”: GC overhead limit exceeded; Suppressing further warnings.
ERROR BufferFromFileIteratorVersion20 CODING PROBLEM OutOfMemoryError caught, implementation may only throw IOException.
ERROR Joiner Execute failed: GC overhead limit exceeded

What type of join are you doing? I ran into this doing an outer join.

We are currently rewriting the Joiner to overcome some limitations of the current implementation, including join key selection (currently one of the join keys needs to be the row ID column), composite keys, and output column selection. We have also completely revised the memory behavior, using Java VM memory monitors and swapping out to disk only when needed. The current prototype has undergone some heavy stress tests and is quite stable when it comes to large data.
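The "swap to disk only if needed" decision described above can be driven by the standard JVM memory beans. A rough sketch of such a check (an illustrative example using `java.lang.management.MemoryMXBean`, not the actual Joiner implementation; the class name and threshold are assumptions):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemoryWatcher {

    private final double threshold; // fraction of max heap, e.g. 0.8

    public MemoryWatcher(double threshold) {
        this.threshold = threshold;
    }

    // Pure decision logic, separated out so it can be tested with
    // synthetic numbers.
    static boolean shouldSpill(long usedBytes, long maxBytes, double threshold) {
        return maxBytes > 0 && (double) usedBytes / maxBytes > threshold;
    }

    // Queries current heap usage and decides whether in-memory buffers
    // should be flushed to disk before the next batch of rows.
    public boolean shouldSpill() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        return shouldSpill(heap.getUsed(), heap.getMax(), threshold);
    }
}
```

Polling a threshold like this (rather than catching `OutOfMemoryError`) lets the node degrade gracefully to disk-backed buffers before the GC overhead limit is ever reached.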
That node is being developed as part of a node sponsorship. I’m pretty certain that it will be available in v2.2. For now you will probably need to live with the current limitations. You can try to tweak the sorting buffer size parameter as described in this thread, though. Hope this helps, Bernd
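Independently of the sorting buffer parameter, GC overhead errors can sometimes be postponed by raising the Java heap limit in the `knime.ini` file next to the KNIME executable, for example (the 2048m value is just an illustration; pick a size that fits your machine):

```
-vmargs
-Xmx2048m
```

JVM arguments must come after the `-vmargs` line; KNIME needs a restart for the change to take effect.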