Container Output (Table) execution fails with “Execute failed: Java heap space” when processing large tables.
I have a workflow that is attempting to send approximately 35 million rows to an external caller via the Container Output (Table) node, but the execution always fails with a Java heap space error. If I limit the number of rows using the Top k Selector, the process completes successfully. The Cache node has no problem handling this number of rows.
The problem is easy to reproduce: connect a Test Data Generator node to a Container Output (Table) node and generate a large number of rows.
I am using a MacBook Pro with 16 GB of RAM and -Xmx8g set in knime.ini.
You could look into further options for speeding things up with data of this size. You might also consider giving KNIME more memory, for example 12 GB, although how much it needs depends heavily on the overall size of your data and on whatever else is running.
Could you provide such a workflow, along with instructions for creating a sample file that reproduces the problem?
The Cache node produces a 1.35 GB data.zip file from the data in the workflow. As a workaround I am using the SQLite connector and DB Writer nodes to perform IO between workflows. This produces a SQLite file of 2.41 GB.
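One plausible reason the SQLite route copes with this volume is that rows can be written and read back in batches instead of being held in memory as a single table. The sketch below illustrates that batching pattern with Python's standard sqlite3 module. It is a generic illustration, not how KNIME's DB nodes are implemented; the table name data, its three columns, and the batch size are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator

BATCH_SIZE = 10_000  # hypothetical batch size; tune for your data


def write_rows(conn: sqlite3.Connection, rows: Iterable[tuple], batch_size: int = BATCH_SIZE) -> int:
    """Insert rows in fixed-size batches so only one batch is held in memory at a time."""
    conn.execute("CREATE TABLE IF NOT EXISTS data (id INTEGER, label TEXT, value REAL)")
    batch: list[tuple] = []
    count = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO data VALUES (?, ?, ?)", batch)
            count += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", batch)
        count += len(batch)
    conn.commit()
    return count


def read_rows(conn: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> Iterator[tuple]:
    """Yield rows back one at a time, fetching them from SQLite in batches."""
    cur = conn.execute("SELECT id, label, value FROM data ORDER BY id")
    while True:
        chunk = cur.fetchmany(batch_size)
        if not chunk:
            break
        yield from chunk


def generate_rows(n: int) -> Iterator[tuple]:
    """Lazily produce n synthetic rows (hypothetical stand-in for real data)."""
    for i in range(n):
        yield (i, f"row-{i}", i * 0.5)


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path such as
    # "exchange.sqlite" to persist the table for another process to read.
    conn = sqlite3.connect(":memory:")
    written = write_rows(conn, generate_rows(100_000))
    total = sum(1 for _ in read_rows(conn))
    print(written, total)  # 100000 100000
```

Because both the writer and the reader hold at most one batch at a time, peak memory is bounded by the batch size rather than by the total row count.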
Adding more memory is not possible on my development machine (thanks Apple!). Increasing the available RAM might well work in this scenario, but the Container Output (Table) node does appear to be the limiting factor here. All other nodes in the workflow can handle this volume of data, as can the DB Writer and Cache nodes. I have tried both the default and the columnar backends, and neither makes a difference.
This is an example workflow. On my machine, setting the number of rows produced by the Test Data Generator to 100,000 is sufficient to cause problems with the Container Output (Table) node.
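For anyone who wants a comparable sample file outside KNIME, the following minimal Python sketch streams synthetic rows to CSV one at a time. It is only an assumed stand-in for the Test Data Generator node: the column names, value distributions, and seed are hypothetical and do not reproduce that node's output.

```python
import csv
import io
import random
from typing import TextIO


def write_sample_rows(out: TextIO, n_rows: int, seed: int = 42) -> int:
    """Stream n_rows synthetic rows as CSV to `out` without building them in memory."""
    rng = random.Random(seed)  # seeded so the generated file is reproducible
    writer = csv.writer(out)
    writer.writerow(["id", "category", "value", "flag"])  # hypothetical columns
    for i in range(n_rows):
        writer.writerow([i, rng.choice("ABCDE"), round(rng.gauss(0.0, 1.0), 6), rng.random() < 0.5])
    return n_rows


if __name__ == "__main__":
    # Small in-memory demo. To create the 100,000-row file mentioned above:
    # with open("sample.csv", "w", newline="") as f:
    #     write_sample_rows(f, 100_000)
    buf = io.StringIO()
    write_sample_rows(buf, 5)
    print(buf.getvalue())
```

The resulting file can then be read into KNIME (for example with a CSV Reader) and passed to Container Output (Table) to check whether the same heap error appears.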
I see the same problem when the Memory Policy of the Container Output (Table) node is set to “Write tables to disc”.