KNIME crashes with 6000x6000 table

Hi all,
I am new to KNIME, so please excuse my ignorance. I run my workflow with different input data. When the dataset is large and I generate a 6000x6000 matrix, KNIME closes without any error message, so I am not sure at what stage it actually crashes.

I cannot find any error in the knime.log or knime.log.old files.
I am running KNIME 3.7 on Ubuntu. I’d really appreciate your help!

If you open a terminal window and launch KNIME from there, do you see any informative output when it crashes?
If you change the “-Xmx…” value in your knime.ini to 1.5x its current value (e.g., -Xmx2048m would become -Xmx3072m), does the crash still occur?
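For reference, the -Xmx line sits below the -vmargs marker near the end of knime.ini; e.g. (the other entries in your file will differ):

-vmargs
...
-Xmx3072m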

Thanks for the advice! I ran it from the console and set -Xmx to 8g. It seems that memory is indeed the problem. Is there any way to deal with this? I already set the nodes to write the tables to disc.

Here’s the console output:

(KNIME:11788): Gtk-WARNING **: 00:44:07.439: Negative content width -5 (allocation 1, extents 3x3) while allocating gadget (node progressbar, owner GtkProgressBar)

(KNIME:11788): Gtk-WARNING **: 00:44:07.439: Negative content width -2 (allocation 0, extents 1x1) while allocating gadget (node trough, owner GtkProgressBar)
WARN KNIME-Worker-3 NodeContainer Can’t continue loop as the workflow was restored with the loop being partially executed. Reset loop start and execute entire loop again.
WARN KNIME-Worker-1 NodeContainer Can’t continue loop as the workflow was restored with the loop being partially executed. Reset loop start and execute entire loop again.
ERROR KNIME-ConfigurationArea-Checker ConfigurationAreaChecker Can’t check integrity of configuration area ("/home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/configuration"): /home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/configuration/org.knime.core/root.lock
WARN KNIME-Worker-2 NodeContainer Can’t continue loop as the workflow was restored with the loop being partially executed. Reset loop start and execute entire loop again.
WARN main Node Cannot access ‘file:missing’: missing (No such file or directory)
Memory pressure relief: Total: res = 25661440/25391104/-270336, res+swap = 21114880/21114880/0
Memory pressure relief: Total: res = 25378816/25153536/-225280, res+swap = 20799488/20799488/0
Memory pressure relief: Total: res = 24989696/24989696/0, res+swap = 20410368/20410368/0
Memory pressure relief: Total: res = 24981504/24981504/0, res+swap = 20406272/20406272/0
Memory pressure relief: Total: res = 24977408/25014272/36864, res+swap = 20402176/20402176/0
Memory pressure relief: Total: res = 24989696/24977408/-12288, res+swap = 20414464/20414464/0
Memory pressure relief: Total: res = 24973312/24977408/4096, res+swap = 20398080/20398080/0
[previous line repeated 26 more times]
Memory pressure relief: Total: res = 24973312/25006080/32768, res+swap = 20398080/20398080/0
Memory pressure relief: Total: res = 24989696/24989696/0, res+swap = 20414464/20414464/0
Java HotSpot™ 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000676a80000, 388497408, 0) failed; error=‘Cannot allocate memory’ (errno=12)

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (mmap) failed to map 388497408 bytes for committing reserved memory.

An error report file with more information is saved as:

/home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/hs_err_pid11788.log

Gtk-Message: 08:13:31.791: GtkDialog mapped without a transient parent. This is discouraged.

If this is not related to the problems we are seeing with loops in the latest KNIME release 3.7, you could check out the various hints concerning KNIME performance (linked below).

It could make sense to split the workflow into several parts and see whether that makes any difference. You could also monitor the Java heap space and see when it reaches a critical point.


KNIME performance


In addition to mlauber71’s good suggestion to try to split up the amount of work being processed, could you say which nodes you are using and what sort of data you are working with? (36M integer cells would be a different beast than 36M molecule cells, for example.)

(Also, if mlauber71’s reference to loop problems is the Windows-freezing issue, then yes, this is not that.)

Thank you for your comments!
This is the part of my workflow where it crashes (when executing both Python Script nodes).
[screenshot: the workflow section with the two Python Script nodes]

I realize that the branches run in parallel. Is there maybe a way to run them sequentially?

Hi andt88,

You can achieve that by connecting the output flow variable port of one of the Python nodes to the input flow variable port of the other. (Here, extra care must be taken in case the Python nodes read or write flow variables.)

Additionally, you could try to decrease the value of the “Rows per chunk” option of the Python nodes. The default value is only suitable for tables with considerably fewer columns. Python nodes currently require all of their input data to be copied over to Python, which is done in chunks. Lowering the corresponding option may reduce the load somewhat. In general, I’d recommend passing only those parts of the tables to the Python nodes that are really processed there and filtering out the rest, e.g., by using Column Filter and Row Filter nodes.
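As a back-of-envelope illustration of why the default does not suit wide tables (the numbers below are assumptions for illustration only and ignore per-cell serialization overhead, which can be substantial):

# Rough raw size of one serialized chunk: rows x columns x bytes per cell.
def chunk_gib(rows_per_chunk, n_columns, bytes_per_cell=8):
    return rows_per_chunk * n_columns * bytes_per_cell / 2**30

print(chunk_gib(500_000, 10))      # narrow table: ~0.04 GiB per chunk
print(chunk_gib(500_000, 6_000))   # wide table:  ~22.35 GiB per chunk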

Marcel


Thanks for your help! I went from using the Schrödinger nodes to using RDKit nodes, because they run much faster. I also tried setting the Rows per chunk to 400k instead of 500k, but it had no effect. I am not sure what an appropriate value would look like.
I am getting this error now:

ERROR Python Script (2⇒1) 0:107 Execute failed: An exception occured while running the Python kernel. See log for details.

2019-01-14 18:11:02,426 : DEBUG : Service Thread : MemoryAlertSystem : : : Memory usage below threshold (88%) after GC run, currently 22% (1.17GB/5.33GB)
2019-01-14 18:11:05,003 : DEBUG : KNIME-Worker-43 : AbstractTableStoreReader : Python Script (2⇒1) : 0:107 : Closing input stream on “/tmp/knime_Library_analysi80375/knime_container_20190114_7762296688026652672.bin.gz”, 0 remaining
2019-01-14 18:11:05,003 : DEBUG : KNIME-Worker-43 : AbstractTableStoreReader : Python Script (2⇒1) : 0:107 : Closing input stream on “/tmp/knime_Library_analysi80375/knime_container_20190114_7658822479022459887.bin.gz”, 0 remaining
2019-01-14 18:11:05,648 : DEBUG : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : reset
2019-01-14 18:11:05,649 : ERROR : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : Execute failed: An exception occured while running the Python kernel. See log for details.
2019-01-14 18:11:05,649 : DEBUG : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : Execute failed: An exception occured while running the Python kernel. See log for details.
org.knime.python2.kernel.PythonIOException: An exception occured while running the Python kernel. See log for details.
at org.knime.python2.kernel.PythonKernel.getMostSpecificPythonKernelException(PythonKernel.java:1653)
at org.knime.python2.kernel.PythonKernel.putDataTable(PythonKernel.java:840)
at org.knime.python2.kernel.PythonKernel.putDataTable(PythonKernel.java:862)
at org.knime.python2.nodes.script2in1out.PythonScript2In1OutNodeModel.execute(PythonScript2In1OutNodeModel.java:89)
at org.knime.core.node.NodeModel.execute(NodeModel.java:733)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1186)
at org.knime.core.node.Node.execute(Node.java:973)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.extensions.serializationlibrary.SerializationException: An error occurred during serialization. See log for errors.
at org.knime.python2.serde.flatbuffers.Flatbuffers.tableToBytes(Flatbuffers.java:149)
at org.knime.python2.kernel.PythonKernel.putDataTable(PythonKernel.java:821)
… 16 more
Caused by: org.knime.python2.kernel.PythonExecutionException: FlatBuffers: cannot grow buffer beyond 2 gigabytes.
at org.knime.python2.util.PythonUtils$Misc.executeCancelable(PythonUtils.java:265)
at org.knime.python2.serde.flatbuffers.Flatbuffers.tableToBytes(Flatbuffers.java:146)
… 17 more
Caused by: java.lang.AssertionError: FlatBuffers: cannot grow buffer beyond 2 gigabytes.
at com.google.flatbuffers.FlatBufferBuilder.growByteBuffer(FlatBufferBuilder.java:133)
at com.google.flatbuffers.FlatBufferBuilder.prep(FlatBufferBuilder.java:179)
at com.google.flatbuffers.FlatBufferBuilder.startVector(FlatBufferBuilder.java:349)
at org.knime.python2.serde.flatbuffers.flatc.DoubleCollectionCell.createValueVector(DoubleCollectionCell.java:129)
at org.knime.python2.serde.flatbuffers.inserters.DoubleListInserter.createColumn(DoubleListInserter.java:95)
at org.knime.python2.serde.flatbuffers.Flatbuffers.tableToBytesInternal(Flatbuffers.java:275)
at org.knime.python2.serde.flatbuffers.Flatbuffers.lambda$0(Flatbuffers.java:146)
at org.knime.core.util.ThreadUtils$CallableWithContextImpl.callWithContext(ThreadUtils.java:344)
at org.knime.core.util.ThreadUtils$CallableWithContext.call(ThreadUtils.java:244)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : WorkflowManager : Python Script (2⇒1) : 0:107 : Python Script (2⇒1) 0:107 doBeforePostExecution
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : NodeContainer : Python Script (2⇒1) : 0:107 : Python Script (2⇒1) 0:107 has new state: POSTEXECUTE
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : WorkflowManager : Python Script (2⇒1) : 0:107 : Python Script (2⇒1) 0:107 doAfterExecute - failure
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : reset
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : clean output ports.
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : NodeContainer : Python Script (2⇒1) : 0:107 : Python Script (2⇒1) 0:107 has new state: IDLE
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : Node : Python Script (2⇒1) : 0:107 : Configure succeeded. (Python Script (2⇒1))
2019-01-14 18:11:05,652 : DEBUG : KNIME-Worker-43 : NodeContainer : Python Script (2⇒1) : 0:107 : Python Script (2⇒1) 0:107 has new state: CONFIGURED
2019-01-14 18:11:05,653 : DEBUG : KNIME-Worker-43 : Node : RDKit Fingerprint : 0:111 : Configure succeeded. (RDKit Fingerprint)
2019-01-14 18:11:05,653 : DEBUG : KNIME-Worker-43 : NodeContainer : Python Script (2⇒1) : 0:107 : Library_analysis_RDKit 0 has new state: IDLE
2019-01-14 18:11:05,653 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: IDLE
2019-01-14 18:11:08,651 : DEBUG : Thread-127 : PythonMessaging : : : Python messaging system could not be shut down gracefully. Process will be killed.

I think the salient part of that error message is this:

FlatBuffers: cannot grow buffer beyond 2 gigabytes.

Perhaps you could start the rows per chunk at a tiny number (like 1k) and work your way up until you hit this error message, then dial it back 10%, to get a feel for how big is too big?
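As a very rough starting point, you can compute the zero-overhead ceiling implied by that 2-gigabyte limit; the workable value will be lower, since each cell carries serialization overhead on top of its raw bytes:

# Zero-overhead upper bound on rows per chunk for a 6000-column double table
# under the 2 GiB FlatBuffers buffer limit (illustrative only).
limit_bytes = 2**31              # 2 GiB
row_bytes = 6_000 * 8            # 6000 doubles per row, 8 bytes each
print(limit_bytes // row_bytes)  # -> 44739 rows, at the absolute most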


Thanks for your comment!
Reducing rows per chunk to 5k results in another error, but only when executing from the workflow (the script runs fine from within the configuration window):

This is line 21, to which it is referring:
ax = sns.heatmap(data, cmap='coolwarm')

ERROR ConfigurationAreaChecker Can’t check integrity of configuration area ("/home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/configuration"): /home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/configuration/org.knime.core/root.lock
ERROR Python Script (2⇒1) 0:107 Execute failed: Traceback (most recent call last):
File "/home/andt88/KNIME/knime_3.7.0.linux.gtk.x86_64/knime_3.7.0/plugins/org.knime.python2_3.7.0.v201811301307/py/PythonKernelBase.py", line 278, in execute
exec(source_code, self._exec_env, self._exec_env)
File "<string>", line 21, in <module>
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/seaborn/matrix.py", line 528, in heatmap
plotter.plot(ax, cbar_ax, kwargs)
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/seaborn/matrix.py", line 284, in plot
cmap=self.cmap, **kws)
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/matplotlib/__init__.py", line 1810, in inner
return func(ax, *args, **kwargs)
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 5982, in pcolormesh
X, Y, C = self._pcolorargs('pcolormesh', *args, allmatch=allmatch)
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 5529, in _pcolorargs
np.arange(numRows + 1))
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4060, in meshgrid
output = [x.copy() for x in output]
File "/home/andt88/anaconda3/envs/py3.6_knime/lib/python3.6/site-packages/numpy/lib/function_base.py", line 4060, in <listcomp>
output = [x.copy() for x in output]
MemoryError

It appears that the Python process itself is running out of memory - but we’ve walked into an area that is not my expertise… @MarcelW?

In the meantime, does it succeed without memory issue if you go down to 2.5k chunks?

Thanks for sticking with me! The memory error persists with lower chunk sizes.
I just came across the row limit option in the configuration window. This may explain why the script worked from there.

What happens if you filter the rows coming out of Node 108 (the Column Filter node which feeds the failing Python node) so that only 2,500 rows are fed into the Python node in total (and adjust the table dimension input to the node)?

If it still fails with a “MemoryError”, what happens if you write the rows to a flat file and then use the flat file as input (plus the table dimension information) to run the Python script from the command line, taking KNIME completely out of the picture (a minimal sketch of such a test follows below)? Does it still fail?

If it doesn’t fail in the first, there’s something amiss in the ‘chunking’ the Python node should be doing, but you can work around that by doing the chunking yourself with nodes.

If it fails in the first, but doesn’t fail in the second, it seems like somehow the Python process being launched by the Python node is being starved of memory (I have no idea how this would happen.)

If it fails for both, it seems like you’ll need to write your Python script differently, to use less memory.
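For the standalone test, a minimal sketch could look like this (untested; “matrix.csv” stands in for whatever flat file you export from KNIME, e.g. with a CSV Writer node):

import matplotlib
matplotlib.use("Agg")            # render to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.read_csv("matrix.csv", index_col=0)  # hypothetical export path
sns.heatmap(data, cmap="coolwarm")
plt.savefig("heatmap.png", dpi=150)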

Looks like Python is running out of memory. From what I read, matplotlib is not the best in terms of memory usage. For now, I decided to reduce the similarity matrix I generated by dropping every second row and column. Thanks a lot for your help!
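In case it helps anyone else, the reduction is just strided slicing, e.g. (with a stand-in array):

import numpy as np

sim = np.random.rand(6000, 6000)   # stand-in for the real similarity matrix
reduced = sim[::2, ::2]            # keep every second row and column
print(reduced.shape)               # (3000, 3000), a quarter of the cells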


I think I’m having a similar issue. Did you ever find a resolution beyond a workaround?

No, sorry. I saw some people recommending Gnuplot rather than pyplot for these big data sets. Good luck!