Auto-save makes KNIME hang

Hi,

I am working with KNIME 3.6.1 on a Windows 7 PC with 72 GB RAM. For the past few days, whenever KNIME tries to auto-save workflows, it stops responding until I kill the process. I uninstalled and reinstalled the base version of KNIME and manually installed only the extensions I use, but I am still experiencing the same problem.

My knime.ini is:

-startup
plugins/org.eclipse.equinox.launcher_1.4.0.v20161219-1356.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.551.v20171108-1834
--launcher.defaultAction
openFile
-vm
plugins/org.knime.binary.jre.win32.x86_64_1.8.0.152-01/jre/bin
-vmargs
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Xmx36826m
-Dorg.eclipse.swt.browser.IEVersion=10001
-Dsun.awt.noerasebackground=true
-Dequinox.statechange.timeout=60000
20181023_log.txt (83.2 KB)

Please also find attached today’s portion of the log file.

Thanks!
Sonia

Hi Sonia,

We’re currently looking into it and I’ll get back to you as soon as I have further questions or some information to share. Thanks for attaching the log to the post!

Best,

Marc

Thanks Marc, I’ll wait for some news from you!

Hi Sonia,

The messages / exceptions in your log don’t seem to be associated with KNIME freezing while saving a workflow. If I’m not mistaken, the log you’ve attached is the Eclipse log at $KNIME_WORKSPACE/.metadata/.log. Could you also attach the KNIME log (or parts thereof) at $KNIME_WORKSPACE/.metadata/knime/knime.log (also accessible from within KNIME at View -> Open KNIME log)? It could be helpful if you set the verbosity of the KNIME log to DEBUG (in Preferences -> KNIME).
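To share only the relevant part of a large knime.log, one option is to copy its last few hundred lines. A minimal Java sketch, assuming the workspace layout described above (.metadata/knime/knime.log); the class name, the example workspace path in `main` and the 500-line cutoff are illustrative choices, not something KNIME provides:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.stream.Stream;

class LogTail {
    /** Location of the KNIME log relative to the workspace root, as described in the post. */
    static Path knimeLog(Path workspace) {
        return workspace.resolve(".metadata").resolve("knime").resolve("knime.log");
    }

    /** Returns the last maxLines lines of a text file, keeping only a sliding window in memory. */
    static List<String> tail(Path file, int maxLines) throws IOException {
        Deque<String> window = new ArrayDeque<>();
        // ISO-8859-1 decodes any byte sequence without throwing; non-ASCII characters may look odd
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            lines.forEach(line -> {
                window.addLast(line);
                if (window.size() > maxLines) {
                    window.removeFirst(); // drop the oldest line once the window is full
                }
            });
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical workspace path; pass your own workspace as the first argument
        Path workspace = Paths.get(args.length > 0 ? args[0] : "C:/Users/me/knime-workspace");
        tail(knimeLog(workspace), 500).forEach(System.out::println);
    }
}
```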

Also, I have several further questions that could help to isolate the problem:

  1. Do you run into the issue only when KNIME auto-saves a workflow or also when you save it manually?
  2. Do you have the “Save with data” option checked in Preferences -> KNIME -> KNIME GUI?
  3. Does KNIME always freeze during auto-saves or only sometimes / for some workflows?
  4. If KNIME only freezes for some workflows, is there anything these workflows have in common, e.g., do workflows that freeze during auto-save contain specific nodes such as TensorFlow nodes?
  5. If KNIME only freezes for some workflows, could you provide me with a small example workflow that freezes during auto-save on your system?
  6. Do you have a JDK installed? If so, could you provide me with a thread dump at the time KNIME hangs?

Thanks in advance,

Marc

Thanks for looking into this Marc.

Yes, you are right: I attached the Eclipse log. Attached now are the KNIME log for the most recent session and an example workflow with data that had trouble being saved.

Regarding your questions:

  1. The same issue appeared both when saving manually and when auto-saving. However, after searching the forum for a solution, I added the line -Dorg.knime.container.cellsinmemory=10000000 at the end of the ini file, which seems to have partially solved the problem: KNIME doesn’t respond for about 30 minutes, but it eventually saves successfully.
  2. The “Save with data” option is not ticked.
  3-5. I observed the issue only with the workflow I was using at that moment, but I didn’t test it much further. I was able to partially replicate it in the small example workflow attached, and the freeze seems to happen when saving the Cross Joiner node. I say partially because I created this example after adding the last line to the ini file.
    It might be worth mentioning that the workflow where I first experienced the issue is an old one that used to save without problems in the past, with data similar to that in the knar file.
  6. JDK is not installed.

Please do let me know if you need any more info.

Thanks!
Sonia

knime.zip (492.4 KB)

Hi Sonia,

Unfortunately, I was not able to reproduce the issue with the workflow you provided. Even when I set KNIME to auto-save the workflow every second, KNIME doesn’t hang on my local machine. It does, however, spend a fairly long time in the Cross Joiner node, since that node creates a table with ~30 billion cells. In any case, auto-saving without workflow data is a fairly cheap operation, so it shouldn’t cause freezes in the Analytics Platform.
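For a sense of scale: a cross join pairs every row of one input with every row of the other, so the number of output cells is the product of the row counts times the combined column count. A small Java sketch of that arithmetic; the input sizes in `main` are hypothetical, chosen only to land at ~30 billion cells, and the formula ignores any extra columns the node may append:

```java
class CrossJoinSize {
    /**
     * Cells in a cross join result: leftRows * rightRows output rows,
     * each carrying the columns of both inputs (simplified model).
     */
    static long outputCells(long leftRows, int leftCols, long rightRows, int rightCols) {
        long rows = Math.multiplyExact(leftRows, rightRows);          // throws on overflow
        return Math.multiplyExact(rows, (long) leftCols + rightCols); // total output cells
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 100,000 rows x 4 columns joined with 50,000 rows x 2 columns
        long cells = outputCells(100_000, 4, 50_000, 2);
        System.out.printf("%,d cells%n", cells);
    }
}
```

With these numbers the result is 5 billion rows of 6 cells each, i.e. 30,000,000,000 cells; `Math.multiplyExact` is used so that an overflow raises an `ArithmeticException` instead of silently wrapping around.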

The parameter you changed (-Dorg.knime.container.cellsinmemory) determines at what table size (measured in number of cells) KNIME materializes intermediate workflow data on the local hard disk. The default value is 100,000, and increasing it to 10,000,000 will, in your case, possibly allow the Cross Joiner node to read its input tables from memory rather than from disk. This should make it run much faster and could be the reason why your installation does not freeze with that setting adjusted. (As a side note, you’ll get the same behavior by configuring the nodes that feed their output into the Cross Joiner node to use the memory policy “Keep all in memory”.)
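The threshold logic described above can be modelled roughly as follows. This is a simplified sketch, not KNIME’s actual implementation: it only shows how a `-D` property such as `org.knime.container.cellsinmemory` is visible inside the JVM and how a cell-count threshold would decide between memory and disk. The exact comparison (`<=` vs `<`) and the fallback for a malformed value are assumptions:

```java
class CellsInMemoryModel {
    // Property name taken from the knime.ini line discussed in this thread
    static final String PROPERTY = "org.knime.container.cellsinmemory";
    // Default as stated in the post above; hard-coded here, not read from KNIME
    static final long DEFAULT_THRESHOLD = 100_000L;

    /** Reads -Dorg.knime.container.cellsinmemory=...; Long.getLong falls back to the default if unset or not a number. */
    static long threshold() {
        return Long.getLong(PROPERTY, DEFAULT_THRESHOLD);
    }

    /** Assumed rule: tables with at most `threshold` cells stay in memory, larger ones are written to disk. */
    static boolean keptInMemory(long rows, int cols, long threshold) {
        long cells = Math.multiplyExact(rows, (long) cols);
        return cells <= threshold;
    }

    public static void main(String[] args) {
        long rows = 500_000; // hypothetical input table of 500,000 rows x 4 columns
        int cols = 4;
        System.out.println("threshold from JVM args: " + threshold());
        System.out.println("in memory with default:    " + keptInMemory(rows, cols, DEFAULT_THRESHOLD));
        System.out.println("in memory with 10,000,000: " + keptInMemory(rows, cols, 10_000_000L));
    }
}
```

Under this model, the hypothetical 2,000,000-cell table would be written to disk with the default of 100,000 but kept in memory with 10,000,000, which matches the explanation of why the adjusted setting can speed up the Cross Joiner.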

In any case, to track down the cause of the freeze further, I’d need a Java thread dump taken while your KNIME installation is frozen. There are several ways of taking a thread dump in Java, most of which unfortunately require installing a JDK or a third-party tool. If you are working on Linux or Mac, you might already have jstack installed (try running “jstack” on the console).
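For reference, a JVM can also produce a thread dump of itself through the standard `java.lang.management` API. The sketch below only covers the JVM it runs in, so it cannot replace jstack for inspecting an already frozen KNIME process; it illustrates what a dump contains (thread names, states, stack traces) and the deadlock check that is usually the first thing to look at. The class name is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class InProcessThreadDump {
    /** Builds a jstack-like text dump of all threads in the current JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers (similar to jstack -l)
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" state=")
              .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates long stacks, so print every frame explicitly
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        long[] deadlocked = mx.findDeadlockedThreads(); // null when no deadlock exists
        sb.append(deadlocked == null ? "No deadlocks found" : deadlocked.length + " deadlocked threads")
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```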

Best,

Marc

Hi Marc,

Apologies for my late reply.

Today I opened the same workflow again, and it keeps getting stuck while auto-saving the Cross Joiner (it’s been going for ~1h and the auto-save progress bar shows about 10%!).

I installed a JDK, but all the instructions I found for obtaining a thread dump were based on jobs run from the console. Could you give me instructions on how to get one?

Thanks,
Sonia

Hi Sonia,

Of course. It depends on which operating system you are on, though.

First, you have to determine the process ID of your KNIME Analytics Platform. On Windows, you can do so via the Task Manager: on Windows 10, click “More details” in the Task Manager and then click the “Details” tab. In the list of processes, locate knime.exe and take note of its process ID (PID). On Linux (and possibly also on Mac), you can run “ps” from the command line to get a list of current processes along with their PIDs.

Then run jstack from the command line with the PID as a parameter (e.g., “jstack 11744” or “jstack.exe 11744” if 11744 were KNIME’s PID). jstack will print the thread dump on the command line. Please let me know if you need more input.
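If you prefer to save the dump straight to a file (for example to attach it here), the jstack call can also be wrapped in a small Java program. A sketch, assuming `jstack` from the installed JDK is on the PATH and that the program runs as the same user as KNIME; the class name and the output file naming are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class JstackRunner {
    /** Command line equivalent to typing "jstack <PID>" on the console. */
    static List<String> command(long pid) {
        return Arrays.asList("jstack", Long.toString(pid));
    }

    /** Runs jstack for the given PID, writes everything it prints to outFile and returns its exit code. */
    static int dumpTo(long pid, Path outFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(pid))
                .redirectErrorStream(true)        // merge error messages into the same output
                .redirectOutput(outFile.toFile()) // write to a file instead of the console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JstackRunner <PID>   (PID of knime.exe as shown in the Task Manager)
        long pid = Long.parseLong(args[0]);
        Path out = Paths.get("threadDump-" + System.currentTimeMillis() + ".txt");
        int exit = dumpTo(pid, out);
        System.out.println("jstack exited with " + exit + ", output in " + out.toAbsolutePath());
    }
}
```

After compiling it with javac, `java JstackRunner 11744` would write the dump for PID 11744 to a timestamped text file in the current directory; a non-zero exit code usually means jstack could not attach, and its error message ends up in the same file.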

Kind regards,

Marc

Thanks Marc,

Here is the thread dump, hope it helps in understanding what’s going on!

Sonia

threadDump.txt (82.6 KB)

Well, at the time the thread dump was taken, no IO threads were actually running. Furthermore, there are no deadlocks or any other obvious problems going on. The thread saving the workflow is currently busy copying a table. This is somewhat strange for two reasons: (1) after 1 hour of waiting, it should probably be done copying all tables, and (2) you mentioned earlier that you had the checkbox “Save with data” unchecked, so the auto-save should not actually copy any data. It would be interesting to know whether the thread is actually stuck or still doing something meaningful. To find out, could you provide two thread dumps, one taken a couple of seconds after the other?
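The check asked for here (do two dumps taken a few seconds apart show the saving thread at exactly the same place?) can be sketched in Java. `sameStack` compares two captured stacks frame by frame; `looksStuck` samples a thread twice via `ThreadMXBean`, which only works inside the same JVM, so for KNIME you would compare the corresponding thread in two jstack outputs by eye. Identical stacks are only a hint: a thread can also spend a long time inside a single call without being stuck. All names here are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

class StuckThreadCheck {
    /** True if two samples of the same thread show exactly the same call stack. */
    static boolean sameStack(StackTraceElement[] first, StackTraceElement[] second) {
        return Arrays.equals(first, second);
    }

    /** Samples one thread of this JVM twice, delayMillis apart, and compares the stacks. */
    static boolean looksStuck(long threadId, long delayMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo before = mx.getThreadInfo(threadId, Integer.MAX_VALUE); // full stack depth
        Thread.sleep(delayMillis);
        ThreadInfo after = mx.getThreadInfo(threadId, Integer.MAX_VALUE);
        if (before == null || after == null) {
            return false; // thread was not alive at one of the samples, so it is not stuck
        }
        return sameStack(before.getStackTrace(), after.getStackTrace());
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo: a thread blocked in sleep() keeps an identical stack between samples
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // demo thread, nothing to clean up
            }
        }, "demo-sleeper");
        sleeper.setDaemon(true);
        sleeper.start();
        Thread.sleep(200); // give the demo thread time to enter sleep()
        System.out.println("demo-sleeper looks stuck: " + looksStuck(sleeper.getId(), 1_000));
    }
}
```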

May I ask what auto-save interval you have set in Preferences -> KNIME -> KNIME GUI? Could you also share with me again the small example workflow with which you could reproduce the problem?

As a workaround for now, if you have sufficient main memory, you could configure the problematic Cross Joiner node to “keep all tables in memory” in its node configuration dialog.

I suspect the tables internally created by the Cross Joiner node somehow interfere with the auto-saving of workflows. I’ve filed a ticket in our development ticketing system. We’ll discuss this issue and see if we can get to the bottom of this.
