AutoML Possibly Causing System Lock-up

About two days ago, a workflow (WF) I’ve run for weeks without issue started causing KNIME to lock up/freeze. I have made changes to the WF in the last few weeks, but none to the metanode where the freezing seems to occur, and nothing major changed immediately before the freezing started. I can’t think of anything of note that I updated/changed on my computer and I don’t know what else to try for troubleshooting, so here I am…

The part of the WF where things always seem to freeze is the metanode that has 8 AutoML modules in it (no changes to this node at all in at least a week). They all seem to go into an infinite loop, and neither a force quit nor killing KNIME via the command line works, so I have to reboot.

I’ve completely uninstalled/reinstalled KNIME. I also removed and re-added all the AutoML nodes from the website earlier today. The primary metanode that the 8 AutoMLs roll up to is a loop (see screenshots below). When I go through the loop step-by-step, everything seems to run fine (I’ve tried manually doing that for 3 loops), but when I run the same thing normally (not step-by-step), it locks up in the first loop (first try).

On a different community thread I read about Windows Defender causing problems, so I added (I think) exceptions to BitDefender to ignore the KNIME paths. My BitDefender controls WD, so not sure I can do anything further. And I’ve never had a problem with this before. Earlier today, I also uninstalled Anaconda as it seemed unnecessary and, well, who knows?! I did re-build the Python environments afterwards. Switching between Keras and TensorFlow doesn’t seem to impact anything.

I don’t know if it happens every time, but I have noticed a few times where Python shows as a process under KNIME in the Task Manager when the freezing occurs. I can’t force quit that either.

Attached are logs from after the reinstall earlier today. A1 and A2 are split for size. There are probably a few examples in there of locking up… any time there is a 10-ish minute break, there is a decent chance I had to reboot.
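For anyone who wants to locate those breaks without scrolling through several MB of log, here is a minimal Python sketch that lists gaps between consecutive timestamped lines. It assumes each log entry starts with a timestamp of the form `2020-08-14 09:15:02,123`; that prefix, the function name `find_gaps`, and the 10-minute threshold are my assumptions, so adjust them to whatever the files actually contain.

```python
import re
from datetime import datetime, timedelta

# Assumed entry prefix, e.g. "2020-08-14 09:15:02,123 : WARN : ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def find_gaps(lines, min_gap=timedelta(minutes=10)):
    """Return (previous, current) timestamp pairs that are at least min_gap apart."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # lines without a leading timestamp (e.g. stack traces) are skipped
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Calling `find_gaps(open("knimeA1.log", encoding="utf-8", errors="replace"))` would return the candidate reboot points; the entries just before each gap are the ones worth reading.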

knimeA1.log (2.7 MB) knimeA2.log (1.4 MB)

KnimeB1… is where I think it locked up at one point and I rebooted.
knimeB.log (2.1 MB)

Log file from KNIME/.metadata (I renamed it from .log to log.log so it could upload):
Log.log (145.5 KB)

The highest level of the WF:

One step down into where I think the problem is:

Final step


Happy to answer any questions. And thanks for the help.

After my latest reboot, I deleted the logs and restarted KNIME, started the WF, and waited until it crashed. Here are those logs.

knime.log (1.6 MB) log.log (3.4 KB) nodeusage_3.0.json (44.3 KB) history_momentjs-date-formats.txt (12 Bytes) history_momentjs-date-formats.txt (12 Bytes) history_momentjs-time-formats.txt (10 Bytes) history_momentjs-zoned-date-time-formats.txt (14 Bytes) history_ASCIIfile.txt (516 Bytes)

Final bit of info… I restored a version of the WF from a week ago, long before any problems started, and it quickly fails as well.

@cybrkup your log contains several warnings about running out of memory. And frankly, if you run 8 (?) such AutoML nodes in parallel, each containing several complicated node operations, it is not that surprising that your machine does not handle this well.

I am not sure, but are you running Windows in a 32-bit or 64-bit configuration, and how much RAM have you allocated to KNIME?

I think one way forward would be to tune KNIME’s performance settings and maybe arrange your workflows differently. You could also see whether running the AutoML nodes sequentially helps. It could also help to run a garbage collection in between and store some results to disk.
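To illustrate the pattern only (this is plain Python, not KNIME nodes or any KNIME API), here is a rough sketch of "run one branch at a time, write its result to disk, free memory, then move on". The function `run_sequentially`, the JSON output format, and the dummy tasks are all hypothetical stand-ins for the AutoML branches.

```python
import gc
import json
import tempfile
from pathlib import Path


def run_sequentially(tasks, out_dir):
    """Run each (name, callable) task in turn, persist its result, then free memory."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, task in tasks:
        result = task()                      # only one task holds memory at a time
        path = out_dir / f"{name}.json"
        path.write_text(json.dumps(result))  # store the result on disk
        written.append(path)
        del result                           # drop the in-memory copy
        gc.collect()                         # ask Python to reclaim unreachable objects
    return written


# Hypothetical stand-ins for the eight AutoML branches (dummy values only).
tasks = [(f"automl_{i}", lambda i=i: {"branch": i, "score": 0.0}) for i in range(1, 9)]
paths = run_sequentially(tasks, tempfile.mkdtemp())
```

The point of the design is that peak memory is bounded by the largest single branch rather than by the sum of all eight, at the cost of a longer total run time.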

Yes, I noticed the memory-related statements as well. However, the design of this metanode in the WF hasn’t changed in weeks and it never had problems previously. The CPU would run at 100% a lot, but again, it never locked up; RAM usage would peak in the low 8 GB range.

That said, I tried what you suggested. The Garbage Collectors ensure that the AutoML Nodes run sequentially. The WF made it through a step-by-step loop fine, so then I tried it full auto and it locked up on the 4th “AutoML Node” (see screenshot). I rebooted, changed that node to use the same values as the first AutoML Node (since that one had worked; I should have pointed out that each AutoML uses a different model) and reran the WF. It ran one loop and then locked up on the first AutoML Node (which had previously worked). In this configuration, the CPU ran between 20-30% until the lock-up; RAM stayed at around 3.5-5.5 GB.

Here are my ini file (-Xmx10000m) values:
-startup
plugins/org.eclipse.equinox.launcher_1.5.700.v20200207-2156.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.1100.v20190907-0426
--launcher.defaultAction
openFile
-vm
plugins/org.knime.binary.jre.win32.x86_64_1.8.0.252-b09/jre/bin
-vmargs
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-XX:+UseG1GC
-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Xmx10000m
-Dorg.eclipse.swt.browser.IEVersion=11001
-Dsun.awt.noerasebackground=true
-Dequinox.statechange.timeout=30000
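
As a quick sanity check on the heap setting, here is a small Python sketch that reads the `-Xmx` line from ini text like the above and converts it to bytes. The function name `max_heap_bytes` is my own; the `k`/`m`/`g` suffixes are treated as multiples of 1024, which is how the JVM interprets them. If `-Xmx` appears more than once, this sketch simply keeps the last value it sees.

```python
import re

# JVM size suffixes are binary multiples.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(ini_text):
    """Return the -Xmx heap limit in bytes, or None if no -Xmx line is present."""
    value = None
    for line in ini_text.splitlines():
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", line.strip())
        if match:
            number, unit = match.groups()
            value = int(number) * UNITS.get(unit.lower(), 1)  # no suffix means bytes
    return value


heap = max_heap_bytes("-vmargs\n-server\n-Xmx10000m\n-Dsun.awt.noerasebackground=true")
print(f"{heap / 1024**3:.2f} GiB")
```

So `-Xmx10000m` is a little under 10 GiB of Java heap. Note that this limit only covers the JVM; any Python process that KNIME starts uses memory on top of it.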

Running in debug mode, I found exactly where the failure is happening. I’ve attached the last bit of the debug file. I can post the full file if necessary, but presume this will suffice.

Debug.txt (947.7 KB)

The Model to Cell is where this WF fails. Unfortunately, I have no clue what to do about that.

Still running in debug mode, and here is some interesting info. After the last crash, I rebooted and reran the WF and it had a different failure, but the log is probably much more useful here because of the great detail.

knime.log (3.9 MB)

Have you tried to “shrink” the workflow and run only subsets of it, just to see if it works?
Your workflow looks quite complicated and the metanodes probably hide most of what’s beneath.
Is it possible to provide the workflow?
bR


Yes, here you go, and thanks for taking a look…

Attached below is a simplified version of the WF. Running it without AutoML (replacing that output with dummy data) works fine. Once I add in AutoML, it inevitably locks up after just a few minutes. Adding a few AutoMLs like in the original WF and joining the results leads to a lock-up very quickly, like a slow memory leak multiplied(?).

Below is the WF and related files. To run this with the necessary input files, it will be necessary to create two paths:

This file goes in this path. It must be renamed to History.Table (we can’t upload csv, so I changed the extension to txt):
History.txt (2.1 KB)
C:\Documents\Trading\KNIME Data\Duel (TEST1-TEST2)\Train-1000\Val-1

The following three files go into this path:
C:\Documents\Trading\KNIME Data\Market Data\Prod

These actually need to have the extensions changed to .csv (I changed to .txt):
TEST2.txt (3.7 MB) TEST1.txt (3.7 MB) Comp-TEST1-TEST2.txt (3.6 MB)
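
If it saves anyone some clicking, the setup steps above can be sketched in Python roughly as follows. The two target paths and the file names are the ones given in this post; the function name `place_inputs` and the idea of a single download folder holding all four files are my assumptions.

```python
import shutil
from pathlib import Path

# Target folders as described in this post.
HISTORY_DIR = Path(r"C:\Documents\Trading\KNIME Data\Duel (TEST1-TEST2)\Train-1000\Val-1")
MARKET_DIR = Path(r"C:\Documents\Trading\KNIME Data\Market Data\Prod")
MARKET_FILES = ["TEST1.txt", "TEST2.txt", "Comp-TEST1-TEST2.txt"]


def place_inputs(download_dir, history_dir=HISTORY_DIR, market_dir=MARKET_DIR):
    """Copy the downloaded .txt files into place under the names the WF expects."""
    download_dir = Path(download_dir)
    history_dir = Path(history_dir)
    market_dir = Path(market_dir)
    history_dir.mkdir(parents=True, exist_ok=True)
    market_dir.mkdir(parents=True, exist_ok=True)

    # History.txt must end up as History.Table.
    copied = [shutil.copy(download_dir / "History.txt", history_dir / "History.Table")]
    # The three market data files get their .csv extension back.
    for name in MARKET_FILES:
        target = market_dir / Path(name).with_suffix(".csv").name
        copied.append(shutil.copy(download_dir / name, target))
    return [Path(p) for p in copied]
```

Calling `place_inputs(r"C:\Users\me\Downloads")` (a hypothetical download folder) would create both directory trees and copy the four files under their expected names.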

The WF itself:
DT Debug 6.knwf (3.6 MB)

