KNIME 3.5.x crash with DL4J integration

Hello, I have been working with KNIME and the DL4J integration for a month.
The workflow I created for my data (image classification) is comparable to the celebrity demo.

When the workflow is executing and reaches the DL4J Feedforward Learner (or Predictor), a crash occurs and KNIME closes without any message. The same error also happens when I run the celebrity demo.

I use a Windows 10 laptop with 12 GB of DDR memory and an Intel i5 processor to execute the workflows. I also ran the workflow on different versions of KNIME (3.5.3 and 3.5.2); the crash occurs on both. After two weeks of testing I found that the error does not appear in version 3.4.2.

Here is one log file from KNIME after a crash: hs_err_pid10516.log (44.2 KB) (KNIME 3.5.3)


Hi ralph,

sorry for your troubles. The problem is most likely related to conflicting .dll files on your system path. Do you have Anaconda Python installed? Anaconda adds some .dll files (namely anything containing mkl) to your system path. The DL4J library also ships those .dll files, so if both of them can be found at the same time, the JVM will crash. You could try to remove all Anaconda entries from your system path and try again. However, the Anaconda commands will then no longer be callable from the command line. Currently, that’s the only solution I know.

Cheers
David

Hi Dave,

thank you for the response! I forgot to mention that I know about that error, which is noted for KNIME (3.2) at the bottom of https://www.knime.com/deeplearning4j. In the first two weeks I worked with KNIME 3.5.3 on a fresh Windows 10 installation, without Anaconda. During that time, KNIME crashed quite randomly when the workflow execution reached the mentioned nodes.
Because I was considering a switch to R + Keras, I installed Anaconda. After that installation, I think the crashes became more frequent and also occurred in the demos.

As a last try with KNIME, and after I saw the following annotation in the demo celebrity workflow:
“Workflow Requirements
KNIME Analytics Platform 3.4.0
KNIME Deeplearning4J Integration
KNIME Image Processing - Deeplearning4J Integration”
I switched to KNIME 3.4.2. Since then I have noted no crashes.

ralph

Hi there,

I’m still working with KNIME and the DL4J integration. Because I also got crashes in KNIME 3.4.2 on other Windows machines (without Anaconda), I gave the newest version (3.6) a try.

In this version the error occurs very randomly. Yesterday (20.08.2018) I had no problem with the workflow; I could run it multiple times without a crash. Today, in 4 executions of the workflow, KNIME (3.6) crashed every time it reached the learner. In all 4 log files I got, openblas is always the last Java frame.
hs_err_pid9760.log (45.1 KB)
hs_err_pid6928.log (46.9 KB)
hs_err_pid8884.log (46.7 KB)
hs_err_pid10408.log (45.2 KB)

PATH=C:\Program Files (x86)\Common Files\Oracle\Java\javapath;
C:\ProgramData\Oracle\Java\javapath;
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;
C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Java\jdk1.8.0_161\bin;
C:\Program Files\Novell\iPrint;C:\Program Files\PuTTY\;
C:\Program Files\Git\cmd;C:\Program Files (x86)\Pandoc\;
C:\Program Files\Microsoft VS Code\bin;
C:\Program Files (x86)\Novell\GroupWise;
C:\Users\Ralph Isenmann\AppData\Local\Microsoft\WindowsApps;
C:\Program Files\Microsoft VS Code\bin

There is no Anaconda in my system path, so this should not be the problem. Or does Anaconda install DLLs into existing system paths?

In addition, I saw that a newer version of DL4J exists in their repository https://deeplearning4j.org/release-notes#onezerozerobeta2 with the release note:

ND4J: nd4j-native-platform will now use Intel MKL-DNN as the default/bundled BLAS implementation (replacing OpenBLAS as the previous default)

Maybe this could help get rid of the error. Is there a way to get or update the DL4J integration package?

ralph

Hi ralph,

I’m sorry you’re still experiencing these kinds of problems. It seems from your description that there are further problems beyond the Anaconda issue. Unfortunately, removing the Anaconda paths from your system path is the only solution for these crashes I currently know of. The DL4J version used in KNIME is rather old, so, as you pointed out, an update may indeed solve these problems. However, I can give you no time estimate when or if this will happen.

You mentioned that it suddenly stopped working from one day to the other. Is there something you changed/installed in between that might affect this?

I don’t think Anaconda will add files to other locations, but you can look for MKL-related files on your path with the command where *mkl*. That will list all files on your path whose names contain the string “mkl”. If there are any, you can try to remove them.
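For completeness, the same check can also be done programmatically. Here is a minimal sketch (the find_dlls helper is hypothetical, not part of KNIME or DL4J) that scans every directory on the PATH for file names containing a given substring:

```python
import os

def find_dlls(substring, path_env=None):
    """Scan every directory on PATH for files whose name contains `substring`.

    Returns the full paths of all matches, in PATH order. Order matters:
    the first match on PATH is the one Windows actually loads.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    hits = []
    for directory in path_env.split(os.pathsep):
        try:
            for name in sorted(os.listdir(directory)):
                if substring.lower() in name.lower():
                    hits.append(os.path.join(directory, name))
        except OSError:
            # Skip PATH entries that do not exist or are not readable
            continue
    return hits

if __name__ == "__main__":
    for hit in find_dlls("mkl"):
        print(hit)
```

On Windows this lists the same files as where *mkl*; if both an Anaconda copy and the DL4J-bundled copy show up, that is the conflict described above.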

Could switching your use case to the Keras Integration be an option?

Cheers
David

Hi Dave,

Thanks for your response!

The only change I made at that time was creating a new Anaconda environment.

The result in the command prompt is as expected: since there is no Anaconda path in my PATH variable, I get no results for where *mkl*.

I already switched to the Keras integration and tried out some examples, but with this plugin I got stuck on another error, which is mentioned in this thread: keras backend error.
Is there a release date for the new KNIME 3.6.1 version that fixes the bug?

ralph

Hi,

I think I made a big mistake in my KNIME setup. Because I tried both deep learning integrations (DL4J and Keras) in parallel, I installed Anaconda and set it up for Keras as described in this guide.

That batch script, which activates a Python environment, also adds path entries pointing into the Anaconda install directory, so it adds paths to the mentioned MKL libraries.
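To illustrate the effect (a sketch with made-up directory names, not the actual entries an activation script adds), the entries such a script prepends can be found by diffing the PATH variable before and after activation:

```python
import os

def new_path_entries(before, after, sep=os.pathsep):
    """Return the entries present in `after` but not in `before`,
    preserving their order. Useful for seeing what an environment
    activation script prepended to PATH."""
    seen = set(before.split(sep))
    return [p for p in after.split(sep) if p and p not in seen]

# Hypothetical example: entries a conda activation might prepend on Windows
before = r"C:\Windows\system32;C:\Windows"
after = (r"C:\Anaconda3;C:\Anaconda3\Library\bin;"
         r"C:\Windows\system32;C:\Windows")
print(new_path_entries(before, after, sep=";"))
# → ['C:\\Anaconda3', 'C:\\Anaconda3\\Library\\bin']
```

If any of the new entries contain MKL DLLs, they will shadow (or clash with) the copies bundled with DL4J for every process started from that shell.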

I have now switched to a plain Python installation with Keras and TensorFlow and set that as the interpreter for KNIME. But this change didn’t help either.

Yep. Today :slight_smile: Let us know if it works as expected.


MKL is a big problem with our DL4J integration. I recommend using the DL Keras nodes. We hope that with the bugfix release it now works as expected. Sorry for the trouble!

Christian