DL Python Network Learner could not load GPU drivers

Evan1388 · December 19, 2020, 12:32pm

Hi everybody,

After using the ML capabilities in KNIME over the past few years, I wanted to explore the options around Deep Learning. For that, I wanted to use Keras and TensorFlow 2.

I downloaded the following example workflow for learning purposes: 02_Tensorflow2_Autoencoder_for_Fraud_Detection_Training

When opening the DL Python Network Learner node, I get the following output in the console (interesting part in bold):

2020-12-19 13:04:25.934827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-12-19 13:04:27.252656: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-12-19 13:04:27.276916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:2d:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.665GHz coreCount: 34 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2020-12-19 13:04:27.277119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-12-19 13:04:27.277696: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2020-12-19 13:04:27.278189: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2020-12-19 13:04:27.278664: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2020-12-19 13:04:27.279152: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-12-19 13:04:27.279597: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2020-12-19 13:04:27.280110: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2020-12-19 13:04:27.280260: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at Install TensorFlow with pip for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
2020-12-19 13:04:27.280687: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-12-19 13:04:27.288667: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1cd7490ca10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-19 13:04:27.288886: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-12-19 13:04:27.289061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-19 13:04:27.289165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.

When I run a test in the terminal within the same environment, the libraries are loaded correctly.
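For anyone comparing the terminal with the Python process that KNIME starts, a minimal sketch along these lines can be run in both places. It only checks whether the DLLs named in the warnings above sit in a directory listed on PATH, on the assumption that TensorFlow resolves them through the PATH of the running process. The function name locate_on_path is hypothetical and not part of KNIME or TensorFlow.

```python
import os

# DLL names copied from the "Could not load dynamic library" warnings above.
CUDA_DLLS = [
    "cublas64_10.dll",
    "cufft64_10.dll",
    "curand64_10.dll",
    "cusolver64_10.dll",
    "cusparse64_10.dll",
    "cudnn64_7.dll",
]


def locate_on_path(names, path=None):
    """Map each file name to the first PATH directory containing it, or None.

    Hypothetical helper: it mimics a plain PATH lookup and nothing more.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = [d for d in path.split(os.pathsep) if d]
    found = {}
    for name in names:
        found[name] = next(
            (d for d in dirs if os.path.isfile(os.path.join(d, name))),
            None,
        )
    return found


if __name__ == "__main__":
    for name, directory in locate_on_path(CUDA_DLLS).items():
        print(f"{name}: {directory or 'NOT FOUND on PATH'}")
```

If the terminal reports a directory for each DLL while the same script run from a KNIME Python node reports them as not found, the two processes are most likely starting with different PATH values.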

I am now wondering what I am missing.

Here is how I approached this so far:

Initially, I created a new environment via Anaconda Navigator and installed Keras and TensorFlow 2 manually. I also installed CUDA Toolkit and cuDNN and created the system variables.
After realising that the installed versions do not work with KNIME, I found the following manual and used the built-in functionality in KNIME to create the Python GPU environments with all the libraries. I also removed the previously installed CUDA Toolkit, cuDNN and system variables, as I understood from the manual that the Python environment created by KNIME contains all that is needed.

It would be great if anybody has an idea about what I am missing here.
Also, I am using KNIME Analytics Platform 4.3.

Cheers!
Evan

ScottF · December 23, 2020, 4:28pm

Hi @Evan1388 and welcome to the forum.

It’s not clear to me why the system can detect the CUDA drivers in one case but not another, so I’ve asked some of our deep learning specialists to chime in. Sorry for the trouble.

MarcelW · December 23, 2020, 10:04pm

Hi @Evan1388,

Recently, there have been some similar reports of failures to load or locate DLLs when using KNIME’s Python (deep learning) integrations on Windows. In some of those cases, it helped to explicitly activate the Conda environment via a script rather than letting KNIME do it for you.
How to do that is described under “Configure the KNIME Python Integration” - “Option 2: Manual” of our Python guide. (Unfortunately, it is not described in the deep learning guide. The steps are exactly the same, though; they just need to be carried out in the Python Deep Learning preferences instead of the Python ones.)
It basically boils down to creating a .bat file that looks like this (with the correct paths and names filled in):

@SET PATH=<PATH_WHERE_YOU_INSTALLED_ANACONDA>\Scripts;%PATH%
@CALL activate <ENVIRONMENT_NAME> || ECHO Activating python environment failed
@python %*

and pointing to it via the “Manual” option in the preferences.
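Since Evan later mentions creating this batch file for every Python environment, here is a minimal sketch of generating it from the template above. The function names and the example Anaconda path are hypothetical placeholders; only the three template lines come from this post.

```python
# The three lines of the start script, with placeholders for the two
# values that differ per machine and per environment.
BAT_TEMPLATE = (
    "@SET PATH={anaconda_dir}\\Scripts;%PATH%\n"
    "@CALL activate {env_name} || ECHO Activating python environment failed\n"
    "@python %*\n"
)


def render_start_script(anaconda_dir, env_name):
    """Return the batch file text for one Conda environment."""
    return BAT_TEMPLATE.format(anaconda_dir=anaconda_dir, env_name=env_name)


def write_start_script(target, anaconda_dir, env_name):
    """Write the batch file with Windows (CRLF) line endings."""
    with open(target, "w", newline="\r\n") as handle:
        handle.write(render_start_script(anaconda_dir, env_name))


if __name__ == "__main__":
    # Hypothetical install location; replace it with your own.
    print(render_start_script(r"C:\Users\me\Anaconda3", "py3_knime_tf2_3"))
```

The resulting file is the one to select under the “Manual” option; one file per environment keeps the KNIME preferences for Python and Python Deep Learning independent of each other.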

Hope this helps!

Marcel

Evan1388 · December 28, 2020, 2:42pm

Hi @MarcelW, @ScottF,

Thanks for the help!

This indeed resolved the issues with the libraries. Coincidentally, I also had issues with some Python Script nodes where a library installed with pip could not be imported although it was clearly there.
I created the batch file for all Python environments, and it works now.

I am now experiencing some other issues with the DL Python Network Learner:

Epoch 1/2
2020-12-28 15:35:25.624866: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-12-28 15:35:25.811376: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.811780: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.811959: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.812113: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.812290: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.812468: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.813166: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.813300: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.813550: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.813729: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.813915: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.814113: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.814388: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.814579: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.814765: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.814915: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.815111: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.817586: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-28 15:35:25.817760: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Blas GEMM launch failed : a.shape=(300, 30), b.shape=(30, 40), m=300, n=40, k=30
[[node model/dense/MatMul (defined at :13) ]] [Op:__inference_train_function_1702]
Function call stack:
train_function

Traceback (most recent call last):
  File "", line 13, in
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in call
    result = self._call(*args, **kwds)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\function.py", line 2420, in call
    return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\function.py", line 598, in call
    ctx=ctx)
  File "C:\Users\evang.conda\envs\py3_knime_tf2_3\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(300, 30), b.shape=(30, 40), m=300, n=40, k=30
[[node model/dense/MatMul (defined at :13) ]] [Op:__inference_train_function_1702]

Function call stack:
train_function

I am not sure whether this is related to KNIME, but I haven’t changed the example workflow yet. I would assume that the example workflow should work out of the box.

I will keep trying to figure out whether this is KNIME-related or whether some local settings are causing it.

Cheers,
Evan

MarcelW · December 30, 2020, 5:48pm

Hi Evan,

That is great news! We will fix the batch-less option on our end as soon as possible.

Now that you employ the batch file, there should be no difference in how TensorFlow is used between KNIME and an ordinary Python script.
I have not come across this error myself yet, but it reads like a GPU-specific problem. This post suggests it could be caused by a different process blocking the GPU. And this post specifically mentions your GPU series and suggests passing TensorFlow some additional configuration when using these GPUs. (The CUDA/cuDNN versions mentioned in the post should match the ones installed by KNIME. At least on my machine, KNIME installed CUDA 10.1 and cuDNN 7.6.5. You can check that via conda list -n py3_knime_tf2_3 cudatoolkit and conda list -n py3_knime_tf2_3 cudnn, respectively.)
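The thread does not spell out which additional configuration the linked post recommends. A commonly suggested setting for CUBLAS_STATUS_ALLOC_FAILED is to let TensorFlow allocate GPU memory on demand rather than reserving it all up front, so the sketch below assumes that is what is meant. The wrapper function enable_gpu_memory_growth is a hypothetical name; tf.config.list_physical_devices and tf.config.experimental.set_memory_growth are the TensorFlow 2 calls it relies on.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand.

    Returns the number of GPUs configured, or None when TensorFlow is not
    installed in the current environment. Assumption: memory growth is the
    "additional configuration" meant above; the thread does not confirm it.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is first used; otherwise TensorFlow
        # raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured:", enable_gpu_memory_growth())
```

In the DL Python Network Learner, the body of this function would need to run at the very top of the node's script, before the model is touched, because the setting cannot be changed once the GPU has been initialized.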

Marcel

Evan1388 · January 17, 2021, 9:43am

Hi Marcel,

Thank you for those pointers.
Indeed, after applying the solution from your second link, I was able to make it all work.

Cheers,
Evan
