Selecting GPU ID in DL Network Executor (Tensorflow)

Hi -

Is there a way to restrict a deep learning network to a specific GPU ID on a multi-GPU Linux host? Currently, when I use the DL Network Executor (Tensorflow) on a host with 4 GPUs, the DL model uses all available GPUs (see nvidia-smi output below).

With Python Scripting, one can get around this by setting environment variables, but is there an option to do this directly in the “DL Network Executor” node?

nvidia-smi output (apologies about the format)
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE… On | 00000000:0D:00.0 Off | Off |
| N/A 31C P0 31W / 250W | 16056MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE… On | 00000000:13:00.0 Off | Off |
| N/A 27C P0 31W / 250W | 15485MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE… On | 00000000:8E:00.0 Off | Off |
| N/A 28C P0 31W / 250W | 15485MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE… On | 00000000:91:00.0 Off | Off |
| N/A 30C P0 32W / 250W | 15485MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 11757 C …linux.x86_64_1.8.0.152-01/jre/bin/java 15475MiB |
| 1 11757 C …linux.x86_64_1.8.0.152-01/jre/bin/java 15475MiB |
| 2 11757 C …linux.x86_64_1.8.0.152-01/jre/bin/java 15475MiB |
| 3 11757 C …linux.x86_64_1.8.0.152-01/jre/bin/java 15475MiB |
+-----------------------------------------------------------------------------+

Thanks,

Reddy

Hi Reddy,

Sadly, this is currently not possible, but it is a planned feature that we will implement in the future.

Right now you could set the environment variable CUDA_VISIBLE_DEVICES before starting KNIME to limit the GPUs visible to KNIME (this should also work for non-Python nodes):

$ CUDA_VISIBLE_DEVICES=0 ./knime

This won’t allow you to use different GPUs for different Nodes but at least prevents KNIME from using all GPUs.
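If you would rather start KNIME from a script than type the variable each time, something along these lines might work. It is only a sketch: the helper names `knime_env` and `launch_knime` and the default `./knime` path are assumptions for illustration, not part of KNIME.

```python
import os
import subprocess


def knime_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set.

    gpu_ids is a list of GPU indices, e.g. [0] or [1, 3].
    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_knime(gpu_ids, knime_path="./knime"):
    """Start KNIME so that it only sees the given GPUs.

    knime_path is a hypothetical default; point it at your own launcher.
    """
    return subprocess.Popen([knime_path], env=knime_env(gpu_ids))


# Example (not executed here): launch_knime([0]) mirrors
#   $ CUDA_VISIBLE_DEVICES=0 ./knime
```

Passing a comma-separated list such as `[1, 3]` exposes more than one GPU to the process, but every node inside that KNIME instance still shares the same set.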

Best
Benny

Benny -

My apologies for being dormant on this topic.

I think your proposed solution, i.e., setting “CUDA_VISIBLE_DEVICES” prior to launching KNIME, is only a partial fix to the problem. As you mentioned, doing so would leave the remaining GPUs underutilized.

So for the time being, we are making use of “Python Scripting” to load Deep Learning model(s) on multiple GPUs. We, naturally, pay the price of (de)serializing KNIP ImgPlus objects to/from Python but gain the ability to utilize all GPUs on the host. BTW, thanks a lot for improving the (de)serialization of KNIP ImgPlus to Python.
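For anyone wanting to try the same thing, a minimal sketch of the environment-variable approach inside a Python script could look like the following. The function names `gpu_for_task` and `pin_to_gpu` are made up for illustration, and the round-robin assignment is just one assumed way to spread work over the GPUs. The variables have to be set before TensorFlow is imported, since they are read when CUDA initializes.

```python
import os


def gpu_for_task(task_index, n_gpus):
    """Pick a GPU index round-robin (hypothetical helper)."""
    return task_index % n_gpus


def pin_to_gpu(gpu_id):
    """Make only one GPU visible to this Python process.

    Call this before importing TensorFlow or Keras.
    """
    # Order devices by PCI bus ID so indices match those shown by nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)


# Example: the third task (index 2) on a 4-GPU host is pinned to GPU 2.
pin_to_gpu(gpu_for_task(2, 4))

# Only import the framework after the variables are set, e.g.:
# import tensorflow as tf
```

Inside the pinned process the selected card shows up as device 0, because CUDA renumbers the visible devices.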

Apologies in advance for hijacking the thread.

Another issue that we have noticed with DL Network Executor (Tensorflow) (version 3.7.x) is that GPU memory is NOT freed after the node has executed, i.e., there seems to be no equivalent of K.clear_session() in Python+Keras+TF. We, at least, haven’t found a direct way within KNIME to clear the DL model from GPU memory. We have to quit/close the KNIME application to free up the GPU.

My (un)educated guess is that the Java API coverage of Tensorflow is minimal, so hopefully things will improve in the near future.

Would love to hear when both of these issues are resolved in future releases of KNIME Analytics.

Best,

Reddy

We are aware of this issue and are working on it. Sadly, TensorFlow doesn’t support releasing GPU memory at all, as stated here: https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth

Note that we do not release memory, since that can lead to even worse memory fragmentation.

But we can set the allow_growth option to prevent TensorFlow from allocating all memory.
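In a Python script, the allow_growth option from the linked guide can be set roughly as below. This is a sketch for the Python side only; how the option would be exposed in the DL Network Executor node is not shown here. The function name `make_growth_session` is made up, and the `tf` argument is assumed to be the TensorFlow 1.x module (or `tf.compat.v1` under TensorFlow 2.x), passed in so that the caller picks the API.

```python
def make_growth_session(tf):
    """Create a session that allocates GPU memory on demand.

    tf is assumed to be the TensorFlow 1.x module, or tf.compat.v1
    when running under TensorFlow 2.x.
    """
    config = tf.ConfigProto()
    # Start with a small allocation and grow as needed instead of
    # reserving nearly all GPU memory up front.
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)


# Example usage (requires TensorFlow, so it is left commented out):
# import tensorflow as tf
# sess = make_growth_session(tf)
```

Memory that has been allocated is still not handed back while the process runs, so this limits how much is claimed rather than freeing anything.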
