Keras/TF GPU Help

Hi everyone,

I am trying to use a Keras network with my GeForce RTX 3060 12 GB. Everything runs fine in the Keras CPU environment created with the “New Environment” button in the Python Deep Learning preferences. However, when I create a Keras GPU environment, it behaves exactly like the CPU one: GPU memory usage doesn’t budge, and neither does the CUDA graph in the Task Manager Performance tab.

I’m using KNIME 5.2.0 on Windows 11. I’ve modified the Conda environment to try many different combinations of package versions, especially those listed in the integration guide. The GPU environment that KNIME’s “New Environment” button creates for me comes with CUDA 10.2 installed but without cuDNN, and it behaves just like the CPU environment. I’ve also tried CUDA 10.0 with cuDNN 7.6.5 (plus other version combinations of both for good measure).
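For anyone debugging a similar setup: before swapping more package versions, it can help to check which devices TensorFlow itself reports from inside the GPU Conda environment. The following is only a minimal sketch, assuming it is run with the Python interpreter of that environment; `describe_tensorflow_devices` is a made-up helper name, and `device_lib` is an internal TensorFlow module that happens to be importable in common releases.

```python
def describe_tensorflow_devices():
    """Return the TensorFlow version and the devices it can see.

    Returns {"tensorflow": None, "devices": []} if TensorFlow
    cannot be imported in the current environment.
    """
    try:
        import tensorflow as tf
        # Internal module; assumed to be available in the installed release.
        from tensorflow.python.client import device_lib
    except ImportError:
        return {"tensorflow": None, "devices": []}

    # Each entry is (device name, device type), e.g. a CPU or GPU device.
    devices = [(d.name, d.device_type) for d in device_lib.list_local_devices()]
    return {"tensorflow": tf.__version__, "devices": devices}


if __name__ == "__main__":
    info = describe_tensorflow_devices()
    print("TensorFlow version:", info["tensorflow"])
    for name, device_type in info["devices"]:
        print(device_type, name)
```

If only CPU devices are listed, TensorFlow is not seeing the GPU at all, which would point at the CUDA/cuDNN installation rather than at the workflow.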

With basically every combination of versions I’ve tried in the GPU environment, I get the same errors shown in the log below. However, when I change the batch size to 1, the CUDA graph starts moving, GPU memory usage jumps, and training appears to work (quite slowly). With batch sizes > 1, the node stays busy for some time, then GPU memory and the CUDA graph spike just before the error occurs, right when the actual computation should start.

knime.log (589.9 KB)

A lot of this is beyond my knowledge, but I’m learning, and any help getting this to work would be great. Thanks!

Here is a chunk of the log:

2023-12-29 03:03:52,984 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:04:34,053 : WARN  : Thread-410 :  : PythonKernel : Keras Network Learner : 3:22 : C:\Users\ejfer\anaconda3\envs\keras_py37_gpu\lib\site-packages\keras\engine\saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
2023-12-29 03:04:37,437 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
2023-12-29 03:04:37,438 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
java.lang.RuntimeException: An error occured during training of the Keras deep learning network. See log for details.
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.handleGeneralException(DLKerasLearnerNodeModel.java:751)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:721)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:320)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:588)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1297)
	at org.knime.core.node.Node.execute(Node.java:1059)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:595)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
	at org.knime.dl.python.core.DLPythonAbstractCommands$DLTrainingTask.runInternal(DLPythonAbstractCommands.java:931)
	at org.knime.core.util.ThreadUtils$CallableWithContextImpl.callWithContext(ThreadUtils.java:383)
	at org.knime.core.util.ThreadUtils$CallableWithContext.call(ThreadUtils.java:269)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
2023-12-29 03:07:40,179 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:08:25,745 : WARN  : Thread-412 :  : PythonKernel : Keras Network Learner : 3:22 : C:\Users\ejfer\anaconda3\envs\keras_py37_gpu\lib\site-packages\keras\engine\saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
2023-12-29 03:08:29,113 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
2023-12-29 03:08:29,114 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
java.lang.RuntimeException: An error occured during training of the Keras deep learning network. See log for details.
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.handleGeneralException(DLKerasLearnerNodeModel.java:751)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:721)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:320)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:588)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1297)
	at org.knime.core.node.Node.execute(Node.java:1059)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:595)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
	at org.knime.dl.python.core.DLPythonAbstractCommands$DLTrainingTask.runInternal(DLPythonAbstractCommands.java:931)
	at org.knime.core.util.ThreadUtils$CallableWithContextImpl.callWithContext(ThreadUtils.java:383)
	at org.knime.core.util.ThreadUtils$CallableWithContext.call(ThreadUtils.java:269)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
2023-12-29 03:13:26,297 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:14:07,694 : WARN  : Thread-410 :  : PythonKernel : Keras Network Learner : 3:22 : C:\Users\ejfer\anaconda3\envs\keras_py37_gpu\lib\site-packages\keras\engine\saving.py:341: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
2023-12-29 03:14:11,083 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
2023-12-29 03:14:11,085 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
java.lang.RuntimeException: An error occured during training of the Keras deep learning network. See log for details.
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.handleGeneralException(DLKerasLearnerNodeModel.java:751)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:721)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:320)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:588)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1297)
	at org.knime.core.node.Node.execute(Node.java:1059)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:595)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}}]]
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
	at org.knime.dl.python.core.DLPythonAbstractCommands$DLTrainingTask.runInternal(DLPythonAbstractCommands.java:931)
	at org.knime.core.util.ThreadUtils$CallableWithContextImpl.callWithContext(ThreadUtils.java:383)
	at org.knime.core.util.ThreadUtils$CallableWithContext.call(ThreadUtils.java:269)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
2023-12-29 03:33:58,088 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:51:07,800 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:53:08,063 : WARN  : ModalContext :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
2023-12-29 03:54:17,640 : WARN  : Thread-425 :  : PythonKernel : Keras Network Learner : 3:22 : C:\Users\ejfer\anaconda3\envs\cudakeras\lib\site-packages\keras\engine\saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
2023-12-29 03:54:21,703 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense_7/MatMul_grad/MatMul"], transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_7/Reshape, training/Adam/gradients/dense_7/Reshape_2_grad/Reshape)]]
2023-12-29 03:54:21,704 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : Node : Keras Network Learner : 3:22 : Execute failed: An error occured during training of the Keras deep learning network. See log for details.
java.lang.RuntimeException: An error occured during training of the Keras deep learning network. See log for details.
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.handleGeneralException(DLKerasLearnerNodeModel.java:751)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:721)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:320)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:588)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1297)
	at org.knime.core.node.Node.execute(Node.java:1059)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:595)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16
	 [[{{node training/Adam/gradients/dense_7/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense_7/MatMul_grad/MatMul"], transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_7/Reshape, training/Adam/gradients/dense_7/Reshape_2_grad/Reshape)]]
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
	at org.knime.dl.python.core.DLPythonAbstractCommands$DLTrainingTask.runInternal(DLPythonAbstractCommands.java:931)
	at org.knime.core.util.ThreadUtils$CallableWithContextImpl.callWithContext(ThreadUtils.java:383)
	at org.knime.core.util.ThreadUtils$CallableWithContext.call(ThreadUtils.java:269)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

@ejferrell your log does contain a lot of error messages, some referring to the way the task seems to be set up. Maybe start by trying to get a simpler example up and running.
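Since the log is long and repetitive, a quick way to see how many distinct problems it actually contains is to count the unique WARN/ERROR messages. This is just a rough sketch: it assumes every entry follows the " : "-separated layout visible in the excerpt above (timestamp, level, thread, an empty field, class, node name, node ID, message), and `summarize_knime_log` is a hypothetical helper name, not part of KNIME.

```python
from collections import Counter


def summarize_knime_log(lines):
    """Count distinct (level, message) pairs among WARN and ERROR entries."""
    counts = Counter()
    for line in lines:
        # Assumed layout: 7 leading fields, then the free-text message.
        parts = line.rstrip("\n").split(" : ", 7)
        if len(parts) < 8:
            continue  # stack-trace or continuation line, not a log entry
        level = parts[1].strip()
        if level in ("WARN", "ERROR"):
            counts[(level, parts[7].strip())] += 1
    return counts


# Two entries copied from the excerpt, plus one stack-trace line.
sample = [
    "2023-12-29 03:04:37,437 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16",
    "\tat org.knime.core.node.Node.execute(Node.java:1059)",
    "2023-12-29 03:08:29,113 : ERROR : KNIME-Worker-0-Keras Network Learner 3:22 :  : DLKerasLearnerNodeModel : Keras Network Learner : 3:22 : Blas GEMM launch failed : a.shape=(16, 15), b.shape=(16, 3), m=15, n=3, k=16",
]

for (level, message), n in summarize_knime_log(sample).most_common():
    print(n, level, message)
```

To run it on the real file, something like `summarize_knime_log(open("knime.log", encoding="utf-8", errors="replace"))` should work, with the path adjusted to wherever the log was saved.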

Make sure to check out the links at the end, namely “Codeless Deep Learning”:

Codeless Deep Learning with KNIME
https://www.knime.com/codeless-deep-learning-book

There is also a space with examples:
