Server + Executor and Deep Learning (Keras/Tensorflow) - Python crashes

Hi,
I wanted to use our KNIME Server on weekends to train Keras models.
As a first attempt, I installed Anaconda3, started the Executor in GUI mode, and used the option to create the environment from within KNIME. Result: a long error message saying that some files were missing (could not be found).

I then tried to execute the yml-files located in the and the errormsg above is gone. Now this message appears

I also tried to configure the system version of Python, because someone mentioned that the Executor would use the system version.

After all that, I deleted Anaconda3 and reinstalled it. Now the Keras Network Learner node crashes with:

2020-07-07 15:31:47,891 : ERROR : KNIME-Worker-186-Keras Network Learner 2:10 : 2711bf82-73c3-4e73-8dd6-938a01e73ca4 : Node : Keras Network Learner : 2:10 : Execute failed: An error occurred while trying to launch Python: The external Python process crashed for unknown reasons while KNIME set up the Python environment. See log for details.
java.io.IOException: An error occurred while trying to launch Python: The external Python process crashed for unknown reasons while KNIME set up the Python environment. See log for details.
	at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedPortObjectContent.materialize(DLKerasUnmaterializedPortObjectContent.java:118)
	at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedNetworkPortObject.getNetwork(DLKerasUnmaterializedNetworkPortObject.java:123)
	at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedNetworkPortObject.getNetwork(DLKerasUnmaterializedNetworkPortObject.java:1)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:624)
	at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:303)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:571)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1236)
	at org.knime.core.node.Node.execute(Node.java:1016)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:557)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:218)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:124)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:334)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:210)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.dl.core.DLInvalidEnvironmentException: An error occurred while trying to launch Python: The external Python process crashed for unknown reasons while KNIME set up the Python environment. See log for details.
	at org.knime.dl.python.core.DLPythonDefaultContext.createKernel(DLPythonDefaultContext.java:102)
	at org.knime.dl.python.core.DLPythonDefaultContext.getKernel(DLPythonDefaultContext.java:124)
	at org.knime.dl.python.core.DLPythonDefaultContext.executeInKernel(DLPythonDefaultContext.java:170)
	at org.knime.dl.python.core.DLPythonAbstractCommands.getContext(DLPythonAbstractCommands.java:214)
	at org.knime.dl.keras.core.layers.DLKerasNetworkMaterializer.materialize(DLKerasNetworkMaterializer.java:181)
	at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedPortObjectContent.materialize(DLKerasUnmaterializedPortObjectContent.java:107)
	... 17 more
Caused by: org.knime.python2.kernel.PythonIOException: The external Python process crashed for unknown reasons while KNIME set up the Python environment. See log for details.
	at org.knime.python2.kernel.PythonKernel.<init>(PythonKernel.java:279)
	at org.knime.dl.python.core.DLPythonDefaultContext.createKernel(DLPythonDefaultContext.java:97)
	... 22 more
Caused by: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Accept timed out
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.knime.python2.kernel.PythonKernel.<init>(PythonKernel.java:273)
	... 23 more
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Hope someone can assist.

Thanks and br,
Sven

Hi Sven,

Do you already know this step-by-step guide for setting up the Python environment within KNIME?

https://docs.knime.com/2019-12/python_installation_guide/index.html

I would suggest setting it up on the executor using a dummy workspace (not the Server Workflow Repository). If everything works, export the preferences and add the Python lines to the existing preferences.epf of the KNIME Server.
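A rough sketch of that last step, in case it helps. It assumes the Python settings in the exported file sit under keys starting with /instance/org.knime.python2/ and /instance/org.knime.dl.python/ (please check this against your own export), and the function names are made up for illustration:

```python
# Sketch: pick the Python-related lines out of an exported .epf file and
# merge them into the text of an existing preferences.epf.
# Assumption: the Python settings use the two key prefixes below.

PYTHON_PREFIXES = (
    "/instance/org.knime.python2/",
    "/instance/org.knime.dl.python/",
)


def extract_python_preferences(epf_text):
    """Return the preference lines that belong to the KNIME Python plugins."""
    return [
        line
        for line in epf_text.splitlines()
        if line.startswith(PYTHON_PREFIXES)
    ]


def merge_preferences(existing_text, new_lines):
    """Append new_lines to existing_text, dropping old lines with the same key."""
    new_keys = {line.split("=", 1)[0] for line in new_lines}
    kept = [
        line
        for line in existing_text.splitlines()
        if line.split("=", 1)[0] not in new_keys
    ]
    return "\n".join(kept + new_lines) + "\n"
```

You would read both files, call extract_python_preferences on the export and merge_preferences on the server file, and write the result back after keeping a backup of the original preferences.epf.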

Cheers,
Michael

Hi Michael,

Yes, I followed the instructions.
Currently the node tries to use Python 2 and is not able to activate the py3_knime_dl Anaconda3 environment.
I also edited the epf template.

br,
Sven

Hi Sven,

Did you already save the epf template as preferences.epf in the /config folder? After a restart of the server, the executor should get the preferences assigned.

Cheers,
Michael

Hi @MichaelRespondek,

I exported the epf file via the KNIME Executor in GUI mode.
EPF:

#Tue Jul 14 13:12:42 CEST 2020
!/=
/configuration/org.eclipse.core.net/org.eclipse.core.net.hasMigrated=true
/configuration/org.eclipse.ui.ide/MAX_RECENT_WORKSPACES=10
/configuration/org.eclipse.ui.ide/RECENT_WORKSPACES=/home/knime/knime-workspace
/configuration/org.eclipse.ui.ide/RECENT_WORKSPACES_PROTOCOL=3
/configuration/org.eclipse.ui.ide/SHOW_RECENT_WORKSPACES=false
/configuration/org.eclipse.ui.ide/SHOW_WORKSPACE_SELECTION_DIALOG=true
/instance/org.eclipse.ant.launching/timeout=20000
/instance/org.eclipse.ant.ui/useAnnotationsPrefPage=true
/instance/org.eclipse.ant.ui/useQuickDiffPrefPage=true
/instance/org.eclipse.core.net/org.eclipse.core.net.hasMigrated=true
/instance/org.eclipse.core.resources/version=1
/instance/org.eclipse.ui.browser/internalWebBrowserHistory=file:/tmp/intro3608044572026009383.html||file:/tmp/intro1287664273020468326.html||file:/tmp/intro7441375184556169770.html||file:/tmp/intro6732731675835378154.html||file:/tmp/intro3594187962095442978.html||file:/tmp/intro1704382483548284762.html||
/instance/org.eclipse.ui.workbench//org.eclipse.ui.commands/state/org.eclipse.ui.navigator.resources.nested.changeProjectPresentation/org.eclipse.ui.commands.radioState=false
/instance/org.knime.dl.python/condaDirectoryPath=/home/knime/anaconda3
/instance/org.knime.dl.python/condaEnvironmentName=py3_knime_dl
/instance/org.knime.dl.python/manualConfig=python3
/instance/org.knime.dl.python/pythonConfigSelection=dl
/instance/org.knime.dl.python/pythonEnvironmentType=conda
/instance/org.knime.dl.python/serializerId=org.knime.serialization.flatbuffers.column
/instance/org.knime.python2/condaDirectoryPath=/home/knime/anaconda3
/instance/org.knime.python2/defaultPythonOption=python3
/instance/org.knime.python2/python2CondaEnvironmentName=py2_knime
/instance/org.knime.python2/python2Path=python
/instance/org.knime.python2/python3CondaEnvironmentName=py3_knime
/instance/org.knime.python2/python3Path=python3
/instance/org.knime.python2/pythonEnvironmentType=conda
/instance/org.knime.python2/serializerId=org.knime.serialization.flatbuffers.column
/instance/org.knime.workbench.core/knime.askedToSendStatistics=true
/instance/org.knime.workbench.core/knime.sendAnonymousStatistics=true
/instance/org.knime.workbench.core/knime.workspace.version=20190627
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/defaultMountID=EXAMPLES
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/factoryID=com.knime.explorer.server.examples
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/mountID=EXAMPLES
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/mountpointNumber=2
/instance/org.knime.workbench.explorer.view/mountpointNode/EXAMPLES/useRest=false
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/defaultMountID=LOCAL
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/factoryID=org.knime.workbench.explorer.workspace
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/mountID=LOCAL
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/mountpointNumber=3
/instance/org.knime.workbench.explorer.view/mountpointNode/LOCAL/useRest=false
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/defaultMountID=My-KNIME-Hub
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/factoryID=com.knime.explorer.server.workflow_hub
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/mountID=My-KNIME-Hub
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/mountpointNumber=1
/instance/org.knime.workbench.explorer.view/mountpointNode/My-KNIME-Hub/useRest=false
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/address=http://localhost:8080
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/authType=Credentials
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/defaultMountID=knime-srv
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/factoryID=com.knime.explorer.server
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/mountID=knime-srv
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/mountpointNumber=0
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/restPath=/knime/rest
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/useRest=true
/instance/org.knime.workbench.explorer.view/mountpointNode/knime-srv/user=knimeadm
/instance/org.knime.workbench.workflowcoach/community_node_triple_provider=true
@org.eclipse.ant.launching=1.2.1.v20171108-0853
@org.eclipse.ant.ui=3.7.0.v20170412-1054
@org.eclipse.core.net=1.3.100.v20170516-0820
@org.eclipse.core.resources=3.12.0.v20170417-1558
@org.eclipse.ui.browser=3.6.100.v20170418-1342
@org.eclipse.ui.workbench=3.110.1.v20170704-1208
@org.knime.dl.python=4.1.0.v201909231406
@org.knime.python2=4.1.0.v201911191126
@org.knime.workbench.core=4.1.0.v201912031642
@org.knime.workbench.explorer.view=8.5.3.v202005112254
@org.knime.workbench.workflowcoach=4.1.0.v201911110939
file_export_version=3.0

Error:

There are messages for workflow “03-Training 2020-07-14 13.30.32”
Keras Network Learner 2:10 - ERROR: Execute failed: An error occurred while creating the Keras network from its layer specifications.
This could be due to a version mismatch between Keras and TensorFlow.
Please make sure that Keras 2.1.6 and TensorFlow 1.8.0 are installed in your Python environment.
See log for details.
You can install the correct version of Keras and TensorFlow on the ‘Python Deep Learning’ preference page.

I used the GUI Mode of the Executor to create the envs.

br,
Sven

Executor-Log:

2020-07-16 15:50:46,154 : ERROR : KNIME-Worker-14-Keras Network Learner 0:10 : dab5337d-7568-4a33-a220-9593233a6916 : Node : Keras Network Learner : 0:10 : Execute failed: An error occurred while creating the Keras network from its layer specifications.
This could be due to a version mismatch between Keras and TensorFlow.
Please make sure that Keras 2.1.6 and TensorFlow 1.8.0 are installed in your Python environment.
See log for details.
You can install the correct version of Keras and TensorFlow on the ‘Python Deep Learning’ preference page.
java.io.IOException: An error occurred while creating the Keras network from its layer specifications.
This could be due to a version mismatch between Keras and TensorFlow.
Please make sure that Keras 2.1.6 and TensorFlow 1.8.0 are installed in your Python environment.
See log for details.
You can install the correct version of Keras and TensorFlow on the ‘Python Deep Learning’ preference page.
at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedPortObjectContent.materialize(DLKerasUnmaterializedPortObjectContent.java:118)
at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedNetworkPortObject.getNetwork(DLKerasUnmaterializedNetworkPortObject.java:123)
at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedNetworkPortObject.getNetwork(DLKerasUnmaterializedNetworkPortObject.java:1)
at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.executeInternal(DLKerasLearnerNodeModel.java:624)
at org.knime.dl.keras.base.nodes.learner.DLKerasLearnerNodeModel.execute(DLKerasLearnerNodeModel.java:303)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:571)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1236)
at org.knime.core.node.Node.execute(Node.java:1016)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:557)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:218)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:124)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:334)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:210)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: No module named ‘keras’
Traceback (most recent call last):
File “”, line 3, in
ModuleNotFoundError: No module named ‘keras’

at org.knime.python2.util.PythonUtils$Misc.executeCancelable(PythonUtils.java:265)
at org.knime.python2.kernel.PythonKernel.executeCommandCancelable(PythonKernel.java:1314)
at org.knime.python2.kernel.PythonKernel.execute(PythonKernel.java:1246)
at org.knime.dl.python.core.DLPythonDefaultContext.executeInKernel(DLPythonDefaultContext.java:170)
at org.knime.dl.keras.core.layers.DLKerasNetworkMaterializer.materialize(DLKerasNetworkMaterializer.java:181)
at org.knime.dl.keras.base.portobjects.DLKerasUnmaterializedPortObjectContent.materialize(DLKerasUnmaterializedPortObjectContent.java:107)
… 17 more
Caused by: org.knime.python2.kernel.PythonIOException: No module named ‘keras’
Traceback (most recent call last):
File “”, line 3, in
ModuleNotFoundError: No module named ‘keras’

at org.knime.python2.kernel.PythonKernel.executeCommand(PythonKernel.java:1303)
at org.knime.python2.kernel.PythonKernel.execute(PythonKernel.java:1231)
at org.knime.python2.kernel.PythonKernel.lambda$4(PythonKernel.java:1246)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Python:
[’/home/knime/knime-executor/plugins/org.knime.python2_4.1.0.v201911191126/py’, ‘/home/knime/knime-executor/plugins/org.knime.dl.keras_4.1.0.v201911110939/py’, ‘/home/knime/knime-executor/configuration/org.eclipse.osgi/1142/0/.cp/py’, ‘/home/knime/knime-executor/plugins/org.knime.dl.tensorflow_4.1.0.v201909231408/py’, ‘/home/knime/knime-executor/configuration/org.eclipse.osgi/1075/0/.cp/py’, ‘/home/knime/knime-executor/plugins/org.knime.dl.python_4.1.0.v201909231406/py’, ‘/home/knime/knime-executor/plugins/org.knime.python2_4.1.0.v201911191126/py’, ‘/home/knime/knime-executor’, ‘/home/knime/anaconda3/envs/py3_knime_dl/lib/python36.zip’, ‘/home/knime/anaconda3/envs/py3_knime_dl/lib/python3.6’, ‘/home/knime/anaconda3/envs/py3_knime_dl/lib/python3.6/lib-dynload’, ‘/home/knime/anaconda3/envs/py3_knime_dl/lib/python3.6/site-packages’, ‘/home/knime/anaconda3/envs/py3_knime_dl/lib/python3.6/site-packages/IPython/extensions’, ‘/home/knime/knime-executor/plugins/org.knime.python2.serde.flatbuffers_4.1.0.v201908271559/py/’, ‘/home/knime/knime-executor/plugins/org.knime.dl.keras_4.1.0.v201911110939/py:/home/knime/knime-executor/configuration/org.eclipse.osgi/1142/0/.cp/py:/home/knime/knime-executor/plugins/org.knime.dl.tensorflow_4.1.0.v201909231408/py:/home/knime/knime-executor/configuration/org.eclipse.osgi/1075/0/.cp/py:/home/knime/knime-executor/plugins/org.knime.dl.python_4.1.0.v201909231406/py’]

As soon as I start the workflow via “as new Job on Server”, it fails at the Keras Network Learner node.
If I start the workflow locally in the Executor, or as a local copy from the server, everything works fine.
I also installed Keras and TensorFlow locally via pip.

I don’t know why the node responds with the “version mismatch” error message. Keras 2.1.6 and TensorFlow 1.8 are for Python 2, but I’m using a Python 3 environment created via KNIME.

Maybe someone can help.

br,
Sven

Hi Sven,

This entry in the log you posted:

ModuleNotFoundError: No module named ‘keras’

suggests that keras is not installed at all in the Python environment that is being used. Can you double-check whether that package is really installed in that environment?

I agree that the error message is misleading. In case it helps clarify things: Keras 2.1.6 and TensorFlow 1.8 do exist for Python 3. In fact, the KNIME Deep Learning integrations only support Python 3, not Python 2.
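For the double-check, something along these lines might help. It is only a sketch: the file name check_env.py in the note below is a placeholder, and the important part is to run it with the interpreter of the environment the executor actually uses, not with the system Python:

```python
import importlib
import sys


def check_module(name):
    """Return the module's version, or a note if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return f"NOT IMPORTABLE ({exc})"
    return getattr(module, "__version__", "unknown version")


def report(names=("keras", "tensorflow")):
    """Collect the interpreter path and the import status of each module."""
    lines = ["interpreter: " + sys.executable]
    for name in names:
        lines.append(f"{name}: {check_module(name)}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Assuming the usual conda layout on Linux, the environment's interpreter would be at /home/knime/anaconda3/envs/py3_knime_dl/bin/python, so you would call that binary with check_env.py as the user the executor service runs as. If the interpreter line shows a different path than expected, the job is probably picking up another environment.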

Marcel

Hi @MarcelW,

I checked with 4.1.3 and 4.2. As long as the workflow isn’t started via the WebPortal, everything works fine.
Do I have to configure the Executor in a special way? I used the step-by-step guide, but maybe I messed up the configuration somewhere else? Maybe the epf file?

Br,
Sven

The .epf file looked fine at first glance. Can you run conda list for the environments listed in there on the server machine?
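If you prefer to script it, here is a minimal sketch that wraps conda list -n with JSON output. It assumes a Linux install with the conda binary under bin/ in the conda directory, and the helper names are made up:

```python
import json
import subprocess


def conda_list_command(conda_dir, env_name):
    """Build the `conda list` call for one named environment, with JSON output."""
    # Assumption: Linux layout, conda binary at <conda_dir>/bin/conda.
    return [f"{conda_dir}/bin/conda", "list", "-n", env_name, "--json"]


def installed_versions(conda_list_json, wanted=("keras", "tensorflow")):
    """Map each wanted package name to its installed version, or None if absent."""
    packages = {pkg["name"]: pkg["version"] for pkg in json.loads(conda_list_json)}
    return {name: packages.get(name) for name in wanted}


def check_environment(conda_dir, env_name):
    """Run conda and report the versions found in env_name (needs conda on disk)."""
    result = subprocess.run(
        conda_list_command(conda_dir, env_name),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return installed_versions(result.stdout)
```

With the values from the posted .epf, that would be check_environment("/home/knime/anaconda3", "py3_knime_dl"), and likewise for py3_knime and py2_knime. Note that the packages may be listed under other names (for example a GPU variant), so a None here is a hint to look at the full list, not proof that nothing is installed.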

Hey @Marten_Pfannenschmidt,

Sorry for the late reply. Is it possible that the cause lies in the profiles stored on the server? Or that the workflow repository is not stored in the server directory, or was installed by hand?

regards,
sven