py4j.protocol.Py4JError: An error occurred while calling GATEWAY_SERVER.getCallbackClient

Guillestre · May 12, 2023, 2:46pm

Hi everyone,

While running a workflow on a remote server, the following error occurred:

raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling GATEWAY_SERVER.getCallbackClient. Trace:
Object ID unknown

ERROR KNIME-Worker-10-Python Script 3:1256 Node Execute failed: An exception occured while running the Python kernel. See log for details.
org.knime.python2.kernel.PythonIOException: An exception occured while running the Python kernel. See log for details.
at org.knime.python2.kernel.PythonKernelQueue$KeyedPooledPythonKernelFactory.createKernel(PythonKernelQueue.java:411)
at org.knime.python2.kernel.PythonKernelQueue$KeyedPooledPythonKernelFactory.populateHolder(PythonKernelQueue.java:396)
at org.knime.python2.kernel.PythonKernelQueue$KeyedPooledPythonKernelFactory.passivateObject(PythonKernelQueue.java:390)
at org.knime.python2.kernel.PythonKernelQueue$KeyedPooledPythonKernelFactory.passivateObject(PythonKernelQueue.java:1)
at org.apache.commons.pool2.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:483)
at org.knime.python2.kernel.PythonKernelQueue.lambda$1(PythonKernelQueue.java:318)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Could not connect to the Python process.
at org.knime.python3.DefaultPythonGateway.waitForConnection(DefaultPythonGateway.java:261)
at org.knime.python3.DefaultPythonGateway.<init>(DefaultPythonGateway.java:192)
at org.knime.python3.DefaultPythonGateway.create(DefaultPythonGateway.java:144)
at org.knime.python3.scripting.Python3KernelBackend.<init>(Python3KernelBackend.java:291)
at org.knime.python3.scripting.Python3KernelBackend.<init>(Python3KernelBackend.java:240)
at org.knime.python3.scripting.Python3KernelBackendFactory.createBackend(Python3KernelBackendFactory.java:70)
at org.knime.python2.kernel.PythonKernelQueue$KeyedPooledPythonKernelFactory.createKernel(PythonKernelQueue.java:407)
... 6 more
May 12, 2023 4:07:31 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener unregister

I found this post, which suggests that the cause there was assigning more resources to Spark than Docker had access to:
apache spark ml - Pyspark ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:50532) - Stack Overflow

I am not sure how to solve this in KNIME. I was thinking of assigning more RAM, but I do not know whether that would solve the problem. That said, the Python code I am using in the workflow does not seem to be the problem. The workflow performs parameter optimization, so it takes a long time to run. I have never seen this kind of error when running it on my own laptop.

Have you ever stumbled on an error like this?

Thanks a lot for your help.

k10shetty1 · May 22, 2023, 12:59pm

Hello Guillestre,

Welcome to the KNIME community.

Could you please tell us which version of the KNIME AP you are using and confirm whether you are using the most recent version of the Python Script node? Also, can you share the KNIME log file and either the script or a sample workflow? This will help us investigate the issue further.

Best,
Keerthan

Guillestre · May 22, 2023, 2:13pm

Hello @k10shetty1,

Thanks a lot for your response.

Here is some information about KNIME and my workflow:

KNIME version → 4.7.2
I use the “Python Script” nodes and not the “legacy” nodes.
Python version used → 3.9.16
Conda version → 23.3.1

Guillestre · May 22, 2023, 2:23pm

Here is the current version of my workflow:
workflow.knwf (289.3 KB)

I can adapt it so that it runs with noisy data if you want.

Guillestre · May 22, 2023, 2:50pm

I ran it one more time. Sometimes I get this error as well:

py4j.Py4JException: Cannot obtain a new communication channel
at py4j.CallbackClient.sendCommand(CallbackClient.java:380)
at py4j.CallbackClient.sendCommand(CallbackClient.java:356)
at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106)
at jdk.proxy8/jdk.proxy8.$Proxy24.initializeCurrentWorkingDirectory(Unknown Source)
at org.knime.python3.scripting.Python3KernelBackend.initializeCurrentWorkingDirToWorkflowDir(Python3KernelBackend.java:411)
at org.knime.python3.scripting.Python3KernelBackend.setOptions(Python3KernelBackend.java:382)
at org.knime.python2.kernel.PythonKernel.setOptions(PythonKernel.java:235)
at org.knime.python2.kernel.PythonKernelQueue.configureOrRecreateKernel(PythonKernelQueue.java:332)
at org.knime.python2.kernel.PythonKernelQueue.getNextKernelInternal(PythonKernelQueue.java:284)
at org.knime.python2.kernel.PythonKernelQueue.getNextKernel(PythonKernelQueue.java:198)
at org.knime.python3.scripting.nodes.AbstractPythonScriptingNodeModel.getNextKernelFromQueue(AbstractPythonScriptingNodeModel.java:349)
at org.knime.python3.scripting.nodes.AbstractPythonScriptingNodeModel.execute(AbstractPythonScriptingNodeModel.java:214)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:549)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1267)
at org.knime.core.node.Node.execute(Node.java:1041)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:595)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
May 22, 2023 4:31:35 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener unregister
INFO: Removing the extensions for bundle 152
May 22, 2023 4:31:35 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener unregister
INFO: Removing the extensions for bundle 156
May 22, 2023 4:31:35 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener unregister
INFO: Removing the extensions for bundle 157
Knime: Cannot open display:
Knime:
JVM terminated. Exit code=4
/users/21801992t/knime_4.7.2//plugins/org.knime.binary.jre.linux.x86_64_17.0.5.20221116/jre/bin/java
-Djava.security.properties=plugins/org.knime.binary.jre.linux.x86_64_17.0.5.20221116/security.properties
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Dknime.xml.disable_external_entities=true
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.nio.channels=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio=ALL-UNNAMED
--add-opens=java.desktop/javax.swing.plaf.basic=ALL-UNNAMED
--add-opens=java.base/sun.net.www.protocol.http=ALL-UNNAMED
-Xmx2048m
-Dorg.eclipse.swt.internal.gtk.disablePrinting
-Dchromium.multi_threaded_message_loop=true
-Darrow.enable_unsafe_memory_access=true
-Darrow.memory.debug.allocator=false
-Darrow.enable_null_check_for_get=false
--add-opens=java.security.jgss/sun.security.jgss.krb5=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss.spi=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5.internal=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
-jar /users/21801992t/knime_4.7.2//plugins/org.eclipse.equinox.launcher_1.6.400.v20210924-0641.jar
-os linux
-ws gtk
-arch x86_64
-launcher /users/21801992t/knime_4.7.2/knime
-name Knime
--launcher.library /users/21801992t/knime_4.7.2//plugins/com.equo.chromium.cef.gtk.linux.x86_64_95.0.6/chromium-4638/eclipse_11701.so
-startup /users/21801992t/knime_4.7.2//plugins/org.eclipse.equinox.launcher_1.6.400.v20210924-0641.jar
--launcher.overrideVmargs
-exitdata 80098
-reset
-application org.knime.product.KNIME_BATCH_APPLICATION
-workflowDir=ALS_CHU_A_multi_dimension
-vm /users/21801992t/knime_4.7.2//plugins/org.knime.binary.jre.linux.x86_64_17.0.5.20221116/jre/bin/java
-vmargs
-Djava.security.properties=plugins/org.knime.binary.jre.linux.x86_64_17.0.5.20221116/security.properties
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Dknime.xml.disable_external_entities=true
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.nio.channels=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio=ALL-UNNAMED
--add-opens=java.desktop/javax.swing.plaf.basic=ALL-UNNAMED
--add-opens=java.base/sun.net.www.protocol.http=ALL-UNNAMED
-Xmx2048m
-Dorg.eclipse.swt.internal.gtk.disablePrinting
-Dchromium.multi_threaded_message_loop=true
-Darrow.enable_unsafe_memory_access=true
-Darrow.memory.debug.allocator=false
-Darrow.enable_null_check_for_get=false
--add-opens=java.security.jgss/sun.security.jgss.krb5=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.jgss.spi=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5.internal=ALL-UNNAMED
--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
-jar /users/21801992t/knime_4.7.2//plugins/org.eclipse.equinox.launcher_1.6.400.v20210924-0641.jar

Guillestre · May 22, 2023, 2:53pm

I forgot to mention that I get this error when I run the workflow from the command line. I do not seem to get it when I run the workflow from the graphical interface, but I have to run it from the command line.

The error does not occur immediately after the workflow starts, and it does not appear at a specific moment. It can take anywhere from a few minutes to an hour before the error occurs. It seems to be random.

I think the error appears at the “Python Script” nodes.

Guillestre · May 23, 2023, 9:18am

In fact, the same error also appears with the graphical interface.

k10shetty1 · May 23, 2023, 10:15am

Hi Guillestre,

Since I do not have access to the input files, I am not able to investigate further. Can you share a dummy workflow with some non-sensitive data? Also, can you share the KNIME log file from the time of the error?

Best,
Keerthan

Guillestre · May 23, 2023, 1:16pm

Hello @k10shetty1,

Here is a sample of my workflow with a small amount of data. It contains only the “optimization” section, which should be enough. I can add more data if needed.

You’ll need to install these extra libraries to run it:

umap-learn 0.5.3
scikit-learn-extra 0.3.0
pytwed 1.0.9
yellowbrick 1.5

I think I listed them all. Let me know if you hit any errors unrelated to the ones I sent previously.

The error seems to be related to port availability when the Python nodes are executed. So far I have not seen these errors on my own laptop, only on the remote server. Thanks a lot.
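To check the port hypothesis on the server, something like this minimal sketch could be run outside KNIME. It only asks the OS for a free loopback port a number of times. `find_free_port` and `count_successful_binds` are hypothetical helper names, and this does not reproduce how KNIME's Python gateway picks its ports.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a free TCP port on host; return it, or None on failure."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 lets the OS choose any free port
            return sock.getsockname()[1]
    except OSError:
        return None


def count_successful_binds(attempts=20, host="127.0.0.1"):
    """Repeat the check and count how many attempts obtained a port."""
    return sum(find_free_port(host) is not None for _ in range(attempts))


if __name__ == "__main__":
    print("successful binds:", count_successful_binds(), "of 20")
```

If the count drops below the number of attempts while the workflow is running, that would point toward the server running short of local ports, but it would not prove it.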

Debug.knar (214.0 KB)

Best,
Guillaume

Guillestre · May 23, 2023, 1:48pm

I can’t send you the log right now. I’ll send it as soon as possible.

Best,

Guillaume

k10shetty1 · May 25, 2023, 3:35pm

Hello,

I noticed you are using the brute force strategy for optimization. I tried running the workflow with the ‘bayesian optimization’ strategy, and there was no issue until one of the Python scripts (embed = nbook.get_umap_dimensions(df, nu, lmbda, y_axis_type)) failed with the following error:

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.

Maybe this link will help you with it.
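The message says that the number of requested eigenvectors k must stay below the matrix size N. Here is a minimal sketch of a guard, assuming N is the number of rows reaching the UMAP step and k follows from the requested number of dimensions. Both are my assumptions about your script, and `clamp_k` is a hypothetical helper, not part of scipy or umap-learn.

```python
def clamp_k(k, n_samples):
    """Return k limited to at most n_samples - 1 (and at least 1).

    Hypothetical guard: the eigensolver rejects k >= N for sparse input,
    so the value is capped before it is passed on.
    """
    if n_samples < 2:
        raise ValueError("need at least 2 samples to keep k below N")
    return max(1, min(k, n_samples - 1))


if __name__ == "__main__":
    # With only 4 rows, a request for 5 is reduced to 3.
    print(clamp_k(5, 4))
    # A request that is already small enough is left unchanged.
    print(clamp_k(2, 10))
```

Whether capping the value is acceptable depends on the optimization: it may be simpler to restrict the parameter range so that small inputs never request too many dimensions.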

Best,
Keerthan

Guillestre · July 24, 2023, 11:44am

Hello @k10shetty1,

Very sorry for the delay. I found a workaround for my problem. It is not the cleverest one, but it was enough for what I wanted. Since the error occurs randomly in the Python nodes, I wrapped them in Try/Catch loops: if the error occurs, the workflow repeats the last iteration, starting from the Try node. Usually it works the next time. These sources helped me:

try_catch_with_loops – KNIME Community Hub

Catch an error in an optimization loop and continue (Forum 38402) – KNIME Community Hub

Error handling (Try - Catch Error) Design Question - KNIME Analytics Platform - KNIME Community Forum

Here is the workflow I used, with the Try/Catch nodes and all the files needed to run it. It uses Bayesian Optimization.

Debugging.knar (1.7 MB)
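For readers who think in code, the retry idea can be sketched in plain Python. This only illustrates the control flow, not what the Try/Catch nodes do internally. `run_with_retries` and `flaky_step` are hypothetical names, and a retry placed inside a Python Script node would presumably not help when the kernel itself fails to start, which is why I did it at the workflow level.

```python
import time


def run_with_retries(step, max_attempts=3, delay_seconds=0.0,
                     retry_on=(Exception,)):
    """Call step() until it succeeds or max_attempts is reached.

    The last error is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up: propagate the final failure
            time.sleep(delay_seconds)  # optional pause before retrying


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_step():
        # Hypothetical step that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated random failure")
        return "done"

    print(run_with_retries(flaky_step, max_attempts=5))
```

The cap on attempts matters: without it, a failure that is not random would loop forever.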

Thank you very much for your help

Best,

Guillaume
