Inquiry about how to use the PySpark Script node

Hello.

I have a question about how to use the PySpark Script node with a Spark (Livy) environment.
Here is my workflow.
The error message was as follows:
Execute failed: Cannot run program "python": error=2, No such file or directory


I set the config files as below:

[executor.epf]
/instance/org.knime.workbench.core/database_timeout=120

# Add a mount point for this server. This is useful for the new file handling nodes.
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/address=${origin:KNIME-EJB-Address}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/factoryID=com.knime.explorer.server
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountID=${origin:KNIME-Default-Mountpoint-ID}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountpointNumber=1
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/restPath=${origin:KNIME-Context-Root}/rest
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/user=${sysprop:user.name}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/useRest=true

/instance/org.knime.conda/condaDirectoryPath=/opt/miniconda3
/instance/org.knime.python2/defaultPythonOption=python3
/instance/org.knime.python2/python2CondaEnvironmentDirectoryPath=/opt/miniconda3
/instance/org.knime.python2/python2Path=python
/instance/org.knime.python2/python3CondaEnvironmentDirectoryPath=/opt/miniconda3/envs/py3_knime
/instance/org.knime.python2/python3Path=/opt/knime/4.15/workflow_repository/config/client-profiles/executor/python3.exe
/instance/org.knime.python2/pythonEnvironmentType=manual
/instance/org.knime.python2/serializerId=org.knime.serialization.flatbuffers.column

[knime.ini]
-profileLocation
http://localhost:8080/knime/rest/v4/profiles/contents
-profileList
executor

I'd appreciate it if you could tell me how to solve this problem.
If you need more information, please let me know. :slightly_smiling_face:

Thanks,
hhkim

Hi @hhkim ,

The PySpark node will not use any of the settings mentioned above. The Python code from the node runs in the Spark cluster on the worker nodes, and it uses the Python executable configured for the Spark cluster.

Such settings can be changed at the context level in the Create Spark Context (Livy) node (Advanced tab, "Set custom Spark settings"). For example, the Python executable can be configured via the spark.pyspark.python option, as sketched below.
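
For illustration, a custom Spark settings entry might look like the following. The path is only an assumption based on the conda environment from your executor.epf; it must point to a Python executable that actually exists at that location on every worker node:

[Create Spark Context (Livy) > Advanced > Custom Spark settings]
spark.pyspark.python: /opt/miniconda3/envs/py3_knime/bin/python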

If you want to use a conda environment on a cluster, more information can be found here: Python Package Management — PySpark 3.2.0 documentation
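
As a rough sketch of the conda-pack approach described in that guide (assuming a YARN-backed cluster, that the conda-pack tool is installed, and that the environment name py3_knime and archive path are just placeholders; the archive must be reachable by the cluster, e.g. uploaded to HDFS):

# on a machine that has the conda environment, pack it into an archive
conda pack -n py3_knime -o py3_knime.tar.gz

[Create Spark Context (Livy) > Advanced > Custom Spark settings]
spark.yarn.dist.archives: /path/to/py3_knime.tar.gz#environment
spark.pyspark.python: ./environment/bin/python

The #environment suffix tells YARN to unpack the archive under that alias on each worker, so spark.pyspark.python can point into it with a relative path.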

Hope this helps,
Temesgen


Thanks to your help, I was able to fix it quickly! :blush:
