Hello Community!
I’m encountering an issue when trying to execute a workflow with a PySpark Script node via a .bat file. The error message I receive is:
ERROR KNIME-Worker-15-PySpark Script (1 to 1) 3:2612 Node Execute failed: Cannot run program "no_conda_environment_selected\python.exe": CreateProcess error=2, The system cannot find the file specified
org.knime.bigdata.spark.core.exception.KNIMESparkException: Cannot run program "no_conda_environment_selected\python.exe": CreateProcess error=2, The system cannot find the file specified
at org.knime.bigdata.spark3_5.jobs.scripting.python.PySparkJob.runJob(PySparkJob.java:103)
at org.knime.bigdata.spark3_5.jobs.scripting.python.PySparkJob.runJob(PySparkJob.java:1)
at org.knime.bigdata.spark.local.wrapper.LocalSparkWrapperImpl.runJob(LocalSparkWrapperImpl.java:129)
at org.knime.bigdata.spark.local.context.LocalSparkJobController.lambda$1(LocalSparkJobController.java:92)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
The workflow runs without any issue when executed directly in the KNIME app. Interestingly, non-PySpark nodes (e.g., the regular Python node) execute just fine, as do other Spark nodes (like GroupBy or Joiner), but it’s only the PySpark Script (1 to 1), PySpark Script (1 to 2), etc. nodes that break when executed in batch mode.
I’m running the workflow using a .bat file; the workflow starts with the “Create Local Big Data Environment” node so that the downstream Spark nodes have a local Spark context to run against.
What I’ve Tried:
- The workflow runs without problems in the KNIME app itself.
- The .bat file is set up as follows:
"D:\knime\KNIME\knime.exe" -consoleLog -reset -nosplash -failonloaderror -application org.knime.product.KNIME_BATCH_APPLICATION -workflowDir="D:\knime\knime-workspace\workflow1"
I’ve attempted other approaches, such as directly calling the environment, but the issue persists.
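Since the error complains about no_conda_environment_selected\python.exe, my guess is that the batch application isn’t picking up the Conda environment that the PySpark Script node uses when I run the workflow in the app. One variant of the .bat file I’m considering (a sketch only: the .epf path is a placeholder, and I’m assuming the exported preferences actually carry the PySpark Python settings) would export my preferences in the app via File > Export Preferences… and hand them to the batch application with the -preferences option:

REM Preferences exported beforehand in the KNIME app via File > Export Preferences...
REM The .epf path is a placeholder - adjust it to the actual export location.
"D:\knime\KNIME\knime.exe" -consoleLog -reset -nosplash -failonloaderror ^
  -application org.knime.product.KNIME_BATCH_APPLICATION ^
  -preferences="D:\knime\preferences.epf" ^
  -workflowDir="D:\knime\knime-workspace\workflow1"

If someone can confirm whether the PySpark environment selection is actually read from an exported .epf in batch mode, that alone would help me narrow this down.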
I’m not sure why the PySpark Script node works in the app but fails when executed in batch mode. Any guidance or suggestions to resolve this issue would be greatly appreciated!
Thanks