PySpark error: the specified file was not found

Hello. I have a question about connecting a PySpark node.

I use a PySpark node in my local environment by connecting it to the “Create Local Big Data Environment” node.

However, when I open the configuration window of the PySpark node, I get an error saying that the specified file was not found.

What does “the specified file” refer to?

I have installed Python, Hadoop, and Java to use Spark. Could the error be related to the versions of any of these?

Any help would be appreciated.

Hello @JaeHwanChoi ,
The error seems to be related to a Python Executable that has not been correctly selected.
Can you please double-check in File > Preferences > Python that you have selected a working Python environment? If it gives you an error, you can for example select an existing Conda environment or create a new one directly from that page.


Thank you for your response. @emilio_s !!

I’ve run it in the same Python environment that I use with the Python Script node, and I’m getting the same error with the PySpark Script node.

By any chance, in the “Create Local Big Data Environment” node, do I need to specify the path where Python is installed, or the path where the Python packages are installed?

Thanks for your quick response. I need your help as my job is stuck with that error.

I’m replying with an additional error message about the issue above.

I checked the KNIME log for the error messages from the PySpark Script node:
“java.io.IOException: Cannot run program "no_conda_environment_selected\python.exe": CreateProcess error=2, The specified file was not found”.
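The placeholder path in that message suggests no Conda environment was ever registered for the node, so KNIME hands the literal string `no_conda_environment_selected\python.exe` to the OS, which then cannot find any such file. A quick, KNIME-independent way to check whether a configured interpreter path is actually launchable (a minimal sketch, not part of KNIME itself):

```python
import shutil
import subprocess

def check_interpreter(path: str) -> bool:
    """Return True if `path` points to a runnable Python executable."""
    # shutil.which resolves the path (or a PATH lookup) without launching anything
    if shutil.which(path) is None:
        return False
    # Launch it once to confirm it actually executes
    result = subprocess.run([path, "--version"], capture_output=True)
    return result.returncode == 0

# The literal placeholder from the KNIME log is not a real file, so this fails:
print(check_interpreter(r"no_conda_environment_selected\python.exe"))  # False
```

If this returns `False` for the interpreter path shown in your Preferences, the problem is the path itself, not Spark, Hadoop, or Java.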

I also installed Java to use Spark. Does this mean I have the wrong Java version, or that Java is installed in the wrong path?

Have you also configured the Conda environment in the Python (legacy) tab?
It’s possible that the Create Local Big Data Environment node still uses the Conda environment selected in that tab (and not in the new Python tab).


Thank you for your response. @emilio_s

I did the same in the Python (legacy) tab, setting up one virtual environment for Python 3 and one for Python 2, but I still can’t connect with the PySpark Script node.

The log below shows the Python installation and environment paths from the .epf file.

Can anyone help me identify what went wrong?

Additionally, the Python Script node works well with the same environment. Only the PySpark Script node is causing problems.

I’d be grateful for any answers.

Hi @JaeHwanChoi,

What error message do you get after selecting Python 3 and a Python 3 Conda environment on the Python (legacy) tab? Maybe restart KNIME and run the Local Big Data + PySpark nodes again.

Note that the current Spark version requires Python 3. What KNIME version are you using?

Cheers,
Sascha
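On the Python 3 requirement mentioned above: a fail-fast guard (a hedged sketch, not KNIME-specific) that can be pasted at the top of a PySpark Script node to surface a version mismatch as a readable error instead of an obscure worker failure:

```python
import sys

# Spark 3.x dropped Python 2 support, so fail fast with a clear message
# rather than letting the Spark workers die with a cryptic error.
if sys.version_info < (3,):
    raise RuntimeError(
        "PySpark requires Python 3; this interpreter is %s" % sys.version.split()[0]
    )
print("Python version OK:", sys.version.split()[0])
```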

Thank you for your response.

After numerous attempts, I have solved the problem.

I uninstalled Anaconda, wiped the existing KNIME virtual environments and rebuilt them all.

At that point KNIME still couldn’t find the python.exe file, but swapping the virtual-environment path and the Python installation path inside the .epf file made it work.
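For anyone hitting the same wall: a KNIME preference export (.epf) is plain `key=value` text, so a stale interpreter path can be spotted by checking every path-like value against the filesystem. The key name and the escaping rule below are illustrative assumptions, not the actual KNIME preference keys:

```python
from pathlib import Path

def find_broken_paths(epf_text: str) -> list[str]:
    """Return epf lines whose value looks like a Python path but does not exist."""
    broken = []
    for line in epf_text.splitlines():
        if "=" not in line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # .epf exports escape ":" and "\" in Windows paths (e.g. "C\:\\env\\python.exe");
        # undo that before testing the path.
        candidate = value.replace("\\:", ":").replace("\\\\", "\\")
        if ("python" in candidate.lower()
                and ("/" in candidate or "\\" in candidate)
                and not Path(candidate).exists()):
            broken.append(line.strip())
    return broken

# Hypothetical excerpt; the real key names in your .epf file will differ.
sample = "/instance/org.example.python/pythonPath=C\\:\\\\old_env\\\\python.exe"
print(find_broken_paths(sample))
```

Any line this reports is a candidate for the path swap described above.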

Thanks for helping me solve the problem.
Have a great day.


Hi @JaeHwanChoi,

Sorry that it was such a long road. Many thanks for sharing your solution!

Steffen
