I have Spark 2.4 running on the Cloudera Quickstart VM (CDH v5.13). I installed Spark Job Server following the guidelines here. When I try to run the example workflow named "01_Spark_MLib_Decision_Tree" from KNIME (v4.1), I get the following error:
ERROR Table to Spark 0:202 Execute failed: Cannot read input file on jobserver: \tmp\spark2-job-server\upload\knime-table2spark1957749675575975821.tmp-2020-01-17T22_17_11.517-08_00.dat
The "Create Spark Context (JobServer)" node executes successfully. I have checked the Spark Job Server logs, and none of the log files contain an error. What does this error refer to, and how can I tackle the issue? (I had installed Livy, since that is the recommended option, but I ran into another problem with it, so I tried Spark Job Server instead.)
Same problem here, just with a Spark DataFrame Java Snippet:
ERROR Spark DataFrame Java Snippet (Source) 2:38 Execute failed: Cannot read input file on jobserver: \tmp\spark2-job-server\upload\SparkDataFrameJavaSnippetSource4194e0c6b1f2f0d6a4028d6e1e8cc2516d376ade223bef78e997fef74ad197988283960273699121586.jar-2020-01-20T10_02_05.823Z.dat
I'm using Spark 2.3.2. With KNIME 3.7.2 everything works without any problem, so I think this could be a bug in the new KNIME 4.x versions!
It looks to me like the path to the jar file submitted by KNIME uses the wrong delimiter (backslash instead of forward slash).
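To illustrate the suspected cause, here is a minimal Java sketch. It is not KNIME's actual code; the class `RemotePaths`, its methods, and the example file name are hypothetical. On Windows, `File.separator` is `\`, so a path built with it and sent to a Linux-based jobserver arrives as `\tmp\...`. On Linux, a backslash is a legal filename character, so such a string is read as one relative file name rather than as `/tmp/...`, which would explain the "Cannot read input file" error. The sketch builds the remote path with a fixed `/` and shows how to normalize an already-broken path:

```java
import java.util.StringJoiner;

/**
 * Hypothetical helper (NOT KNIME's implementation): builds paths for a
 * Linux-based jobserver using "/" regardless of the client's operating system.
 */
public final class RemotePaths {

    private RemotePaths() {}

    /** Joins segments into an absolute POSIX path, always using "/" (never File.separator). */
    public static String join(String... segments) {
        // Delimiter "/", prefix "/" (absolute path), empty suffix.
        StringJoiner joiner = new StringJoiner("/", "/", "");
        for (String segment : segments) {
            joiner.add(segment);
        }
        return joiner.toString();
    }

    /** Rewrites a Windows-style path (backslashes) into a POSIX-style path. */
    public static String toPosix(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // A path as it might look when built on Windows (hypothetical file name).
        String broken = "\\tmp\\spark2-job-server\\upload\\knime-table2spark-example.dat";
        System.out.println(toPosix(broken));
        System.out.println(join("tmp", "spark2-job-server", "upload", "knime-table2spark-example.dat"));
    }
}
```

Both lines printed by `main` are `/tmp/spark2-job-server/upload/knime-table2spark-example.dat`. Building the path with a fixed `/` from the start is the cleaner choice; `toPosix` is only a workaround for a path that was already assembled with the local separator.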
@dokandar, are you also using KNIME on Windows?
@dokandar, what problems are you facing with Livy? I would highly recommend using Livy instead of Spark Job Server.
Please note that we have deprecated support for Spark Jobserver with Spark > 2.2. Spark Jobserver should only be chosen if you need support for Spark 2.1 or older. For Spark 2.2 or later, we recommend Livy as your REST service.
Still, we would like to help. To narrow down the problem, it would be helpful if you could give us some context:
- On which operating system are you running KNIME?
- Where did you download Spark Jobserver from?
- Did you make any changes to the Jobserver configuration?
PS: Another of your posts suggests that you managed to set up Livy properly. You can easily switch the Spark context node in the example workflow to Livy and run the workflow (or try the Local Big Data Environment node).
Yes, I am using KNIME on Windows 10 with the Cloudera Quickstart VM, which is built on top of CentOS 6. Is there a configuration I have to change? On the Cloudera VM, I can see that the files are at the given destination (using forward slashes in the path, e.g. /tmp/spark2-job-server/upload/knime.X), and spark-job-server has permission to access them.
An exception was thrown by the py4 library. It complained about not being able to create any more threads. I couldn't find out what the error was referring to or how to tackle it.
Thank you for your recommendation. I did intend to use Livy, but after running into that problem and not finding a solution, I tried Spark Job Server instead. To answer your questions:
- KNIME is running on Windows 10.
- I downloaded Spark Job Server from the KNIME website.
- I didn't make any changes to the default configuration and just followed the guide from the KNIME website linked above.
OK, I was able to look into this more deeply now. Unfortunately, I can confirm that this is a bug in KNIME that was introduced with KNIME 4.0.
I opened a bug ticket for this, and we will try to fix the issue as soon as possible.
However, I would recommend that we instead help you fix the issues with your Livy setup, because Spark Job Server (SJS) will not get any more KNIME-specific bug fixes, and a growing number of nodes will only work with Spark > 2.2.
Would you mind opening another thread for the py4 library issue?
Best regards,
Mareike
Just to close this one: this issue was resolved in KNIME 4.1.1. See issue BD-1003 in https://www.knime.com/changelog-v41.