KNIME Spark 2.X execution

Hi,

I think I am running into KNIME Spark 2.X issues.

The External SSH Tool node can post Spark 1.X commands, which run on JDK 1.8, but it is not able to post Spark 2 commands. I took a look at the log for more details (attached below).


Does anyone have a solution for this?

Welcome. The stack trace is odd given what you say, because it is saying “I can’t load this Spark class because its byte code is for Java 8” (… but your post appears to say that you can execute Spark 1 commands because they run on Java 8).
That stack trace appears to have been generated by a JVM running Java 7 or earlier.

Hi @benevarts,

Spark 2.2 requires JDK 1.8. As quaeler already pointed out, the log suggests that you are running JDK 1.7. Can you run spark-shell on your cluster?

See Upgrading the JDK in the Cloudera Documentation if you run CDH on your cluster.

Thanks for the feedback. We are running 1.8 on our cluster. Also, the command runs fine from the command line, but it fails when it is triggered through the SSH tool node. Is it possible that the KNIME node is changing something before posting the command to the cluster?

KNIME has a Spark integration. What about running Spark jobs from KNIME using e.g. the Create Spark Context (Livy) node?

If you really need to run the jobs using the SSH node:

  • Make 100% sure that there is no old Java version installed on your cluster (e.g. in /usr/lib/jvm or /opt)
  • Validate what java -version and env return in the SSH node (the SSH node might not pick up the default ENV/PATH settings)
  • Instead of calling Spark directly, create a simple shell script that exports PATH and JAVA_HOME and then calls spark-submit
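The wrapper script from the last point could look roughly like the sketch below. The JDK location /usr/lib/jvm/java-1.8.0 and the function name run_spark_submit are assumptions for illustration only; use the directory where JDK 1.8 is actually installed on your cluster.

```shell
#!/bin/sh
# Sketch of a wrapper that fixes the Java environment before calling spark-submit.
# ASSUMPTION: /usr/lib/jvm/java-1.8.0 is a placeholder for your real JDK 1.8 directory.
JAVA_HOME=/usr/lib/jvm/java-1.8.0
export JAVA_HOME

# Put the JDK 1.8 binaries first so an older java elsewhere on PATH is not picked up.
PATH="$JAVA_HOME/bin:$PATH"
export PATH

# Hypothetical helper: forwards all arguments to spark-submit unchanged.
run_spark_submit() {
  if ! command -v spark-submit >/dev/null 2>&1; then
    echo "spark-submit not found on PATH: $PATH" >&2
    return 127
  fi
  spark-submit "$@"
}

# Only submit when arguments were passed to the script.
if [ "$#" -gt 0 ]; then
  run_spark_submit "$@"
fi
```

Save it on the cluster, make it executable, and put the script path plus your usual spark-submit arguments into the remote command field instead of calling spark-submit directly.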

Does this help you? Otherwise, the following information might be helpful:

  • What does the SSH command look like?
  • What cluster are you running (CDH/HDP/Custom/Version)?

Thanks for the help! How do I go about verifying
“Validate what java -version and env returns in the SSH node (the ssh node might not pickup the default ENV/PATH settings)”
in the External SSH Tool node? How do I find the output?

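Enter java -version > /tmp/my-output.txt 2>&1 or env > /tmp/my-output.txt in the remote command field and /tmp/my-output.txt in the remote output file field. The 2>&1 is needed because java -version prints its version information to stderr, so without it the file would stay empty. Now run the node and check the output.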
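If you want both checks in one run, a small script along these lines should work. It is only a sketch: the section headers are made up for readability, and /tmp/my-output.txt is simply the file name used above.

```shell
#!/bin/sh
# Collect the Java version and the environment seen by the SSH session into one file.
OUT=/tmp/my-output.txt

{
  echo "== java -version =="
  # java -version writes to stderr, so redirect stderr into the report.
  java -version 2>&1 || echo "java could not be run (not on PATH?)"

  echo "== JAVA_HOME =="
  echo "${JAVA_HOME:-<unset>}"

  echo "== PATH =="
  echo "$PATH"

  echo "== env =="
  env
} > "$OUT"
```

Call the script from the remote command field and keep /tmp/my-output.txt as the remote output file. Comparing the result with what the same commands print in a normal login shell shows which variables the SSH node does not pick up.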

The Java path was missing. That solved it! Thanks!
