KNIME Create Spark Context node giving an error on EMR

The KNIME Create Spark Context node gives an error while connecting to the Spark Job Server.

The Spark Job Server URL is reachable and working fine on port 8090.

Error:

2018-01-24 08:17:46,998 : DEBUG : KNIME-Worker-20 : Node : Create Spark Context : 0:14 : Execute failed: Error when trying to execute Spark job. Please restart the Spark context, reset all preceding nodes and try again.
org.knime.bigdata.spark.core.context.jobserver.request.RestoredThrowable: java.lang.NoClassDefFoundError: spark/jobserver/api/SparkJob
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at spark.jobserver.util.JarUtils$.loadConstructor(JarUtils.scala:49)
    at spark.jobserver.util.JarUtils$.fallBackToClass$1(JarUtils.scala:31)
    at spark.jobserver.util.JarUtils$.loadClassOrObject(JarUtils.scala:36)
    at spark.jobserver.JobCache$$anonfun$getSparkJob$1.apply(JobCache.scala:46)
    at spark.jobserver.JobCache$$anonfun$getSparkJob$1.apply(JobCache.scala:37)
    at spark.jobserver.util.LRUCache.get(LRUCache.scala:35)
    at spark.jobserver.JobCache.getSparkJob(JobCache.scala:37)
    at spark.jobserver.JobManagerActor$$anonfun$startJobInternal$1.apply$mcV$sp(JobManagerActor.scala:216)
    at scala.util.control.Breaks.breakable(Breaks.scala:37)
    at spark.jobserver.JobManagerActor.startJobInternal(JobManagerActor.scala:192)
    at spark.jobserver.JobManagerActor$$anonfun$wrappedReceive$1.applyOrElse(JobManagerActor.scala:144)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at ooyala.common.akka.ActorStack$$anonfun$receive$1.applyOrElse(ActorStack.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1$$anonfun$applyOrElse$1.apply$mcV$sp(Slf4jLogging.scala:26)
    at ooyala.common.akka.Slf4jLogging$class.ooyala$common$akka$Slf4jLogging$$withAkkaSourceLogging(Slf4jLogging.scala:35)
    at ooyala.common.akka.Slf4jLogging$$anonfun$receive$1.applyOrElse(Slf4jLogging.scala:25)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at ooyala.common.akka.ActorMetrics$$anonfun$receive$1.applyOrElse(ActorMetrics.scala:24)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
    at ooyala.common.akka.InstrumentedActor.aroundReceive(InstrumentedActor.scala:8)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2018-01-24 08:17:47,003 : DEBUG : KNIME-Worker-20 : WorkflowManager : Create Spark Context : 0:14 : Create Spark Context 0:14 doBeforePostExecution

Hi,

The error message "java.lang.NoClassDefFoundError: spark/jobserver/api/SparkJob" implies that the Spark version KNIME assumes does not match the one on your cluster. Which versions of EMR and Spark are you trying to connect to?
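If it helps, the Spark version actually running on the cluster can be read from the banner that `spark-submit --version` prints to stderr on the master node. A minimal sketch of extracting the version number (the `banner` string below is a stand-in for the real output, not captured from your cluster):

```shell
# Stand-in for the output of: spark-submit --version 2>&1
# (on a real EMR master node, capture it with banner=$(spark-submit --version 2>&1))
banner='Welcome to Spark version 2.2.1
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM'

# Pull out the first x.y.z version number from the banner.
spark_version=$(printf '%s\n' "$banner" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
echo "Cluster Spark version: $spark_version"
```

The major.minor part of that version has to match the Spark version the job server build was compiled against, otherwise classes like `spark.jobserver.api.SparkJob` fail to load.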


Best,

Björn

Hi,

We are using EMR 5.11 and Spark 2.2.1.

Thanks

Hello,

Has anyone managed to connect the Spark Connector with EMR 5.xx?

@ashu9719: how did you install Job Server (v0.8) on EMR? The GitHub manual only covers v0.7 and earlier...

I can connect on EMR 4.2.0 (Spark 1.6.0) and it works fine, but not on newer versions of EMR.

If you can show me the process, that would be great!

Thanks in advance.

Here is a bootstrap script for installing the job server on EMR 5.11.1:

#!/bin/bash

# NOTE: the commented-out lines below must be executed as an EMR step
# after the cluster has bootstrapped.

# Only install the job server on the master node.
if grep -Fq '"isMaster": true' /mnt/var/lib/info/instance.json
then
    # Fetch the KNIME build of Spark Job Server for Spark 2.2.
    wget http://download.knime.org/store/3.5/spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp.tar.gz
    # Create a system user to own and run the job server.
    sudo useradd -d /opt/spark2-job-server/ -M -r -s /bin/false spark-job-server
    # /usr/bin/hdfs dfs -mkdir -p /user/spark-job-server
    # /usr/bin/hdfs dfs -chown -R spark-job-server /user/spark-job-server
    sudo cp spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp.tar.gz /opt
    cd /opt
    sudo tar xzf spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp.tar.gz
    sudo ln -s spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp spark2-job-server
    sudo chown -R spark-job-server:spark-job-server spark2-job-server spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp
    # Register the bundled init script so the service starts on boot.
    sudo ln -s /opt/spark2-job-server/spark-job-server-init.d /etc/init.d/spark2-job-server
    sudo chkconfig --levels 2345 spark2-job-server on
    cd /opt/spark-job-server-0.7.0.3-KNIME_spark-2.2_hdp
    # Point the job server at EMR's Spark install and run against YARN.
    sudo sed -i '/SPARK_HOME=/c\SPARK_HOME=/usr/lib/spark' settings.sh
    sudo sed -i '/Run Spark locally with 4 worker threads/c\master = "yarn-client"' environment.conf
    # sudo /etc/init.d/spark2-job-server start
fi
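A note on the `isMaster` check at the top: EMR writes `/mnt/var/lib/info/instance.json` on every node, and the grep gates installation to the master only. The same logic shown standalone, using a temporary file in place of the real path so it can be run anywhere:

```shell
# Simulate the instance metadata file EMR writes on the master node.
tmpjson=$(mktemp)
printf '{"isMaster": true}\n' > "$tmpjson"

# Same test the bootstrap uses: -F = fixed string, -q = quiet (exit code only).
if grep -Fq '"isMaster": true' "$tmpjson"; then
  role="master"
else
  role="core-or-task"
fi
echo "$role"
rm -f "$tmpjson"
```

After the cluster is up, the commented-out HDFS commands and the service start are meant to run as a separate EMR step (or over SSH), since HDFS is not yet available while bootstrap actions run.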