Persistent error with KNIME Spark Extension implementation

Hi,
I recently tried to install and configure the spark-job-server compatible with Apache Spark 1.6, as provided with CDH 5.13.

I followed the installation steps, and when I run:

/etc/init.d/spark-job-server start

everything seems to work as expected.
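As a quick sanity check (assuming the default port 8090), the jobserver REST API can also be queried directly, e.g.:

curl http://localhost:8090/contexts

which should return the list of running contexts.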

But when I try to create a Spark context from the KNIME platform, it still doesn’t work.

On the KNIME console I get a generic misconfiguration/version-incompatibility error:

Log file is located at: /home/ubuntu/knime-workspace/.metadata/knime/knime.log
ERROR Create Spark Context 0:1 HTTP Status code: 500 | Response Body: The server was not able to produce a timely response to your request.
ERROR Create Spark Context 0:1 Execute failed: Spark Jobserver gave unexpected response (for details see View > Open KNIME log). Possible reason: Incompatible Jobserver version, malconfigured Spark Jobserver

The thing is, I’m quite sure that I have configured the node preferences with the correct Spark version, 1.6 (CDH 5.9+). And if it really is a misconfiguration error, how can I find out what causes the problem?

Here are the specs:

  • Ubuntu 16.04 LTS
  • KNIME version: knime_3.5.2.linux.gtk.x86_64.tar.gz
  • CDH 5.13 with the default Spark 1.6
  • spark-job-server version installed: spark-job-server-0.6.2.3-KNIME_cdh-5.13.tar.gz

Here is the detailed View > Open KNIME log output:

2018-03-27 18:05:39,554 : DEBUG : main : Node : Create Spark Context : 0:1 : reset
2018-03-27 18:05:39,554 : DEBUG : main : SparkNodeModel : Create Spark Context : 0:1 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2018-03-27 18:05:39,554 : DEBUG : main : Node : Create Spark Context : 0:1 : clean output ports.
2018-03-27 18:05:39,554 : DEBUG : main : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: IDLE
2018-03-27 18:05:39,555 : DEBUG : main : SparkContextCreatorNodeModel : Create Spark Context : 0:1 : Reconfiguring old context with same ID.
2018-03-27 18:05:39,555 : DEBUG : main : Node : Create Spark Context : 0:1 : Configure succeeded. (Create Spark Context)
2018-03-27 18:05:39,555 : DEBUG : main : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: CONFIGURED
2018-03-27 18:06:11,318 : DEBUG : main : WorkflowEditor : : : Saving workflow test 0
2018-03-27 18:06:11,358 : DEBUG : ModalContext : FileSingleNodeContainerPersistor : : : Replaced node directory “/home/ubuntu/knime-workspace/test/Create Spark Context (#1)”
2018-03-27 18:06:15,229 : DEBUG : main : ExecuteAction : : : Creating execution job for 1 node(s)…
2018-03-27 18:06:15,230 : DEBUG : main : NodeContainer : : : Setting dirty flag on Create Spark Context 0:1
2018-03-27 18:06:15,230 : DEBUG : main : NodeContainer : : : Setting dirty flag on test 0
2018-03-27 18:06:15,230 : DEBUG : main : NodeContainer : : : Create Spark Context 0:1 has new state: CONFIGURED_MARKEDFOREXEC
2018-03-27 18:06:15,230 : DEBUG : main : NodeContainer : : : Create Spark Context 0:1 has new state: CONFIGURED_QUEUED
2018-03-27 18:06:15,230 : DEBUG : KNIME-Workflow-Notifier : WorkflowEditor : : : Workflow event triggered: WorkflowEvent [type=WORKFLOW_DIRTY;node=0;old=null;new=null;timestamp=Mar 27, 2018 6:06:15 PM]
2018-03-27 18:06:15,230 : DEBUG : main : NodeContainer : : : test 0 has new state: EXECUTING
2018-03-27 18:06:15,230 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: EXECUTING
2018-03-27 18:06:15,234 : DEBUG : KNIME-Worker-4 : WorkflowManager : Create Spark Context : 0:1 : Create Spark Context 0:1 doBeforePreExecution
2018-03-27 18:06:15,234 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: PREEXECUTE
2018-03-27 18:06:15,234 : DEBUG : KNIME-Worker-4 : WorkflowManager : Create Spark Context : 0:1 : Create Spark Context 0:1 doBeforeExecution
2018-03-27 18:06:15,236 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: EXECUTING
2018-03-27 18:06:15,236 : DEBUG : KNIME-Worker-4 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:1 : Adding handler 06d64ce1-87ef-4086-8d27-5c970651fa67 (Create Spark Context 0:1: ) - 1 in total
2018-03-27 18:06:15,237 : DEBUG : KNIME-Worker-4 : LocalNodeExecutionJob : Create Spark Context : 0:1 : Create Spark Context 0:1 Start execute
2018-03-27 18:06:15,237 : INFO : KNIME-Worker-4 : JobserverSparkContext : Create Spark Context : 0:1 : Spark context jobserver://localhost:8090/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2018-03-27 18:06:15,238 : DEBUG : KNIME-Worker-4 : JobserverSparkContext : Create Spark Context : 0:1 : Checking if remote context exists. Name: knimeSparkContext
2018-03-27 18:06:15,259 : DEBUG : KNIME-Worker-4 : JobserverSparkContext : Create Spark Context : 0:1 : Remote context does not exist. Name: knimeSparkContext
2018-03-27 18:06:15,259 : DEBUG : KNIME-Worker-4 : JobserverSparkContext : Create Spark Context : 0:1 : Creating new remote Spark context. Name: knimeSparkContext
2018-03-27 18:07:15,500 : ERROR : KNIME-Worker-4 : CreateContextRequest : Create Spark Context : 0:1 : HTTP Status code: 500 | Response Body: The server was not able to produce a timely response to your request.
2018-03-27 18:07:15,501 : INFO : KNIME-Worker-4 : JobserverSparkContext : Create Spark Context : 0:1 : Spark context jobserver://localhost:8090/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : reset
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : SparkNodeModel : Create Spark Context : 0:1 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2018-03-27 18:07:15,501 : ERROR : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : Execute failed: Spark Jobserver gave unexpected response (for details see View > Open KNIME log). Possible reason: Incompatible Jobserver version, malconfigured Spark Jobserver
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : Execute failed: Spark Jobserver gave unexpected response (for details see View > Open KNIME log). Possible reason: Incompatible Jobserver version, malconfigured Spark Jobserver
org.knime.bigdata.spark.core.exception.KNIMESparkException: Spark Jobserver gave unexpected response (for details see View > Open KNIME log). Possible reason: Incompatible Jobserver version, malconfigured Spark Jobserver
at org.knime.bigdata.spark.core.context.jobserver.request.AbstractJobserverRequest.createUnexpectedResponseException(AbstractJobserverRequest.java:154)
at org.knime.bigdata.spark.core.context.jobserver.request.AbstractJobserverRequest.handleGeneralFailures(AbstractJobserverRequest.java:123)
at org.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:76)
at org.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:1)
at org.knime.bigdata.spark.core.context.jobserver.request.AbstractJobserverRequest.send(AbstractJobserverRequest.java:72)
at org.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.createRemoteSparkContext(JobserverSparkContext.java:465)
at org.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.access$4(JobserverSparkContext.java:459)
at org.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext$1.run(JobserverSparkContext.java:242)
at org.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.runWithResetOnFailure(JobserverSparkContext.java:341)
at org.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.open(JobserverSparkContext.java:230)
at org.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:64)
at org.knime.bigdata.spark.node.util.context.create.SparkContextCreatorNodeModel.executeInternal(SparkContextCreatorNodeModel.java:155)
at org.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:242)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1172)
at org.knime.core.node.Node.execute(Node.java:959)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:561)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : WorkflowManager : Create Spark Context : 0:1 : Create Spark Context 0:1 doBeforePostExecution
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: POSTEXECUTE
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : WorkflowManager : Create Spark Context : 0:1 : Create Spark Context 0:1 doAfterExecute - failure
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : reset
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : SparkNodeModel : Create Spark Context : 0:1 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2018-03-27 18:07:15,501 : DEBUG : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : clean output ports.
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:1 : Removing handler 06d64ce1-87ef-4086-8d27-5c970651fa67 (Create Spark Context 0:1: ) - 0 remaining
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: IDLE
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : SparkContextCreatorNodeModel : Create Spark Context : 0:1 : Reconfiguring old context with same ID.
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : Node : Create Spark Context : 0:1 : Configure succeeded. (Create Spark Context)
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : Create Spark Context 0:1 has new state: CONFIGURED
2018-03-27 18:07:15,502 : DEBUG : KNIME-Worker-4 : NodeContainer : Create Spark Context : 0:1 : test 0 has new state: CONFIGURED
2018-03-27 18:07:15,502 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: IDLE

I have also attached the other error log files.

Can someone help me understand how to make it work? I don’t know what to think anymore. I have tried all the possible combinations, but without any positive result.

Since I didn’t mention it before: there is no Kerberos authentication configured, neither for the cluster defined in Cloudera nor for the KNIME Spark job server. I have just installed and configured the spark-job-server without any other features.

Thanks…
hs_err_pid13101.log (127.4 KB)
knime.log (15.2 KB)

Hi @gujodm

the knime.log indicates that creating the Spark context fails with HTTP status code 500, which means the jobserver had an internal error. To figure this out, we need the jobserver’s own log files, which should be under:

/var/log/spark-job-server

Just zip the whole directory and send me the zip via private message.
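For example, something like this should do (the archive name is just a suggestion):

cd /var/log
zip -r spark-job-server-logs.zip spark-job-server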

Best,
Björn

Hi @bjoern.lohrmann, thanks for the reply.
Honestly, I didn’t find a way to send private messages within the KNIME forum…

Anyway, I took a look at the log files under /var/log/spark-job-server and fortunately I found the problem and fixed it already:

It was caused by my own misconfiguration in the environment.conf file. I forgot to comment out

master = "spark://localhost:7077"

and obviously it couldn’t work, because the only uncommented master line should be

master = yarn-client

since Spark should run on YARN (it doesn’t make much sense to use a standalone cluster when I have already created and configured a real cluster on Cloudera). See the snippet below for how the relevant part of the file looks after the fix.
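For reference, the relevant part of my environment.conf now looks roughly like this (the surrounding settings depend on your own setup):

# standalone master, must stay commented out when running on YARN
# master = "spark://localhost:7077"
master = yarn-client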

After restarting with

/etc/init.d/spark-job-server restart

Everything went fine :slight_smile:

Cheers,
~G
