ERROR Create Spark Context 0:8

I have been working with Spark on my MapR cluster for a month. Everything was working yesterday; I shut down my laptop, slept, and since this morning every execution fails with:
ERROR Create Spark Context 0:8        Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)

I have been using the same Spark context for almost a month, and suddenly today the log shows this:

2017-11-22 15:28:06,982 : DEBUG : main : ExecuteAction :  :  : Creating execution job for 1 node(s)...
2017-11-22 15:28:06,982 : DEBUG : main : NodeContainer :  :  : Create Spark Context 0:8 has new state: CONFIGURED_MARKEDFOREXEC
2017-11-22 15:28:06,982 : DEBUG : main : NodeContainer :  :  : Create Spark Context 0:8 has new state: CONFIGURED_QUEUED
2017-11-22 15:28:06,983 : DEBUG : main : NodeContainer :  :  : CLV 0 has new state: EXECUTING
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforePreExecution
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: PREEXECUTE
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforeExecution
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: EXECUTING
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:8 : Adding handler 9d1284a2-a615-40c5-bf01-5e8f1c5bb8c5 (Create Spark Context 0:8: <no directory>) - 2 in total
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : LocalNodeExecutionJob : Create Spark Context : 0:8 : Create Spark Context 0:8 Start execute
2017-11-22 15:28:06,983 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer :  :  : ROOT  has new state: EXECUTING
2017-11-22 15:28:06,984 : INFO  : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Spark context jobserver://maskedcluster:8585/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2017-11-22 15:28:06,986 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Checking if remote context exists. Name: knimeSparkContext
2017-11-22 15:28:06,999 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Remote context does not exist. Name: knimeSparkContext
2017-11-22 15:28:06,999 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Creating new remote Spark context. Name: knimeSparkContext
2017-11-22 15:29:07,006 : INFO  : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Spark context jobserver://maskedcluster:8585/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : reset
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : SparkNodeModel : Create Spark Context : 0:8 : In reset() of SparkNodeModel. Calling deleteRDDs.
2017-11-22 15:29:07,007 : ERROR : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)
javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: SocketTimeoutException invoking http://maskedcluster:8585/contexts/knimeSparkContext: Read timed out
    at org.apache.cxf.jaxrs.client.AbstractClient.checkClientException(AbstractClient.java:582)
    at org.apache.cxf.jaxrs.client.AbstractClient.preProcessResult(AbstractClient.java:564)
    at org.apache.cxf.jaxrs.client.WebClient.doResponse(WebClient.java:1144)
    at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1094)
    at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:894)
    at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:865)
    at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:428)
    at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.method(WebClient.java:1631)
    at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.method(WebClient.java:1626)
    at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.post(WebClient.java:1566)
    at org.apache.cxf.jaxrs.client.spec.InvocationBuilderImpl.post(InvocationBuilderImpl.java:145)
    at com.knime.bigdata.spark.core.context.jobserver.rest.WsRsRestClient.post(WsRsRestClient.java:210)
    at com.knime.bigdata.spark.core.context.jobserver.rest.RestClient.post(RestClient.java:79)
    at com.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:62)
    at com.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:1)
    at com.knime.bigdata.spark.core.context.jobserver.request.AbstractJobserverRequest.send(AbstractJobserverRequest.java:73)
    at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.createRemoteSparkContext(JobserverSparkContext.java:466)
    at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.access$4(JobserverSparkContext.java:460)
    at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext$1.run(JobserverSparkContext.java:243)
    at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.runWithResetOnFailure(JobserverSparkContext.java:342)
    at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.open(JobserverSparkContext.java:231)
    at com.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:58)
    at com.knime.bigdata.spark.node.util.context.create.SparkContextCreatorNodeModel.executeInternal(SparkContextCreatorNodeModel.java:156)
    at com.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:235)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
    at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1128)
    at org.knime.core.node.Node.execute(Node.java:915)
    at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:561)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
    at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
    at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
    at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.net.SocketTimeoutException: SocketTimeoutException invoking http://maskedcluster:8585/contexts/knimeSparkContext: Read timed out
    at sun.reflect.GeneratedConstructorAccessor106.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.mapException(HTTPConduit.java:1376)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1360)
    at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
    at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:651)
    at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:649)
    at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1093)
    ... 33 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.getResponseCode(URLConnectionHTTPConduit.java:332)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.doProcessResponseCode(HTTPConduit.java:1580)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1609)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1550)
    at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1347)
    ... 39 more
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforePostExecution
2017-11-22 15:29:07,008 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: POSTEXECUTE
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doAfterExecute - failure
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : reset
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : SparkNodeModel : Create Spark Context : 0:8 : In reset() of SparkNodeModel. Calling deleteRDDs.
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : clean output ports.
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:8 : Removing handler 9d1284a2-a615-40c5-bf01-5e8f1c5bb8c5 (Create Spark Context 0:8: <no directory>) - 1 remaining
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: IDLE
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : SparkContextCreatorNodeModel : Create Spark Context : 0:8 : Reconfiguring old context with same ID.
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Configure succeeded. (Create Spark Context)
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: CONFIGURED
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : CLV 0 has new state: IDLE
2017-11-22 15:29:07,014 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer :  :  : ROOT  has new state: IDLE

I checked the cluster and the Spark Job Server is running. I restarted it, but the error above persists.
Any ideas?

Hi,

you are getting a read timeout when connecting to the Job Server, but the KNIME log file does not say why. Can you check whether you can reach the Spark Job Server UI via a browser from the machine you are running KNIME on? To do so, just copy and paste the Job Server URL, e.g. http://maskedcluster:8585, into a browser.
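If no browser is handy on that machine, a command-line probe works as well. This is only a sketch, using the masked host name from the log above; substitute your own Job Server host and port:

```shell
# Host/port taken from the (masked) log above -- replace with your own.
JOBSERVER_URL="http://maskedcluster:8585"

# Ask the Job Server REST API to list its contexts.
# A quick response means the server is reachable; a hang or a timeout
# points to a network/firewall problem rather than KNIME itself.
curl --max-time 10 "$JOBSERVER_URL/contexts"
```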

For more details you will need to check the Spark Job Server log files on the machine where the Job Server is running. They are located in the path specified in the settings.sh file, which is usually LOG_DIR=/var/log/spark-job-server. Within the log directory you will find the general spark-job-server.log file, plus one folder per context containing its own spark-job-server.log. Have a look at these files to see what the problem might be.
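On the Job Server host that inspection could look like the sketch below, assuming the default LOG_DIR mentioned above (yours may differ in settings.sh):

```shell
# Assumed default from settings.sh -- adjust if yours differs.
LOG_DIR=/var/log/spark-job-server

# The general Job Server log
tail -n 100 "$LOG_DIR/spark-job-server.log"

# Each context gets its own subdirectory with another spark-job-server.log;
# list them, then inspect the one for the failing context.
ls -d "$LOG_DIR"/*/
tail -n 100 "$LOG_DIR"/*knimeSparkContext*/spark-job-server.log
```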

You can also try shutting down the Job Server, deleting all files in its temporary directory under /tmp/spark-job-server (or whichever file system location you have set in environment.conf), and then restarting the Job Server.
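A sketch of that stop/clean/restart sequence, run on the Job Server host. The install location below is a hypothetical example, and /tmp/spark-job-server is only the default temporary directory; verify both against your own deployment and environment.conf before running anything:

```shell
# Paths below are assumptions based on a default deployment -- verify
# against your own settings.sh / environment.conf first.
JOBSERVER_DIR=/opt/spark-job-server   # hypothetical install location

# 1. Stop the Job Server
"$JOBSERVER_DIR/server_stop.sh"

# 2. Clear its temporary files (default location; see environment.conf)
rm -rf /tmp/spark-job-server/*

# 3. Start it again
"$JOBSERVER_DIR/server_start.sh"
```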

Bye

Tobias

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.