I have been working with Spark on my MapR clusters for a month. Everything was working yesterday; I shut down my laptop, slept, and woke up the next morning. Since then, every execution fails with:
ERROR Create Spark Context 0:8 Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)
I have been using the same Spark context for almost a month, and suddenly today the log shows this:
2017-11-22 15:28:06,982 : DEBUG : main : ExecuteAction : : : Creating execution job for 1 node(s)...
2017-11-22 15:28:06,982 : DEBUG : main : NodeContainer : : : Create Spark Context 0:8 has new state: CONFIGURED_MARKEDFOREXEC
2017-11-22 15:28:06,982 : DEBUG : main : NodeContainer : : : Create Spark Context 0:8 has new state: CONFIGURED_QUEUED
2017-11-22 15:28:06,983 : DEBUG : main : NodeContainer : : : CLV 0 has new state: EXECUTING
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforePreExecution
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: PREEXECUTE
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforeExecution
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: EXECUTING
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:8 : Adding handler 9d1284a2-a615-40c5-bf01-5e8f1c5bb8c5 (Create Spark Context 0:8: <no directory>) - 2 in total
2017-11-22 15:28:06,983 : DEBUG : KNIME-Worker-19 : LocalNodeExecutionJob : Create Spark Context : 0:8 : Create Spark Context 0:8 Start execute
2017-11-22 15:28:06,983 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: EXECUTING
2017-11-22 15:28:06,984 : INFO : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Spark context jobserver://maskedcluster:8585/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2017-11-22 15:28:06,986 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Checking if remote context exists. Name: knimeSparkContext
2017-11-22 15:28:06,999 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Remote context does not exist. Name: knimeSparkContext
2017-11-22 15:28:06,999 : DEBUG : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Creating new remote Spark context. Name: knimeSparkContext
2017-11-22 15:29:07,006 : INFO : KNIME-Worker-19 : JobserverSparkContext : Create Spark Context : 0:8 : Spark context jobserver://maskedcluster:8585/knimeSparkContext changed status from CONFIGURED to CONFIGURED
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : reset
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : SparkNodeModel : Create Spark Context : 0:8 : In reset() of SparkNodeModel. Calling deleteRDDs.
2017-11-22 15:29:07,007 : ERROR : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Execute failed: Error connecting to Spark Jobserver. Possible reasons: Spark Jobserver is down or invalid connection settings (for details see View > Open KNIME log)
javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: SocketTimeoutException invoking http://maskedcluster:8585/contexts/knimeSparkContext: Read timed out
at org.apache.cxf.jaxrs.client.AbstractClient.checkClientException(AbstractClient.java:582)
at org.apache.cxf.jaxrs.client.AbstractClient.preProcessResult(AbstractClient.java:564)
at org.apache.cxf.jaxrs.client.WebClient.doResponse(WebClient.java:1144)
at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1094)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:894)
at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:865)
at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:428)
at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.method(WebClient.java:1631)
at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.method(WebClient.java:1626)
at org.apache.cxf.jaxrs.client.WebClient$SyncInvokerImpl.post(WebClient.java:1566)
at org.apache.cxf.jaxrs.client.spec.InvocationBuilderImpl.post(InvocationBuilderImpl.java:145)
at com.knime.bigdata.spark.core.context.jobserver.rest.WsRsRestClient.post(WsRsRestClient.java:210)
at com.knime.bigdata.spark.core.context.jobserver.rest.RestClient.post(RestClient.java:79)
at com.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:62)
at com.knime.bigdata.spark.core.context.jobserver.request.CreateContextRequest.sendInternal(CreateContextRequest.java:1)
at com.knime.bigdata.spark.core.context.jobserver.request.AbstractJobserverRequest.send(AbstractJobserverRequest.java:73)
at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.createRemoteSparkContext(JobserverSparkContext.java:466)
at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.access$4(JobserverSparkContext.java:460)
at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext$1.run(JobserverSparkContext.java:243)
at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.runWithResetOnFailure(JobserverSparkContext.java:342)
at com.knime.bigdata.spark.core.context.jobserver.JobserverSparkContext.open(JobserverSparkContext.java:231)
at com.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:58)
at com.knime.bigdata.spark.node.util.context.create.SparkContextCreatorNodeModel.executeInternal(SparkContextCreatorNodeModel.java:156)
at com.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:235)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1128)
at org.knime.core.node.Node.execute(Node.java:915)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:561)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.net.SocketTimeoutException: SocketTimeoutException invoking http://maskedcluster:8585/contexts/knimeSparkContext: Read timed out
at sun.reflect.GeneratedConstructorAccessor106.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.mapException(HTTPConduit.java:1376)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1360)
at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:651)
at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:649)
at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1093)
... 33 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.apache.cxf.transport.http.URLConnectionHTTPConduit$URLConnectionWrappedOutputStream.getResponseCode(URLConnectionHTTPConduit.java:332)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.doProcessResponseCode(HTTPConduit.java:1580)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1609)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1550)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1347)
... 39 more
2017-11-22 15:29:07,007 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doBeforePostExecution
2017-11-22 15:29:07,008 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: POSTEXECUTE
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : WorkflowManager : Create Spark Context : 0:8 : Create Spark Context 0:8 doAfterExecute - failure
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : reset
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : SparkNodeModel : Create Spark Context : 0:8 : In reset() of SparkNodeModel. Calling deleteRDDs.
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : clean output ports.
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : WorkflowFileStoreHandlerRepository : Create Spark Context : 0:8 : Removing handler 9d1284a2-a615-40c5-bf01-5e8f1c5bb8c5 (Create Spark Context 0:8: <no directory>) - 1 remaining
2017-11-22 15:29:07,011 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: IDLE
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : SparkContextCreatorNodeModel : Create Spark Context : 0:8 : Reconfiguring old context with same ID.
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : Create Spark Context : Create Spark Context : 0:8 : Configure succeeded. (Create Spark Context)
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : Create Spark Context 0:8 has new state: CONFIGURED
2017-11-22 15:29:07,012 : DEBUG : KNIME-Worker-19 : NodeContainer : Create Spark Context : 0:8 : CLV 0 has new state: IDLE
2017-11-22 15:29:07,014 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: IDLE
I checked the cluster and the Spark Jobserver process is running. I restarted it, and I still get the error above.
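One detail I noticed in the log: the GET that checks whether the context exists succeeded at 15:28:06, but the POST to create the context timed out exactly 60 seconds later. So the port seems reachable from my laptop, and it is the context creation itself that hangs. To double-check reachability independently of KNIME, I put together this small probe (just a sketch; the host/port are the ones from my log, and `GET /contexts` is the standard spark-jobserver endpoint that lists active contexts):

```python
import urllib.request

# Host/port taken from the log above; adjust for your own Jobserver.
JOBSERVER_URL = "http://maskedcluster:8585"

def jobserver_reachable(base_url, timeout=10.0):
    """Probe the Jobserver REST API with a short timeout.

    GET /contexts merely lists the active contexts, so if even this small
    request fails or times out, the problem is network reachability rather
    than context creation. Returns (ok, detail), where detail is either the
    response body or the error text.
    """
    try:
        with urllib.request.urlopen(base_url + "/contexts", timeout=timeout) as resp:
            return True, resp.read().decode("utf-8", "replace")
    except OSError as exc:  # covers URLError, ConnectionError, socket timeouts
        return False, str(exc)
```

Calling `jobserver_reachable(JOBSERVER_URL)` from the laptop returns `(True, ...)` with the context list when the REST API answers, and `(False, ...)` with the error text otherwise, which at least tells me whether the failure is on the network path or inside the Jobserver.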
Any idea?