Unable to create Spark Context (Livy)

Hello everyone,

I’m sorry if I’m posting in the wrong topic, but I have the following issue:
I have created a Create Spark Context (Livy) node and am trying to connect it to an HDFS cluster managed by Cloudera.
I have configured the Spark Job Server and Livy URL settings (correctly, I hope). When I execute the node, it creates a Livy session (verified in YARN) and allocates the resources configured in the node, but after that I get the following error:
“ERROR Create Spark Context (Livy) 3:30 Execute failed: Broken pipe (Write failed) (SocketException)”

Here are some YARN logs:

" 21/12/22 17:58:34 INFO spark.SparkContext: Added JAR hdfs://dlhdfs/user/hdfs/.livy-sessions/8ae05520-85c5-4995-84cf-21c5d69aa4ce/livy-kryo-version-detector.jar at hdfs://dlhdfs/user/hdfs/.livy-sessions/8ae05520-85c5-4995-84cf-21c5d69aa4ce/livy-kryo-version-detector.jar with timestamp 1640188714335
21/12/22 17:58:34 INFO driver.RSCDriver: Received bypass job request 4f1d3ebd-3d69-478c-92e6-1a621a48867b
21/12/22 17:58:35 INFO atlas.SparkAtlasEventTracker: Receiving application end event - shutting down SAC
21/12/22 17:58:35 INFO atlas.SparkAtlasEventTracker: Done shutting down SAC
21/12/22 17:58:35 INFO server.AbstractConnector: Stopped Spark@3aa26dd6{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
21/12/22 17:58:35 INFO ui.SparkUI: Stopped Spark web UI at http://cm-worker13-all-prod.emag.network:37488
21/12/22 17:58:35 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/12/22 17:58:35 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/12/22 17:58:35 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/12/22 17:58:35 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false)
21/12/22 17:58:35 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/12/22 17:58:35 INFO memory.MemoryStore: MemoryStore cleared
21/12/22 17:58:35 INFO storage.BlockManager: BlockManager stopped
21/12/22 17:58:35 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/12/22 17:58:35 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/12/22 17:58:35 INFO spark.SparkContext: Successfully stopped SparkContext
21/12/22 17:58:35 INFO repl.PythonInterpreter: Shutting down process
21/12/22 17:58:35 INFO spark.SparkContext: SparkContext already stopped.
21/12/22 17:58:35 INFO repl.PythonInterpreter: process has been shut down
21/12/22 17:58:35 INFO spark.SparkContext: SparkContext already stopped.
21/12/22 17:58:35 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/12/22 17:58:35 WARN channel.AbstractChannelHandlerContext: An exception 'java.lang.IllegalArgumentException: not existed channel:[id: 0x18ece6d4, L:0.0.0.0/0.0.0.0:10000 ! R:/10.96.116.28:49662]' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception: java.lang.IllegalArgumentException: not existed channel:[id: 0x18ece6d4, L:0.0.0.0/0.0.0.0:10000 ! R:/10.96.116.28:49662] "

And here are the logs from the KNIME app (KNIME Analytics Platform):

" 2021-12-22 18:02:49,579 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Creating new remote Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb at https://cm-master1-all-prod.emag.network:8998 with authentication KERBEROS.
2021-12-22 18:02:49,585 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from CONFIGURED to OPEN
2021-12-22 18:03:21,097 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading Kryo version detector job jar.
2021-12-22 18:03:21,801 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Running Kryo version detector job jar.
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Using Kryo serializer version: kryo2
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading job jar: /var/folders/13/8vjx3pj137l2qrh_xwpnqwqc0000gp/T/sparkClasses16857522895332955503.jar
2021-12-22 18:03:22,423 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Destroying Livy Spark context
2021-12-22 18:03:23,150 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from OPEN to CONFIGURED
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,152 : ERROR : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Execute failed: Broken pipe (Write failed) (SocketException)
java.util.concurrent.ExecutionException: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.waitForFuture(LivySparkContext.java:492)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.uploadJobJar(LivySparkContext.java:464)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.open(LivySparkContext.java:327)
at org.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:145)
at org.knime.bigdata.spark.core.livy.node.create.LivySparkContextCreatorNodeModel2.executeInternal(LivySparkContextCreatorNodeModel2.java:85)
at org.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:240)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:549)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1267)
at org.knime.core.node.Node.execute(Node.java:1041)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:365)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.base/java.net.SocketOutputStream.write(Unknown Source)
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(Unknown Source)
at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
at org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
at org.apache.http.entity.mime.content.FileBody.writeTo(FileBody.java:121)
at org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
at org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
at org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.livy.client.http.LivyConnection.executeRequest(LivyConnection.java:292)
at org.apache.livy.client.http.LivyConnection.access$000(LivyConnection.java:68)
at org.apache.livy.client.http.LivyConnection$3.run(LivyConnection.java:277)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.livy.client.http.LivyConnection.sendRequest(LivyConnection.java:274)
at org.apache.livy.client.http.LivyConnection.post(LivyConnection.java:228)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:256)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:253)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2021-12-22 18:03:23,152 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb
2021-12-22 18:03:23,153 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doBeforePostExecution
2021-12-22 18:03:23,154 : INFO : pool-7-thread-1 : : LivySparkContext : : : Destroying Livy Spark context
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: POSTEXECUTE
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doAfterExecute - failure
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Parquet to Spark 3:32 has new state: CONFIGURED
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Spark to Parquet 3:37 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : clean output ports.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowDataRepository : Create Spark Context (Livy) : 3:30 : Removing handler 9c1d8004-908d-43fe-8347-45ba78bd7c58 (Create Spark Context (Livy) 3:30: ) - 5 remaining
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://bf46d062-17db-4ec1-8d06-28b53da7c624
2021-12-22 18:03:23,157 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://f5354b5c-b44d-4460-b270-17f0ea238f4b changed status from NEW to CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Configure succeeded. (Create Spark Context (Livy))
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Parquet to Spark : 3:32 : Configure succeeded. (Parquet to Spark)
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : HDFS 3 has new state: IDLE
2021-12-22 18:03:42,421 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:03:42,424 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:19,830 : INFO : main : : SparkContext : : : Spark context jobserver://cm-master3-all-prod.emag.network:8090/knimeSparkContext changed status from NEW to CONFIGURED
2021-12-22 18:53:23,611 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:23,613 : DEBUG : main : : NodeContainerEditPart : : : Create Spark Context (Livy) 3:30 (CONFIGURED)

Please let me know if any more info is needed.

Thank you in advance,
Andrei


Does anyone know what the problem is?

Hi @andreis and welcome to the KNIME community!

Do you always get this error? I’m not sure whether your cluster or local machine has the wrong time, or whether the logs are from different test runs. Can you post the Livy logs? You can find them under the Spark service in Cloudera Manager.

Cheers


Hi @sascha.wolke,

Thanks for the welcome.
Unfortunately, I get that error every time I run the “Create Spark Context (Livy)” node.

Here are the Livy logs: Livy logs.docx (12.9 KB)

@andreis, a few remarks. I am not an expert in setting up Cloudera systems, but I noticed a few things.

21/12/29 12:10:14 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.

This might indicate that a Spark session for the user is already running. From my understanding, the server-side configuration would either create a new session for every instance started via Livy from KNIME, or try to ‘access’ an existing one. In my experience it is best to give every KNIME/Livy instance its own session, so it might be that a broken session is blocking things. You could try to destroy the Spark context immediately after establishing the Livy connection (if that is possible) and try again. Or you will have to talk to your admins about allowing more sessions.
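One way to check for and clean up such leftover sessions is Livy’s REST API (`GET /sessions`, `DELETE /sessions/{id}`). Here is a minimal sketch using only the Python standard library; the `LIVY_URL` value is a placeholder for your endpoint, and the Kerberos/SPNEGO authentication your cluster uses is omitted for brevity, so this would need an auth layer to actually run against a secured cluster:

```python
import json
import urllib.request

# Assumption: replace with your Livy endpoint (port 8998 by default).
LIVY_URL = "https://cm-master1-all-prod.emag.network:8998"


def list_sessions(livy_url: str) -> list:
    """Return the session list from GET /sessions."""
    with urllib.request.urlopen(f"{livy_url}/sessions") as resp:
        return json.load(resp)["sessions"]


def stale_session_ids(sessions: list, owner: str) -> list:
    """Pick sessions owned by `owner` that ended up in a bad state."""
    bad_states = {"error", "dead", "killed"}
    return [s["id"] for s in sessions
            if s.get("owner") == owner and s.get("state") in bad_states]


def delete_session(livy_url: str, session_id: int) -> None:
    """DELETE /sessions/{id} frees the session's YARN resources."""
    req = urllib.request.Request(
        f"{livy_url}/sessions/{session_id}", method="DELETE")
    urllib.request.urlopen(req)
```

With that in place, `delete_session(LIVY_URL, sid)` for each id in `stale_session_ids(list_sessions(LIVY_URL), "hdfs")` would clear out dead sessions before retrying the KNIME node.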

Then there is mention of CDH 7.x. I was under the impression that version 6.x is the one currently used in production. Could it be that this is some sort of beta or test server?

Then, for a big data system, the resources are not particularly impressive. You might have to experiment with that and allow more memory and nodes. You can also set a flexible number of nodes: I usually start with 3 and then allow up to 10 or 15, letting the system decide dynamically. Of course this also comes at a price, since managing the nodes/clusters takes time and resources (no free lunch here either). Also, some things like H2O.ai Sparkling Water only work with a fixed number of nodes.
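The flexible-allocation idea above can be sketched with Spark’s standard dynamic-allocation properties (for example as custom Spark settings in the Create Spark Context (Livy) node); the numbers here are illustrative, not a recommendation for your cluster:

```
spark.dynamicAllocation.enabled         true
spark.dynamicAllocation.minExecutors    3
spark.dynamicAllocation.maxExecutors    15
# Dynamic allocation on YARN requires the external shuffle service:
spark.shuffle.service.enabled           true
```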

Then there is the startup waiting time. It is possible to increase the settings that allow the system more time to start up a cluster. I am not 100% sure from the log, but it seems the cluster starts and then barely makes it ‘in time’ (might be the attempt to connect to the running instance). So maybe try to give it some more time (check out the timeout settings; I do not have them at hand).
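For reference, the relevant knobs are server-side Livy settings rather than KNIME settings; a sketch of what such an increase could look like in `livy.conf`, assuming these defaults are what is biting here (the values are illustrative):

```
# How long an idle session is kept before Livy kills it (default 1h)
livy.server.session.timeout = 2h
# RSC driver <-> client handshake timeout; raise it if the Spark
# driver is slow to come up on YARN (default 90s)
livy.rsc.server.connect.timeout = 360s
```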

YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)

21/12/29 12:10:00 INFO driver.SparkEntries: Spark context finished initialization in 30882ms

OK and from here things go south:

21/12/29 12:10:21 INFO driver.RSCDriver: Received bypass job request ee3a4368-1d5e-4013-bee6-8131ac270790
21/12/29 12:10:23 INFO repl.PythonInterpreter: Shutting down process

Not sure what this means, but it could be a problem with the existing Spark context. It might also be necessary to check the Python setup on the server.

Some entries might hint at problems with memory allocation on the server.

These are my thoughts.


Hi @andreis,

I have never seen these “not existed channel” errors before. I’m not sure whether this is a network configuration problem or whether the YARN container crashes. Maybe the YARN container logs contain more information; you can find them in the YARN Resource Manager web UI.


Hi @andreis,

can you briefly summarize your setup?

  • Cloudera version of your cluster
  • Spark version (the included 2.4 or an extra parcel?)
  • Is SSL encryption enabled?
  • Are you running the test from a local KNIME AP or using a KNIME Server?
  • What operating system are you using?

Right now I’m testing a CDH 7.1.7 cluster with Spark 2.4, and it looks like something goes wrong with the Jetty SSL encryption, as in this bug. Without SSL, or after upgrading to jetty-io-9.4.40.v20210413, everything seems to work.
