Problem with Create Spark Context (Livy)

Hi,
I tried to create a Spark context with Livy and a Spark standalone cluster.
The HDFS connection works, but the “Create Spark Context” node gives this error:

ERROR DFSClient                       Failed to close inode 22332
ERROR Create Spark Context (Livy) 0:2        Execute failed: Remote file system upload test failed: File /apps/spark/warehouse/.knime-spark-staging-1d124189-6270-4880-a8f5-035dd2eba202/858518ff-a481-4843-a968-5e9a646c2970 could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
  at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:286)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2706)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

Can anyone help me, please?

Hi @GiuseppeM and welcome to the community!

The Create Spark Context (Livy) node requires an HDFS directory to exchange data between Livy and KNIME (the staging area). It looks like the HDFS data node is not reachable (“could only be written to 0 of the 1 minReplication nodes”). Can you try the HttpFS Connection node instead of the HDFS Connection node? Does this help?


Thanks for the reply,

I can’t use the HttpFS Connection node because my Livy server runs on HDP Sandbox 3.0 on Docker.
Is there any other possible solution with the HDFS Connection node?

Hi @GiuseppeM,

If you only want to try out the Spark nodes in KNIME, you can also use the Create Local Big Data Environment node, which requires no cluster setup at all.

The HDP Sandbox running on Docker might be the problem. You have to make sure that HDFS is not running in safe mode and that all data nodes listed in the name node web UI are reachable from your client machine, using the exact IPs/hostnames shown there.
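A quick sketch of those checks from the command line, assuming the `hdfs` CLI is available on the client or inside the sandbox container (the hostname and port below are illustrative; the sandbox typically uses `sandbox-hdp.hortonworks.com`, and Hadoop 3.x data nodes listen on 9866 by default, older versions on 50010):

```shell
# Check whether HDFS is in safe mode (should print "Safe mode is OFF")
hdfs dfsadmin -safemode get

# If the name node is stuck in safe mode, leave it manually
hdfs dfsadmin -safemode leave

# List the data nodes the name node knows about, with their addresses
hdfs dfsadmin -report

# From the client machine, verify that the data node address/port
# reported above is actually reachable
nc -vz sandbox-hdp.hortonworks.com 9866
```

If `nc` fails here, the KNIME client will hit exactly the “could only be written to 0 of the 1 minReplication nodes” error above, because the name node hands out a data node address that the client cannot connect to.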


Hi,

I need to use Create Spark Context (Livy).

Sorry, my statement in the previous reply was wrong:
the HttpFS server cannot be installed on HDP Sandbox 3.0 on Docker using Ambari,
but I solved it by installing it manually over an SSH connection to the Docker container hosting HDP.

With the HttpFS Connection node I don’t have any problems. Thank you so much for the help!