HDFS connection error

I used an SSH connection to upload my data to an HPC system, and then tried to use the JSON to Spark node to process the data. The ‘SSH Connection’ node connects to my HPC host without problems. However, when I use the same configuration in the HDFS Connection node, I get the error ‘Failed on local exception: java.io.IOException: An existing connection was forcibly closed by the remote host’. I also tried the WebHDFS Connection and HttpFS Connection nodes, and both failed with a ‘Connection reset’ message. How can I solve this problem? Since only the SSH connection works for me at the moment, is it possible to use the SSH connection for Spark processing (JSON to Spark, Text to Spark, etc.)?

Hello DerekJin,
what is your HPC system? Does it support HDFS?
In the worst case you can always resort to the PySpark Script Source or Spark DataFrame Java Snippet (Source) node to read data from any file system Spark supports.
Bye
Tobias
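
As a rough illustration of the PySpark route suggested above, here is a minimal sketch, not a tested KNIME workflow. It assumes the uploaded file sits on a file system that every Spark node can see (for example a shared cluster directory addressed with a file:// URI) and that it contains one JSON object per line. The helper name `read_json_as_dataframe` and the path in `DATA_URI` are hypothetical; the variable names `spark` and `resultDataFrame1` in the trailing comment are assumed to match the script node's template and should be checked against the template in your KNIME version.

```python
def read_json_as_dataframe(spark, uri):
    """Read a JSON Lines file into a Spark DataFrame.

    spark: an existing SparkSession (assumed to be supplied by the caller).
    uri:   any URI Spark can resolve on the cluster, e.g. file:// or hdfs://.
    """
    # DataFrameReader.json expects one JSON object per line by default.
    return spark.read.json(uri)


# Hypothetical location of the data uploaded over SSH; replace with your own path.
DATA_URI = "file:///path/to/uploaded/data.json"

# Inside the PySpark script node the call would look roughly like this
# (variable names are assumptions, see the note above):
# resultDataFrame1 = read_json_as_dataframe(spark, DATA_URI)
```

If the file is only present on the login node and not on the Spark worker nodes, a file:// URI will not work and the data has to be copied to storage the cluster can reach.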

Thank you for the information. I will try the Java snippet to process the Spark data, since SSH is currently the only connection my HPC system supports.