Connecting from local machine to HDFS on AWS

Hi Team,

I am trying to connect KNIME Analytics Platform, which runs on my laptop, to a Hadoop cluster on AWS that I have access to.

Usually, I access the Hadoop cluster by establishing an SSH connection through the PuTTY utility with the AWS private key file, which logs me into the AWS machine, and I invoke HDFS commands there (hdfs dfs -ls, etc.).

When I try to connect through KNIME, the HDFS Connection node (under Big Data Connectors) only asks for the standard parameters: host, port, and user credentials. It doesn't allow me to specify the AWS key file, etc.

How do I connect to an HDFS cluster that is on AWS, and which node should I use?

Just to test things out, I tried connecting to the remote machine through the SSH Connector node and listing the files, and that works (since the SSH Connector accepts the key file). However, I cannot pass the SSH connection as an input to the HDFS Connector to authenticate my connection. I think the HDFS Connector can connect to a cluster on the same network as my laptop, but not to a remote cluster that requires an SSH connection. Is that the case, or is there a way out?
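One workaround you could try (this is not built into the HDFS Connector node; the host name, user, and key path below are placeholders for your own values) is to open an SSH local port forward from your laptop to the NameNode port, and then point the HDFS Connection node at localhost:

```shell
# Forward local port 8020 to the NameNode RPC port on the AWS host,
# authenticating with the same private key you use in PuTTY.
# <ec2-host> and the port numbers are placeholders; use your cluster's values.
ssh -i ~/.ssh/aws-key.pem \
    -L 8020:localhost:8020 \
    -N hadoop@<ec2-host>
```

While the tunnel is open, the HDFS Connection node can be configured with host localhost and port 8020, so the key-based authentication happens in the tunnel rather than in the node. Note that HDFS reads/writes may still fail if the NameNode redirects the client to DataNode addresses that are not reachable from your laptop, which is why DNS/network visibility matters here.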

I am attaching a screenshot of the workflows I used below this note. Please help.



Hi Santhana,

Connecting with HDFS/WebHDFS requires working DNS (on the cluster AND the client side); I prefer HttpFS instead (it works much better with remote clusters). Does it work with HttpFS?

If not: what are you using on AWS, EMR or a custom setup? And which host do you connect to via SSH?


I am having the same problem.

My HDFS is listening on port 54310, but I am unable to connect to it externally.

I am able to access it from the AWS console using localhost:54310, though, and I have opened the port in the security group.
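If it only works via localhost on the machine itself, a likely cause (an assumption worth checking, not a confirmed diagnosis) is that the NameNode is bound to the loopback interface. You can verify this on the AWS host:

```shell
# Run on the AWS host: show which address the NameNode port is bound to.
# 127.0.0.1:54310 means only local connections are accepted;
# 0.0.0.0:54310 or <private-ip>:54310 would accept external ones.
sudo netstat -tlnp | grep 54310
```

If it shows 127.0.0.1, the fs.defaultFS / dfs.namenode.rpc-address settings in core-site.xml / hdfs-site.xml probably use localhost; changing them to the host's actual (private DNS) name and restarting HDFS should make the port reachable externally, provided the security group rule is in place.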