Hi Team,
I am trying to connect the KNIME Analytics Platform, which is installed on my laptop, to a Hadoop cluster on AWS that I have access to.
Usually, I access the Hadoop cluster by establishing an SSH connection through the PuTTY utility with the AWS private key file, which logs me into the AWS machine, and then I invoke HDFS commands there (hdfs dfs -ls, etc.).
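For reference, this is the manual workflow, shown here with the OpenSSH command-line equivalent of my PuTTY setup (the hostname and key file name below are placeholders):

```
# Log in to the AWS machine with the private key (placeholder host/key names)
ssh -i my-aws-key.pem ec2-user@<cluster-host>.compute.amazonaws.com

# Once logged in, run HDFS commands on the cluster
hdfs dfs -ls /
```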
When I try to connect through KNIME, the HDFS Connection node (under Big Data Connectors) only asks for the standard parameters: host, port, and user credentials. It doesn't let me specify the AWS key file.
How do I connect to an HDFS cluster that is on AWS, and which node should I use?
Just to test, I tried connecting to the remote machine through the SSH Connection node and listing the files, and that works (since the SSH Connection node accepts the key file). However, I cannot pass the SSH connection as an input to the HDFS Connection node to authenticate my connection. I think the HDFS Connection node can reach a cluster that is on the same network as my laptop, but not a remote cluster that requires an SSH connection. Is that the case? Or is there a way out, for example tunneling over SSH as in the sketch below?
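To make the question concrete, this is roughly what I have in mind (hostnames and key file are placeholders, and I am assuming 8020 is the NameNode RPC port, which may differ on this cluster):

```
# Forward a local port to the NameNode through the AWS machine (placeholder names)
ssh -i my-aws-key.pem -L 8020:localhost:8020 ec2-user@<cluster-host>.compute.amazonaws.com

# Then, in KNIME, point the HDFS Connection node at:
#   host: localhost, port: 8020
```

Would that kind of setup work with the HDFS Connection node, or is there a more direct way?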
I am attaching a screenshot of the workflows I used below this note. Please help.
Thanks,
Santhana