HDFS access issue

As I already mentioned here, I am unable to download from a Hortonworks HDP cluster.

The HDFS Connection node works fine, and so does the List Remote Files node.

However, the Download node never succeeds.

KNIME version is 3.6, and timeout is set to 30 seconds (also tried with higher settings).

As I wrote in the other thread: Do you use the WebHDFS connector?

I had issues with the “HDFS Connection” Node, but the “WebHDFS Connection” worked.

Yes, I already tried the WebHDFS service on the Hortonworks side.

Hi @peleitor

The problem you are having means that you can connect to the NameNode, but not to some or all of the DataNodes in your cluster. In HDFS, when you want to download a file, your client (KNIME) first asks the NameNode for the DataNode(s) where the file contents are stored. Then the client tries to connect directly to those DataNodes for the download.
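To see which DataNode the NameNode refers you to, here is a minimal Python sketch that uses the WebHDFS REST API, where the same two-step pattern shows up as an HTTP 307 redirect. It assumes WebHDFS is enabled and the cluster uses simple (non-Kerberos) authentication. The function names and arguments are made up for illustration, and the native HDFS Connection node uses a different protocol and different ports.

```python
import http.client
from urllib.parse import urlsplit


def datanode_from_redirect(location):
    """Return (host, port) of the DataNode named in a WebHDFS redirect URL."""
    parts = urlsplit(location)
    return parts.hostname, parts.port


def find_datanode(namenode_host, namenode_port, hdfs_path, user):
    """Ask the NameNode to open a file and report where it redirects to.

    hdfs_path must be absolute (start with "/"). http.client does not
    follow redirects, so the 307 response stays visible to us.
    """
    conn = http.client.HTTPConnection(namenode_host, namenode_port, timeout=30)
    try:
        conn.request("GET", f"/webhdfs/v1{hdfs_path}?op=OPEN&user.name={user}")
        response = conn.getresponse()
        location = response.getheader("Location")
    finally:
        conn.close()
    if response.status != 307 or location is None:
        raise RuntimeError(f"expected a 307 redirect, got {response.status}")
    return datanode_from_redirect(location)
```

If the host returned by `find_datanode` is a name or address your machine cannot resolve or reach, the download will hang or time out exactly as described.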

One possible cause for this is NAT [1] between you and the cluster. In a NAT’ed setup, all cluster machines have a public IP/hostname and a private IP/hostname. The NameNode only knows the private IPs/hostnames and will refer the client to them for file upload and download. If the client cannot connect to the private IPs, this leads to the problem you are experiencing. A solution for the NAT problem is to put the client into the same private network as the cluster, for example via a VPN.
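A rough way to check for the NAT case is to look at what the DataNode hostname resolves to on the client machine. The sketch below is only a heuristic, and the function name is made up for illustration: a private address is perfectly fine if you are on the cluster network or a VPN, but it is a strong hint of a NAT problem if you are not.

```python
import ipaddress
import socket


def resolves_to_private(hostname):
    """True if every IPv4 address the hostname resolves to is private."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (address, port)).
    addresses = {info[4][0] for info in infos}
    return all(ipaddress.ip_address(a).is_private for a in addresses)
```

If the hostname does not resolve at all, `socket.getaddrinfo` raises `socket.gaierror`, which points to the same root cause: the client does not know the cluster's internal names.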

Other causes are firewalls that prevent your client from making connections to the DataNodes.
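To test for a firewall or routing problem, you can try a plain TCP connection from the KNIME machine to each DataNode. This is a minimal sketch: the helper name is made up, and the host and ports in the usage note are assumptions that need to be replaced with the values from your cluster configuration.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_connect("datanode1.example.com", 50010)` checks the data transfer port and `can_connect("datanode1.example.com", 50075)` the DataNode HTTP port used by WebHDFS. Both port numbers are the usual Hadoop 2 defaults and may differ on your cluster. If the NameNode port is reachable but these are not, the download will fail even though connecting and listing files work.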


[1] https://en.wikipedia.org/wiki/Network_address_translation

End of story: