HDFS / WebHDFS connection to HDP through KNOX

I need to access HDFS or WebHDFS on an HDP platform through KNOX.

I don't see how to achieve this with the WebHDFS Connection or HDFS Connection nodes.

I’ve already checked this post:

Thanks,
Fernando

Hi @peleitor

I am afraid there is currently no KNIME support for HDFS access via KNOX.

The technical reason is as follows:
Our HDFS/webHDFS/httpFS Connector nodes are using the standard Hadoop libraries (from hadoop.apache.org) to access HDFS. The problem seems to be that some aspects of the KNOX REST API are designed in a way that is incompatible with those Hadoop libraries.

I can see three workarounds:

  • Set up the httpFS service in your cluster and connect to it using the “httpFS Connection” node. This is what I would recommend.

  • Use a Java Snippet node with the knoxshell library, provided by the KNOX project. I have attached a sample workflow that uploads some files to HDFS via KNOX.

    About the workflow: The upper Table Creator node specifies the KNOX gateway URL, username and password. The lower Table Creator node specifies paths of local files and the remote HDFS folder to upload them to.

    After importing the workflow into KNIME you still need to download knoxshell and copy a jar file into the workflow folder. Download knoxshell from here:
    https://www.apache.org/dyn/closer.cgi/knox/1.2.0/knoxshell-1.2.0.zip
    Unzip the file, locate knoxshell.jar in the bin subfolder and copy knoxshell.jar into the directory of the imported KNIME workflow (inside your KNIME workspace).

  • You could use the REST and JSON parsing nodes as a workaround. This will be inconvenient, since you have to get into the details of how webHDFS works, but is probably doable.
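To give an idea of what the third workaround involves, here is a minimal sketch in plain Java of the request a REST node would have to send. It assumes the usual KNOX URL layout (gateway base URL, then topology name, then /webhdfs/v1 and the HDFS path) and HTTP Basic authentication at the gateway. The host name, the topology name "default", the user and the password are placeholders, not values from your cluster, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper only: builds the URL and Authorization header for a
// WebHDFS call routed through a KNOX gateway. It does not open a connection.
class KnoxWebHdfsRequest {

    // gatewayBase is e.g. "https://knox.example.com:8443/gateway" (placeholder).
    // hdfsPath is assumed to contain only URL-safe characters.
    static URI webHdfsUri(String gatewayBase, String topology, String hdfsPath, String op) {
        String base = gatewayBase.endsWith("/")
                ? gatewayBase.substring(0, gatewayBase.length() - 1)
                : gatewayBase;
        String path = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
        return URI.create(base + "/" + topology + "/webhdfs/v1" + path + "?op=" + op);
    }

    // Value of the Authorization header for HTTP Basic authentication.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // All values below are placeholders.
        URI uri = webHdfsUri("https://knox.example.com:8443/gateway", "default", "/tmp", "LISTSTATUS");
        System.out.println("GET " + uri);
        System.out.println("Authorization: " + basicAuthHeader("myuser", "mypassword"));
    }
}
```

In a workflow, the URL would go into a GET Request node (with Basic authentication configured there), and the JSON response of LISTSTATUS would be parsed with the JSON nodes. Uploads are more involved, because WebHDFS CREATE is a two-step operation with a redirect.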

Björn

knox_hdfs.knwf (8.9 KB)

Thanks Bjorn.

How do I specify the certification path in your example?

I am getting this error:

ERROR Java Snippet 4:21 Execute failed: Calculation aborted: org.apache.knox.gateway.shell.KnoxShellException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target