Connecting Spark nodes to Azure HDInsight

#1

Dear Community,

We are trying to configure the KNIME Spark nodes in an Azure environment using HDInsight and Spark Livy. However, we are facing some challenges with the very first step: Create Spark Context (Livy).

The Architecture

  • Microsoft Azure environment
  • Azure HDInsight using Spark 2.4 (with Spark Livy)
  • Linux VM within the same Azure subscription as Azure HDInsight, running KNIME Analytics Platform (Desktop) with the Big Data nodes

What we would like to achieve

  • Connect from KNIME Desktop to the Spark that ships with Azure HDInsight, without Kerberos

Our current KNIME workflow consists of only two nodes: an Azure Blob Storage node pointing to the HDInsight storage, and a Create Spark Context (Livy) node that takes the Blob Storage node as its input. In the Spark Context node we can successfully select the proper folder from blob storage as the staging area, but the tricky point is the Livy URL.

We can access the Livy shipped with HDInsight at https://spark_cluster_name.azurehdinsight.net/livy/, or at the same URL with port 443 added, which the node requires.

The Spark context node offers two options for authentication, None and Kerberos, while the Livy shipped with HDInsight requires a user name and password, so we are not sure how to configure it. Whatever URL we pass to the node (internal IP, external IP, host name of the HDInsight Spark head nodes, or the actual Livy URL in the Azure-recommended format, https://spark_cluster_name.azurehdinsight.net:443/livy/), the connection cannot be established.

Thank you in advance,
Botond

0 Likes

#2

Hi,

The Create Spark Context (Livy) node supports only None and Kerberos authentication in its dialog, but as a workaround you can add the user name and password to the URL field. For user botond and password secret, put them in front of the host name, separated from each other by a colon (:) and from the host by an at symbol (@): https://botond:secret@spark_cluster_name.azurehdinsight.net:443/livy/
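
If the real password contains characters such as @, : or /, pasting it into the URL unchanged would make the URL ambiguous. Below is a minimal Python sketch of how such a URL could be assembled with the credentials percent-encoded. The function name livy_url_with_credentials is made up for illustration, and whether the KNIME node decodes percent-encoded credentials is an assumption that has not been verified here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def livy_url_with_credentials(base_url, user, password):
    """Return base_url with 'user:password@' placed in front of the host.

    Hypothetical helper for illustration only. Reserved characters in the
    user name and password are percent-encoded so that they cannot be
    mistaken for URL delimiters.
    """
    parts = urlsplit(base_url)
    # Drop any credentials that are already present in the URL.
    host_and_port = parts.netloc.rpartition("@")[2]
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{host_and_port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Example with the placeholder values used in this thread.
print(
    livy_url_with_credentials(
        "https://spark_cluster_name.azurehdinsight.net:443/livy/",
        "botond",
        "secret",
    )
)
# prints https://botond:secret@spark_cluster_name.azurehdinsight.net:443/livy/
```

Keep in mind that a password embedded in the URL is stored in clear text with the node settings.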

Sascha

1 Like