We are trying to configure the KNIME Spark nodes in an Azure environment using HDInsight and Spark Livy. However, we are facing some challenges with the very first step: Create Spark Context (Livy).
Our environment:
- Microsoft Azure environment
- Azure HDInsight using Spark 2.4 (with Spark Livy)
- Linux VM in the same Azure subscription as the HDInsight cluster, running KNIME Analytics Platform (Desktop) with the Big Data nodes
What we would like to achieve:
- Connect from KNIME Desktop to the Spark instance that comes with Azure HDInsight, without Kerberos
Our current KNIME workflow consists of only two nodes: an Azure Blob Storage node, pointing to the HDInsight storage, and a Create Spark Context (Livy) node, which takes the Blob Storage node as its input. On the Spark Context node, we can successfully select the proper folder from blob storage as a staging area, but the tricky point is the Livy URL.
We can access the Livy shipped with HDInsight at https://spark_cluster_name.azurehdinsight.net/livy/, or at the same URL with port 443 added, which the node requires.
The Spark Context node offers only two options for authentication, None and Kerberos, while the Livy shipped with HDInsight requires a user name and password, so we are not sure how to configure it. Whatever URL we pass to the node (internal IP, external IP, host name of the HDInsight Spark head nodes, or the actual Livy URL in the Azure-recommended format, https://spark_cluster_name.azurehdinsight.net:443/livy/), the connection cannot be established.
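To illustrate the user name + password access mentioned above: as far as we understand, it is plain HTTP Basic authentication against the Livy REST API. Below is a minimal sketch of such a check outside KNIME, assuming the standard Livy /sessions endpoint; the function names, the cluster name and the credentials are placeholders of ours, not anything taken from KNIME or Azure.

```python
import base64
import json
import urllib.request


def build_livy_request(cluster_name, user, password):
    """Build a GET request for the Livy /sessions endpoint with HTTP Basic auth.

    cluster_name, user and password are placeholders; the URL pattern follows
    the one quoted in this post.
    """
    url = f"https://{cluster_name}.azurehdinsight.net/livy/sessions"
    # HTTP Basic auth: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def list_livy_sessions(cluster_name, user, password, timeout=30):
    """Send the request and return the decoded JSON response from Livy."""
    request = build_livy_request(cluster_name, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling list_livy_sessions("spark_cluster_name", "<user>", "<password>") from the Linux VM should return the session list if the gateway accepts the credentials. This is exactly the step we cannot reproduce in the node, since it only offers None and Kerberos.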
Thank you in advance,