I'm hoping to purchase the Big Data Extensions and Spark Executor nodes for KNIME, but I want to make sure they will be compatible with Hadoop, Hive and Spark on Microsoft's Azure platform, in particular HDInsight. Does anyone know of any reason why HDInsight might behave differently from the listed compatible platforms?
Thank you for your interest in the KNIME Big Data Extensions! To be fully honest, we have not tested them against HDInsight, but according to Microsoft's product documentation, HDInsight simply uses the Hortonworks Data Platform (HDP) under the hood:
The Hive and Spark versions in HDInsight 3.3 and 3.2 listed in the above link are compatible with the KNIME Big Data Extensions. Please note, however, that in addition to installing the extensions in KNIME Analytics Platform/Server, you have to install the Spark Jobserver (see the KNIME Spark Executor installation instructions) on a Linux machine that has access to your cluster. To verify that everything works, we offer free evaluation licenses. Just drop us an email if you are interested =)
And: Merry Christmas! :)
Many thanks for your help back in December. I'm now in a position to evaluate the nodes, but before I do, I'd appreciate a little guidance.
We're storing a large amount of data in Azure Blob storage, which I understand will be accessible to the HDInsight Spark cluster. I understand how the Spark nodes will execute on the cluster, but I am struggling to work out how to use KNIME to turn my data into an RDD that the KNIME Spark nodes can work with. I'm not planning on using Hive, but if that is the only way, I should probably reconfigure the cluster!
Hm, it seems there is a way to access Azure Blob storage through the HDFS interface (and, I guess, through the one in Spark as well):
To do that, you currently need to use the "Spark Java Snippet Source" node. If you open its configuration dialog, you will find code templates for accessing HDFS files. If I understand the above link correctly, you have to replace the hdfs:// URL with an Azure asv:// or wasb:// URL.
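As a rough illustration of that URL swap, here is a minimal sketch of how an hdfs:// path from one of the templates could be rewritten into a wasb:// (or TLS-secured wasbs://) URL. It assumes the usual Azure Blob URL layout, wasb://<container>@<account>.blob.core.windows.net/<path>, and that the storage account is already attached to the HDInsight cluster. The class name `WasbUrls`, the method `toWasb`, and the container/account names are hypothetical placeholders, not part of KNIME or Azure.

```java
import java.net.URI;

/**
 * Hypothetical helper (not a KNIME or Azure API): rewrites an hdfs:// URL,
 * as used in the snippet templates, into a wasb:// or wasbs:// URL.
 * Container and account names are placeholders you would replace.
 */
final class WasbUrls {

    private WasbUrls() {}

    /** Builds wasb[s]://<container>@<account>.blob.core.windows.net/<path>. */
    static String toWasb(String hdfsUrl, String container, String account, boolean secure) {
        URI uri = URI.create(hdfsUrl);
        if (!"hdfs".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not an hdfs:// URL: " + hdfsUrl);
        }
        // Keep only the path; the namenode host and port do not apply to Blob storage.
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String scheme = secure ? "wasbs" : "wasb";
        return scheme + "://" + container + "@" + account + ".blob.core.windows.net" + path;
    }

    public static void main(String[] args) {
        String url = toWasb("hdfs://namenode:8020/data/input.csv", "mycontainer", "myaccount", false);
        // prints wasb://mycontainer@myaccount.blob.core.windows.net/data/input.csv
        System.out.println(url);
    }
}
```

Inside the snippet, the resulting string would take the place of the hdfs:// URL in the template, for example as the argument to `textFile(...)` if the template reads text through a `JavaSparkContext` (an assumption about how the template is written).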
Is the HDInsight compatibility verified?
Azure will offer Cloudera as an alternative to Hortonworks. I assume this will work as well.
The key issue here is the Spark version. Since Azure HDInsight has moved on to Spark 1.5, it will currently not work with the KNIME Spark Executor (which supports Spark 1.2/1.3). However, we are working on this right now. If all goes well, we will release Spark 1.5 support at the end of April (if Azure is still on Spark 1.5 by then, this should work).
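Put differently, compatibility here comes down to comparing the cluster's major.minor Spark version (which `spark-submit --version` reports on the cluster) against the lines the executor release supports. The sketch below only encodes the versions mentioned in this thread (1.2 and 1.3); the class `SparkVersionCheck` and its `SUPPORTED` list are hypothetical and are not read from any KNIME API.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical check (not a KNIME API): does the cluster's Spark version fall
 * within the major.minor lines a given Spark Executor release supports?
 */
final class SparkVersionCheck {

    // Supported major.minor lines as stated in this thread; an assumption, not a lookup.
    static final List<String> SUPPORTED = Arrays.asList("1.2", "1.3");

    /** Reduces a version string such as "1.5.2" to its major.minor part, "1.5". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version string: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    static boolean isSupported(String clusterSparkVersion) {
        return SUPPORTED.contains(majorMinor(clusterSparkVersion));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("1.3.1")); // true
        System.out.println(isSupported("1.5.2")); // false: the HDInsight case described above
    }
}
```

Once Spark 1.5 support ships, the same check would simply gain "1.5" in its supported list.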