Azure compatibility

cjuran · June 29, 2016, 6:02pm

I'm not sure if this is the right place to post this; I wanted it in the forum for the Big Data Extension but couldn't figure it out.

I've got a Microsoft Azure account. On Azure I have a Spark cluster deployed that I use to do my data processing. I'd like to integrate KNIME into my work, but I'm not sure exactly how I would connect it to my Spark cluster. I read somewhere that the Spark connector is only compatible up to version 1.3, but perhaps there is another way to run queries without needing a cluster through Hadoop? I know the Cloud Analytics platform has recently been introduced onto Azure, but I don't quite know if that's capable of connecting to my blob storage or metastore.

Any help is appreciated!

bjoern.lohrmann · July 1, 2016, 10:44am

Hi cjuran,

This earlier thread partly addresses your questions:

https://tech.knime.org/forum/big-data-extensions/hdinsight-compatibility

Currently, we do not yet support HDInsight. Next week we will release KNIME Spark Executor with Spark 1.5 and 1.6 support, which is necessary (but not sufficient) for Azure HDInsight. Even then, you will still need to install the "Spark jobserver" (provided by us) on a VM that can access your cluster; we will figure out that part after the release. To my understanding, it is possible to access the Azure Blobstore through the Hadoop HDFS interface.
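To illustrate the Blobstore point: as far as I know, the hadoop-azure (WASB) driver addresses blobs with URIs of the form wasb://<container>@<account>.blob.core.windows.net/<path> and reads the storage key from a Hadoop property named after the account. The sketch below only builds those strings. The helper names and the container/account values are made up for illustration, and it assumes the WASB driver is what your cluster uses.

```python
def wasb_uri(container, account, path="", secure=True):
    """Build a WASB URI for a blob (hypothetical helper, not a KNIME or Hadoop API)."""
    # wasbs:// is the TLS variant of the wasb:// scheme
    scheme = "wasbs" if secure else "wasb"
    host = f"{account}.blob.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


def account_key_property(account):
    """Name of the Hadoop property that holds the storage account key."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


if __name__ == "__main__":
    # Placeholder container and account names; substitute your own.
    print(wasb_uri("mycontainer", "mystorageaccount", "data/input.csv"))
    print(account_key_property("mystorageaccount"))
```

A URI like this can be passed anywhere Hadoop or Spark expects a file system path, provided the key property is set in the cluster configuration.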

If you don't need Spark, the "Hive Connector" (sold as part of the KNIME Big Data Connectors) works with the Hive in HDInsight.

You also mentioned the KNIME Cloud Analytics platform. This is essentially KNIME Analytics Platform (the GUI) prepackaged to run on an Azure virtual machine. It does not yet give you access to the Azure blobstore or metastore. We are currently looking at what types of connectors we can build for the Azure platform.

Best,

Björn