Livy job server for Spark

Hi,

I understand that KNIME, at present, is only officially compatible with Spark 1.2/1.3.

I'm running an Azure HDInsight Spark cluster and have the choice between Windows (Spark 1.3) + Spark Jobserver and Linux (Spark 1.5) + Livy. I'm currently unable to find the details needed to connect the Spark nodes to my Windows cluster, and it seems that KNIME does not yet support Livy.

Are there any plans to have the Spark nodes work with Livy? Or am I missing something?

If not, how can I go about finding the correct details for the Jobserver to input into KNIME's configuration?

Alternatively, does anyone have experience installing the jobserver on an Azure cluster, and would that help me?

Thanks,

Matt

Hi Matt,

The KNIME Spark Executor currently does not work with Livy, and we do not have any plans to change that at the moment. Because Livy is designed to be used interactively in a notebook-style fashion, it currently lacks some functionality that we need to make it work easily with KNIME (e.g. jar file management, named RDDs). As Livy evolves, however, this may change.

Concerning the Spark jobserver: we are currently shipping our own Spark jobserver (much like Spark, this is a heavily evolving project), so KNIME is probably not compatible with the Spark jobserver shipped in Azure, but it would be worth a try. The easiest way to go here is to create a Linux virtual machine on Azure (D1/D1 v2 should be enough; RAM is more important than disk here) and install "our" Spark jobserver (see [1]) on it. On that machine you will also have to

- install Spark (try the open-source 1.3.1 release from spark.apache.org).

- put the XML configuration files the jobserver needs to connect to the HDInsight Hadoop components (YARN, HDFS, Hive) into a directory, and in the jobserver's settings.sh, set HADOOP_CONF_DIR to that directory.
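Before starting the jobserver, it can help to check that the configuration directory actually contains the Hadoop client files. The following is a minimal Python sketch, not part of the KNIME or jobserver distribution: the helper names are hypothetical, and it assumes the standard Hadoop client file names (core-site.xml, hdfs-site.xml, yarn-site.xml, plus hive-site.xml for Hive) and that settings.sh is a shell script, so an `export` line is a reasonable way to set HADOOP_CONF_DIR there.

```python
import shlex
import sys
from pathlib import Path

# Standard Hadoop client config files (assumption: copied from the HDInsight cluster).
REQUIRED_FILES = ["core-site.xml", "hdfs-site.xml", "yarn-site.xml"]
# Only needed if the jobserver should access Hive.
OPTIONAL_FILES = ["hive-site.xml"]


def missing_conf_files(conf_dir, include_optional=False):
    """Return the names of expected XML files that are not present in conf_dir."""
    path = Path(conf_dir)
    expected = REQUIRED_FILES + (OPTIONAL_FILES if include_optional else [])
    return [name for name in expected if not (path / name).is_file()]


def settings_sh_line(conf_dir):
    """Build a shell line (hypothetical format) to append to the jobserver's settings.sh."""
    absolute = str(Path(conf_dir).resolve())
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return "export HADOOP_CONF_DIR=" + shlex.quote(absolute)


if __name__ == "__main__":
    conf_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_conf_files(conf_dir, include_optional=True)
    if missing:
        print("Missing files:", ", ".join(missing))
    print(settings_sh_line(conf_dir))
```

Running it as `python check_conf.py /path/to/hadoop-conf` (the script name is arbitrary) lists any missing files and prints a line that could be added to settings.sh.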

Reference:

[1] Use the jobserver for HDP 2.3 from here: https://www.knime.org/knime-spark-executor#install

Hi,

Unfortunately, the Linux version of HDInsight only comes with Spark 1.5.2. It seems that the jobserver you have packaged for HDP 2.2 and 2.3 (Spark 1.2 and 1.3, respectively) comes from https://github.com/spark-jobserver/spark-jobserver, which has since been updated for compatibility with newer versions of Spark. In particular, release 0.6.1 is built for Spark 1.5.2.

Is there a way to package this the way you have done with older versions, so that I could follow the above instructions and get KNIME up and running with Spark on HDInsight?

Many thanks,

Matt

Hi Matt,

We plan to update "our" Spark jobserver build along with the new KNIME Spark Executor. Both the KNIME Spark Executor *and* the Spark jobserver must support Spark 1.5, so you cannot just run a new Spark jobserver with an old KNIME Spark Executor. This is something we are working on right now. If all goes well, we will release it at the end of April. If Azure is still on Spark 1.5 by then, this should work.

Hope this helps,

Björn