I am going to try KNIME's Spark Executor. My first action item is to install KNIME's customized version of spark-jobserver. The instructions on https://www.knime.org/knime-spark-executor recommend installing spark-jobserver on the Hadoop edge node, but I can't do that because I do not own our Hadoop environment. I can, however, set up a separate Linux node. What exactly do I need to install on that separate Linux node to make it work with spark-jobserver? Is it sufficient to install Spark, or is anything else needed? Our Hadoop environment is HDP 2.5 with Spark 1.6.
My apologies for the late reply. The jobserver must be installed on a Linux machine that:
- has full network connectivity to all of your cluster nodes
- can be connected to via HTTP (default port TCP/8090) from KNIME Analytics Platform and/or KNIME Server
- has all Spark, Hadoop and Hive libraries and the cluster-specific configuration set up.
The last condition in particular means that it is best to configure the machine via Ambari, i.e. by installing the Spark client, HDFS client, YARN client and Hive client on that machine (which should be easy to do via Ambari).
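If it helps, the requirements above can be sanity-checked with a small script before installing the jobserver. This is only a rough sketch, not anything shipped with KNIME: the command names (`spark-submit`, `hdfs`, `yarn`, `hive`) are an assumption about what the client packages put on the PATH, and `jobserver.example.com` is a hypothetical host name. Port 8090 is the default jobserver HTTP port mentioned above.

```python
import shutil
import socket

# Assumed names of the client commands; adjust to your installation.
CLIENT_COMMANDS = ("spark-submit", "hdfs", "yarn", "hive")


def missing_commands(commands=CLIENT_COMMANDS):
    """Return the commands that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Run this part on the jobserver machine.
    print("Missing client commands:", missing_commands() or "none")
    # Run this part from the KNIME machine; the host name is hypothetical.
    print("Jobserver reachable:", port_open("jobserver.example.com", 8090))
```

The same `port_open` check can be pointed at the cluster nodes from the jobserver machine to test the first condition, using whatever host names and ports your cluster actually exposes.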