Hadoop cluster on my own server

Hi all!

I need advice about organizing a cluster and a server for it.
My organization cannot host data on third-party servers or rent outside capacity, so I need to set up my own server to run Apache Spark.

I read the installation guide, but I still don’t fully understand how everything fits together. (https://docs.knime.com/2018-12/bigdata_spark_installation_guide/bigdata_spark_installation_guide.pdf)

I would like to use the KNIME Extension for Apache Spark to connect to my own Spark cluster. What do I need to do to set up an Apache Livy service?

Hi @greatvarona,

You can try Cloudera, which already bundles everything you need (HDFS, Spark, and Livy) but requires a license after a trial period: CDP Private Cloud - Trial Product Download | Cloudera

If you would rather install everything yourself, here is a getting-started guide: Livy - Getting Started
But be prepared: the last official Livy release was some time ago, and you might need to build a newer version yourself.
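Roughly, the getting-started flow is: point Livy at your Spark installation and start the Livy server. A minimal sketch (the paths are placeholders for your own install locations, not the guide’s exact values):

```shell
# Tell Livy where Spark (and the Hadoop config, if any) live -- example paths
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Start the Livy server (it listens on port 8998 by default)
cd /opt/livy
./bin/livy-server start
```

Once it is running, the KNIME node connects to the server’s REST endpoint, e.g. http://your-livy-host:8998.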

Not sure if there are other open-source distributions.



Thanks for your response!

Hi @sascha.wolke,

We have installed Hadoop, Spark, and Livy on the server. What’s next?
Do I need to create a Databricks cluster?

How can I connect to this server through the “Create Databricks Environment” node?

Hi @greatvarona,

You can use either your own cluster with Hadoop/Spark/Livy, or a Databricks cluster that provides everything you need. Depending on which one you choose, use the Create Spark Context (Livy) node or the Create Databricks Environment node.
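For the own-cluster route, the Create Spark Context (Livy) node only needs a reachable Livy endpoint. A minimal livy.conf sketch (the property names come from Livy’s livy.conf.template; the standalone master URL is an assumption for your setup):

```
# conf/livy.conf (sketch -- adjust hosts/ports to your cluster)
livy.spark.master = spark://your-master-host:7077
livy.spark.deploy-mode = client
# Port the KNIME node connects to (http://your-livy-host:8998)
livy.server.port = 8998
```

With YARN instead of standalone Spark, livy.spark.master would be yarn.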



@sascha.wolke Hello!

Can you help with it?

We are trying to install Spark - Pre-built for Apache Hadoop 3.2 and later (with Scala 2.13).

But we get this error:
ERROR Create Spark Context (Livy) 3:3 Execute failed: scala.Serializable (java.lang.ClassNotFoundException) (Exception)

We also tried Spark - Pre-built for Apache Hadoop 3.2 and later, but got a different error:
ERROR Create Spark Context (Livy) 3:3 Execute failed: scala.Function0$class (java.lang.ClassNotFoundException) (Exception)

What should we do to get the cluster working correctly?

Hi @greatvarona,

Can you try the version built with Scala 2.12? (In your screenshot, that is the one at the top, without a Scala version in its name.) ClassNotFoundException errors like these usually indicate a Scala version mismatch between Livy and Spark, and Livy does not support Scala 2.13.


Which versions of Hadoop, Livy, and Scala should I use with Apache Spark 3.2?

Hi @greatvarona,

On the Apache Spark download page, select Spark release 3.2.4, and as the package type choose the first option (Pre-built for Apache Hadoop 3.2 and later).
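For reference, fetching that package from the Apache archive might look like this (the exact URL and file name are assumptions based on the usual Spark archive layout, so double-check them against the download page):

```shell
# Download and unpack Spark 3.2.4, pre-built for Hadoop 3.2+ with Scala 2.12
wget https://archive.apache.org/dist/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2.tgz
tar -xzf spark-3.2.4-bin-hadoop3.2.tgz
```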

Not sure about Livy; you might need to compile the current master using the -Pspark3 and -Pscala-2.12 flags: GitHub - apache/incubator-livy: Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
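Building Livy from the repository with those profiles could look roughly like this (Maven and a JDK are required; the flags are the ones mentioned above, everything else is an assumption, not a tested recipe):

```shell
git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
# Build against Spark 3 / Scala 2.12; skip tests to speed things up
mvn clean package -Pspark3 -Pscala-2.12 -DskipTests
```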


