Hadoop cluster on our own server

Hi all!

I need advice about organizing a cluster and a server for it.
My organization cannot host information on third-party servers or rent outside capacity, so we need to build our own server for Apache Spark.

I read the installation guide (https://docs.knime.com/2018-12/bigdata_spark_installation_guide/bigdata_spark_installation_guide.pdf), but I still don't fully understand how everything works.

I would like to use the KNIME Extension for Apache Spark to connect to our own Spark cluster. What do I need to do to create an Apache Livy service?

Hi @greatvarona,

You can try Cloudera, which already includes HDFS, Spark, and Livy, but it requires a license after a trial period: CDP Private Cloud - Trial Product Download | Cloudera

If you'd like to install all of this yourself, here is a getting started guide: Livy - Getting Started
Be prepared, though: the last official Livy release was some time ago, and you might need to build a newer version yourself.
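
Roughly, getting a standalone Livy server running looks like this. A minimal sketch, assuming Spark and Hadoop are already installed; the paths are placeholders for your environment:

```bash
# Minimal sketch of starting a standalone Livy server.
# Assumptions: Spark and Hadoop are already installed; adjust the paths.
export SPARK_HOME=/opt/spark              # placeholder: your Spark install
export HADOOP_CONF_DIR=/etc/hadoop/conf   # placeholder: your Hadoop config

# Unpack a Livy binary package (release download or your own build)
unzip apache-livy-*-bin.zip -d /opt
cd /opt/apache-livy-*-bin

# Start the server; by default it listens on port 8998
./bin/livy-server start
```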

Not sure if there are other open-source distributions.

Cheers,
Sascha


Sascha,
Thanks for your response!

Hi @sascha.wolke,

We installed Hadoop, Spark, and Livy on the server. What's next?
Do I need to create a Databricks cluster?

How can I connect to this server through the Create Databricks Environment node?

Hi @greatvarona,

You can use either your own cluster with Hadoop/Spark/Livy, or a Databricks cluster, which provides everything you need. Depending on which you choose, you need either the Create Spark Context (Livy) node or the Create Databricks Environment node.
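
For the Livy route, it's worth checking that the Livy REST endpoint is reachable from the KNIME machine before configuring the Create Spark Context (Livy) node. A quick sketch; your-livy-host is a placeholder, and 8998 is Livy's default port:

```bash
# Check that Livy answers on its REST API (default port 8998);
# replace your-livy-host with your server's hostname or IP.
curl http://your-livy-host:8998/sessions
# A healthy server returns a small JSON document, e.g.
# {"from":0,"total":0,"sessions":[]}
```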

Cheers,
Sascha


@sascha.wolke Hello!

Can you help with this?

We are trying to install Spark - Pre-built for Apache Hadoop 3.2 and later (with Scala 2.13)

But we got this error:
ERROR Create Spark Context (Livy) 3:3 Execute failed: scala.Serializable (java.lang.ClassNotFoundException) (Exception)

We also tried Spark - Pre-built for Apache Hadoop 3.2 and later, but got another error:
ERROR Create Spark Context (Livy) 3:3 Execute failed: scala.Function0$class (java.lang.ClassNotFoundException) (Exception)

What should we do to get the cluster working correctly?

Hi @greatvarona,

Can you try the version built with Scala 2.12? (In your screenshot, that's the one at the top without a Scala version in its name.)
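
To double-check which Scala version your Spark download was built with, a quick sketch (assuming SPARK_HOME points at the unpacked distribution):

```bash
# The version banner includes the Scala version Spark was built with
"$SPARK_HOME"/bin/spark-submit --version

# Alternatively, inspect the bundled Scala runtime jar;
# e.g. scala-library-2.12.15.jar indicates a Scala 2.12 build
ls "$SPARK_HOME"/jars/ | grep scala-library
```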

Cheers

@sascha.wolke
Hello!
Can you tell me which versions of Hadoop, Livy, and Scala work with Apache Spark 3.2?

Hi @greatvarona,

On the Apache Spark download page, select Spark release 3.2.4, and as the package type choose the first option (Pre-built for Apache Hadoop 3.2 and later).
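
If you prefer fetching it from the command line, something like this sketch should work (the mirror URL is an assumption; verify it against the download page):

```bash
# Download and unpack Spark 3.2.4 pre-built for Hadoop 3.2 and later.
# The archive URL is an assumption; check the Spark download page for
# a current link.
wget https://archive.apache.org/dist/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2.tgz
tar -xzf spark-3.2.4-bin-hadoop3.2.tgz -C /opt
export SPARK_HOME=/opt/spark-3.2.4-bin-hadoop3.2
```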

Not sure about Livy; you might need to compile the current master using the -Pspark3 and -Pscala-2.12 flags: GitHub - apache/incubator-livy: Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
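
Building Livy from master with those flags would look roughly like this; a sketch, assuming Git, Maven, and a JDK supported by Livy are installed:

```bash
# Sketch: build Livy from master for Spark 3 / Scala 2.12.
# Assumes Git, Maven, and a suitable JDK are installed.
git clone https://github.com/apache/incubator-livy.git
cd incubator-livy
mvn clean package -DskipTests -Pspark3 -Pscala-2.12
# The binary zip should end up under assembly/target/
```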

Cheers,
Sascha


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.