Problems creating an H2O context within a Spark context

Hi.

I’m trying to run a KNIME workflow on a Spark cluster. I managed to connect to HDFS and create a Spark context via Livy on my cluster, but when I try to create an H2O context within it, it fails with this error:

2019-08-14 12:05:47,200 : DEBUG : KNIME-Worker-13 : %J : Node : Create H2O Sparkling Water Context : 0:3 : Execute failed: org.eclipse.jetty.server.Server.setSendServerVersion(Z)V (NoSuchMethodError)
java.lang.NoSuchMethodError: org.eclipse.jetty.server.Server.setSendServerVersion(Z)V
at water.AbstractHTTPD.makeServer(AbstractHTTPD.java:203)
at water.AbstractHTTPD.startHttp(AbstractHTTPD.java:208)
at water.AbstractHTTPD.start(AbstractHTTPD.java:96)
at water.init.NetworkInit.initializeNetworkSockets(NetworkInit.java:87)
at water.H2O.startLocalNode(H2O.java:1523)
at water.H2O.main(H2O.java:1990)
at water.H2OStarter.start(H2OStarter.java:21)
at water.H2OStarter.start(H2OStarter.java:46)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:121)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:130)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:401)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:417)
at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala)
at org.apache.spark.h2o.JavaH2OContext.getOrCreate(JavaH2OContext.java:238)
at org.knime.ext.h2o.spark.spark2_3.H2OSparkContextConnector2_3.createH2OContext(H2OSparkContextConnector2_3.java:117)
at org.knime.ext.h2o.spark.spark2_3.job.H2OSparkCreateContextJob.runJob(H2OSparkCreateContextJob.java:70)
at org.knime.ext.h2o.spark.spark2_3.job.H2OSparkCreateContextJob.runJob(H2OSparkCreateContextJob.java:1)
at org.knime.bigdata.spark2_3.base.LivySparkJob.call(LivySparkJob.java:91)
at org.knime.bigdata.spark2_3.base.LivySparkJob.call(LivySparkJob.java:1)
at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:40)
at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:57)
at org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:42)
at org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:27)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Does anyone know what this may be about?

@TomasCardoso

Thanks for the stacktrace. Can you provide some more information on your cluster environment? (EMR/CDH/HDP? Which version?)

Best,
Björn

Hey, thanks for the reply.

My cluster runs HDP, managed with Apache Ambari 2.7.3.0.
HDFS and MapReduce are version 3.1.1.
Spark2 is 2.3.2.

Hi @TomasCardoso

Can you ensure that under File > Preferences > KNIME > H2O you have selected H2O version 3.24.0.4? Please restart KNIME after changing the setting.

The H2O Sparkling Water integration needs to upload some H2O libraries to Spark and it picks their version based on this setting. Older H2O versions are unfortunately not compatible with HDP 3.
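If you want to confirm the Jetty mismatch directly, a small JDK-only probe could help. This is just a sketch (not part of the KNIME tooling); the class and method names are taken from your stack trace. Newer Jetty versions (9+, as shipped with HDP 3) removed `Server.setSendServerVersion(boolean)`, which is exactly what older H2O builds still call:

```java
// Minimal classpath probe: reports whether org.eclipse.jetty.server.Server
// is visible and whether it still offers setSendServerVersion(boolean).
public class JettyCheck {

    // Returns a short diagnosis string instead of throwing, so it can be
    // dropped into any driver-side job and logged.
    public static String checkJetty() {
        try {
            Class<?> server = Class.forName("org.eclipse.jetty.server.Server");
            server.getMethod("setSendServerVersion", boolean.class);
            return "Jetty present, setSendServerVersion(boolean) found";
        } catch (ClassNotFoundException e) {
            return "Jetty not on classpath";
        } catch (NoSuchMethodException e) {
            // Jetty 9 dropped this setter, which is what the
            // NoSuchMethodError in the stack trace indicates.
            return "Jetty present but setSendServerVersion(boolean) missing";
        }
    }

    public static void main(String[] args) {
        System.out.println(checkJetty());
    }
}
```

Running this on the Spark driver (e.g. through a Livy job) would tell you which of the three cases applies before you go hunting for jar versions.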

Best,
Björn


Hi.

I just checked, and the H2O version in the preferences is 3.22.0.2, so maybe that’s the problem I’m having.

However, when I press Select H2O version, I get a drop-down list with the following options:
H2O Local Context (3.10.5.2)
H2O Local Context (3.22.0.2)
H2O Local Context (3.20.0.2).

I can’t set it to 3.24.0.4…

Hi @TomasCardoso

3.24.0.4 was added with KNIME 4.0 (precisely to be compatible with HDP 3 and CDH 6).

I suppose you are on KNIME 3.7?

Björn

Oh yeah, I am. I’ll have to upgrade it then.