H2O Random Forest Learner error

Hi,

I created an H2O Sparkling Water Context over a Spark Livy Context using only KNIME nodes on GCE Dataproc and I have this error at the H2O Random Forest Learner:

“ERROR : KNIME-Worker-29-H2O Random Forest Learner 3:2688 : : Node : H2O Random Forest Learner : 3:2688 : Execute failed: Job crashed unexpected. Cause: org.knime.bigdata.spark.core.exception.KNIMESparkException: error cannot be computed: too many classes (UnsupportedOperationException) See log for details.
org.knime.ext.h2o.exception.H2OJobCrashedException: Job crashed unexpected. Cause: org.knime.bigdata.spark.core.exception.KNIMESparkException: error cannot be computed: too many classes (UnsupportedOperationException) See log for details.
at org.knime.ext.h2o.jobs.DefaultH2OJobFuture.get(DefaultH2OJobFuture.java:98)
at org.knime.ext.h2o.jobs.AbstractH2OExecutionContext.submit(AbstractH2OExecutionContext.java:76)
at org.knime.ext.h2o.context.DefaultH2OSession.futureOf(DefaultH2OSession.java:213)
at org.knime.ext.h2o.context.DefaultH2OSession.run(DefaultH2OSession.java:369)
at org.knime.ext.h2o.nodes.learner.drf.H2ODRFNodeModel3.run(H2ODRFNodeModel3.java:86)
at org.knime.ext.h2o.nodes.learner.drf.H2ODRFNodeModel3.run(H2ODRFNodeModel3.java:1)
at org.knime.ext.h2o.nodes.AbstractH2OSupervisedNodeModel.execute(AbstractH2OSupervisedNodeModel.java:157)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:576)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1236)
at org.knime.core.node.Node.execute(Node.java:1016)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:558)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:334)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:210)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.util.concurrent.ExecutionException: org.knime.bigdata.spark.core.exception.KNIMESparkException: error cannot be computed: too many classes (UnsupportedOperationException)
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.knime.ext.h2o.spark.H2OSparkJobFactory$H2OSimpleSparkJob$1.checkException(H2OSparkJobFactory.java:221)
at org.knime.ext.h2o.spark.H2OSparkJobFactory$H2OSimpleSparkJob$1.(H2OSparkJobFactory.java:175)
at org.knime.ext.h2o.spark.H2OSparkJobFactory$H2OSimpleSparkJob.getStatus(H2OSparkJobFactory.java:173)
at org.knime.ext.h2o.jobs.DefaultH2OJobFuture.get(DefaultH2OJobFuture.java:87)
… 19 more
Caused by: org.knime.bigdata.spark.core.exception.KNIMESparkException: error cannot be computed: too many classes (UnsupportedOperationException)
at org.knime.bigdata.spark2_4.base.LivySparkJob.call(LivySparkJob.java:106)
at org.knime.bigdata.spark2_4.base.LivySparkJob.call(LivySparkJob.java:1)
at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:40)
at org.apache.livy.rsc.driver.BypassJob.call(BypassJob.java:27)
at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:64)
at org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:45)
at org.apache.livy.rsc.driver.BypassJobWrapper.call(BypassJobWrapper.java:27)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)”

My input data has 8360 columns and 3841 categories. Do you have any suggestions?

Thank you,
Mihai

Unfortunately, the H2O nodes in KNIME seem to have problems reproducing the string handling they offer when used together with Spark.

Try label encoding your categorical data if you expect the relationship to be a stable one, or try another method to convert your strings into numbers (one-hot encoding could be a challenge with this many categories).
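For intuition, label encoding simply replaces each distinct category string with an integer code. A minimal plain-Python sketch of the idea behind the Spark label encoder linked below (the function name and example data are made up for illustration):

```python
# Minimal label-encoding sketch: map each distinct category to an integer.
# This only illustrates the concept; in practice the linked Spark workflow
# (or a node like Spark Category To Number) does this on the cluster.

def label_encode(values):
    """Return (encoded_values, mapping) for a list of category strings.

    The mapping must be kept so the same codes can be applied
    consistently to new data at prediction time.
    """
    mapping = {}
    encoded = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # assign the next unused integer code
        encoded.append(mapping[v])
    return encoded, mapping

codes, mapping = label_encode(["red", "green", "red", "blue"])
# codes -> [0, 1, 0, 2]; mapping -> {"red": 0, "green": 1, "blue": 2}
```

Note that label encoding imposes an arbitrary order on the categories, which is why it only makes sense if you expect the category-to-number relationship to stay stable.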

https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_h2o_automl_spark/s_401_spark_label_encoder

The workflow addresses other issues I have encountered with Spark and H2O. There is also a presentation about it; it is in German, though the charts are in English.

https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_h2o_automl_spark/s_400_spark_h2o_automl_about_this_collection

If you use R, you might want to try vtreat; the problem is that it would be challenging to bring that to a big data environment.

https://hub.knime.com/mlauber71/spaces/Public/latest/automl/kn_automl_h2o_classification_r_vtreat


Hi @mlauber71,

Sorry for the delayed answer. I tried using the Spark Category To Number node to label-encode the input data, but the H2O Random Forest Learner node does not accept a numerical column as the 'Target Column'. My initial data has only a single column of type String; all the other columns are numerical.

Thank you,
Mihai

You should try the one for regression targets.
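For context on why the learner behaves this way: a string target is treated as a classification problem and a numeric target as a regression problem, and the "error cannot be computed: too many classes" message in the stack trace suggests that per-class error metrics (such as a confusion matrix) become infeasible with thousands of target categories. A rough plain-Python sketch of those two checks; the class limit here (1000) is a made-up placeholder, not H2O's actual value:

```python
# Illustrative sketch only; function names and the class limit are
# assumptions for this example, not H2O internals.

def problem_type(target_values):
    """Guess classification vs. regression from the target's value type:
    a string target implies classification, a numeric one regression."""
    if all(isinstance(v, str) for v in target_values):
        return "classification"
    return "regression"

def too_many_classes(target_values, limit=1000):
    """True if the target has more distinct classes than per-class
    metrics (e.g. a confusion matrix) can reasonably handle."""
    return len(set(target_values)) > limit

# A target shaped like the one in this thread: 3841 distinct categories.
target = ["class_%d" % (i % 3841) for i in range(5000)]
kind = problem_type(target)       # -> "classification"
too_many = too_many_classes(target)  # -> True, matching the error above
```

This is why a label-encoded (numeric) target is rejected at 'Target Column' for classification, and why the regression-target variant of the encoder is the one to try.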
