ArrayIndexOutOfBoundsException using Tree Ensemble Learner

janh · January 18, 2013, 12:17pm

Hi,

A problem occurred while using the Tree Ensemble Learner.

I'm using KNIME 2.7.0. The Learner gets an input table of ~6.1k rows and 100 columns. For the settings I use the ones described in the link above for RandomForest, with the number of models set to 1k.

I get the following warning and error:

WARN RearrangeColumnsTable$ConcurrentNewColCalculator Unhandled exception in processFinished.

ERROR Tree Ensemble Learner Execute failed: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1

I tried to change the settings a bit and to filter some columns from the input table, but that didn't help.

Do you have any ideas how to fix this?
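For readers unfamiliar with the message above: the -1 in "ArrayIndexOutOfBoundsException: -1" is the array index that was accessed. A common way for Java code to end up there is a lookup that returns the "not found" sentinel -1 (as List.indexOf does) and then uses it unchecked as an array index. The sketch below only illustrates that generic pattern; the class name, method names and labels are hypothetical and not taken from KNIME's code.

```java
import java.util.Arrays;

public class NegativeIndexDemo {
    // Hypothetical label table, standing in for whatever array a model indexes into.
    private static final String[] LABELS = {"a", "b", "c"};

    // Looks up a label's position; List.indexOf returns -1 when the value is absent.
    static int indexOfLabel(String label) {
        return Arrays.asList(LABELS).indexOf(label);
    }

    // Uses the index without checking it, so an unknown label triggers
    // java.lang.ArrayIndexOutOfBoundsException for index -1.
    static String labelAt(String label) {
        return LABELS[indexOfLabel(label)];
    }

    public static void main(String[] args) {
        System.out.println(labelAt("b"));
        try {
            labelAt("unknown");
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text differs between Java versions.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The exception itself only says which index was bad; the stack trace (which is why DEBUG logging is requested below) is what shows where the -1 came from.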

gabriel · January 18, 2013, 1:14pm

Thanks for reporting this problem, which is very likely a bug in the node configuration. To locate it, we need the full stack trace from the KNIME Console view. Please switch on DEBUG logging under File > Preferences > KNIME GUI. Thanks.

janh · January 18, 2013, 4:25pm

See attached logfile.

knimelog.txt

wiswedel · January 24, 2013, 1:47pm

I'm not able to reproduce it and the log file doesn't reveal the actual problem. Do you think you can send us the data and workflow?

Thanks, Bernd

janh · January 25, 2013, 10:21am

Sorry, but I can't. Those are customer data.

But I'll keep an eye on this problem and post further information if this error occurs in other situations.

Anyway, thanks for your help. :)

ymiladi · July 11, 2013, 5:57pm

I had the same problem with my Tree Ensemble Learner. Oddly enough, all I did to fix it was remove some #N/A strings that were left over from exporting my data from Excel.

wiswedel · February 2, 2014, 10:45am

Hi,

Just came across this post. If ymiladi or anyone else runs into the problem, it would be extremely useful if you could attach an example workflow or data. I'm still not able to reproduce it :-( and N/A shouldn't be a big deal, as this node doesn't accept missing values anyway (it will abort with a reasonable error message).

Can you clarify? Thanks!

vravirala · October 19, 2016, 4:42pm

ERROR Gradient Boosted Trees Predictor 2:30 Execute failed: ("ArrayIndexOutOfBoundsException"): -1

nemad · October 20, 2016, 10:30am

Can you provide an example workflow, or at least instructions on how to reproduce the problem?

Cheers,

nemad

vravirala · October 30, 2016, 3:10am

Sorry, the Gradient Boosted Trees Predictor error is inconsistent and most likely has to do with the size of the input and/or the parameters. When the input to the predictor included only the minimally necessary variables (i.e., those used for training) and I unchecked "Append individual class probabilities", the problem disappeared. Adding the large set of input variables back, or checking the probabilities option, seems to cause the problem. Sorry, I cannot share my data. The workflow is simple:

Create a GBT and save it using Model Writer. Read it again using Model Reader and provide an input data set. I trained using only numerical (double) variables and removed all categorical independent variables. The target column, of course, was a String categorical variable. The input data file is actually the same for both training and predicting: I draw a random sample based on the Target (k-Means clusters) and predict again for another random sample. Essentially, I am testing whether the GBT can predict the clusters from the other variables.

Hope that helps a bit. If I find another situation where I could share more, I shall post again. Thanks!

vravirala · November 1, 2016, 8:35pm

I can now consistently produce the GBT Predictor error. The steps are: Read Data, GBT Learner, Model Writer, and then, in a separate workflow, Read Data, Model Reader, GBT Predictor with "Append individual class probabilities" checked, and the error occurs. With the option unchecked, it works. Thanks.

nemad · January 17, 2017, 10:40am

Hello Vravirala,

First, I would like to apologize for the delay.
I am now working on this bug and am having trouble reproducing it in my KNIME AP.

It would be great if you could tell me which version of the KNIME AP you are running and if the problem also appears on the latest version (3.3.1).

Thank you for your help in improving KNIME.

Cheers,

nemad

Ramon_Ankersmit · April 4, 2017, 11:15pm

Dear Nemad,

I'm having this bug now all the time in the latest KNIME version. Did you solve this issue or do you still have trouble reproducing it?

Cheers,

Ramon

Ramon_Ankersmit · April 4, 2017, 11:36pm

Below is part of the errors I see in the log:

-----

DEBUG Gradient Boosted Trees Predictor 0:116      reset
DEBUG String Manipulation 0:131      String Manipulation 0:131 doBeforePostExecution
ERROR Gradient Boosted Trees Predictor 0:116      Execute failed: ("ArrayIndexOutOfBoundsException"): -1
DEBUG String Manipulation 0:131      String Manipulation 0:131 has new state: POSTEXECUTE
DEBUG Gradient Boosted Trees Predictor 0:116      Execute failed: ("ArrayIndexOutOfBoundsException"): -1
java.lang.ArrayIndexOutOfBoundsException: -1
   at org.knime.base.node.mine.treeensemble2.model.MultiClassGradientBoostedTreesModel.getClassLabel(MultiClassGradientBoostedTreesModel.java:139)
   at org.knime.base.node.mine.treeensemble2.node.gradientboosting.predictor.classification.LKGradientBoostingPredictorCellFactory.getCells(LKGradientBoostingPredictorCellFactory.java:165)
   at org.knime.core.data.container.RearrangeColumnsTable.calcNewCellsForRow(RearrangeColumnsTable.java:503)
   at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:732)
   at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:1)
   at org.knime.core.util.MultiThreadWorker$ComputationTask$1.call(MultiThreadWorker.java:442)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
   at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
   at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
DEBUG String Manipulation 0:131      String Manipulation 0:131 doAfterExecute - success
DEBUG String Manipulation 0:131      String Manipulation 0:131 has new state: EXECUTED
DEBUG String Manipulation 0:131      Column Filter 0:133 has new state: CONFIGURED_QUEUED
DEBUG Gradient Boosted Trees Predictor 0:116      Gradient Boosted Trees Predictor 0:116 doBeforePostExecution
DEBUG Gradient Boosted Trees Predictor 0:116      Gradient Boosted Trees Predictor 0:116 has new state: POSTEXECUTE

Ramon_Ankersmit · April 4, 2017, 11:42pm

Some debug logging I get (the same stack trace as in my previous post).

nemad · April 6, 2017, 11:21am

Hello Ramon,

thanks for posting.

Do you also write the model out and read it in a different workflow?
This is important to know because your log indicates that the problem is a different one (based on the line where the exception is thrown).

I have an intuition about what the problem might be, but reproducing it is kind of tricky (you would have to provoke a numerical overflow). Can you provide an example workflow where this problem exists? That would be immensely helpful to confirm my suspicion.
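To make the numerical-overflow suspicion concrete: if a predictor picks the class with the highest score through an argmax that starts at index -1, and the scores overflow to infinity during normalization (Infinity / Infinity is NaN), then no comparison ever succeeds and the index stays -1. The sketch below is a hypothetical illustration of that failure mode, not the actual code in MultiClassGradientBoostedTreesModel; the class and method names, and the use of a softmax-style normalization, are assumptions.

```java
public class ArgmaxOverflowDemo {
    // Naive argmax: returns -1 if no score compares greater than -Infinity.
    // Every comparison with NaN is false, so an all-NaN score vector yields -1.
    static int naiveArgmax(double[] scores) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                best = i;
            }
        }
        return best;
    }

    // Softmax-style normalization without max subtraction: Math.exp overflows
    // to Infinity for large inputs, and Infinity / Infinity is NaN.
    static double[] naiveSoftmax(double[] logits) {
        double[] out = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i]);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // Numerically stable variant: subtracting the maximum keeps exp() finite.
    static double[] stableSoftmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) {
            max = Math.max(max, l);
        }
        double[] out = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] logits = {800.0, 900.0, 1000.0};
        System.out.println(naiveArgmax(naiveSoftmax(logits)));  // -1
        System.out.println(naiveArgmax(stableSoftmax(logits))); // 2
    }
}
```

Using the returned -1 to look up a class label would produce exactly "ArrayIndexOutOfBoundsException: -1". Whether KNIME's predictor normalizes its scores this way is not confirmed in this thread; the max-subtraction trick is simply the standard guard against this kind of overflow.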

Thanks,

nemad

nemad · April 11, 2017, 1:13pm

Hi Ramon,

your post seems to be cut off.

But from what you wrote, I get the feeling that there is no minimal example. Could you give some more information on your dataset? The number of features, what kind of features (numerical/nominal), and the number of classes/categories. With this information I might be able to reproduce the problem in my setup.

Thanks,

nemad

Ramon_Ankersmit · April 18, 2017, 9:44am

Hi Nemad,

The data I use isn't that complicated; I made a minimal data setup that contains 5 columns:
1st column: a start date in Unix time (milliseconds) as Double, e.g. 1480055077889
2nd column: a webpage identifier as String, e.g. start.homePage
3rd to 5th columns: Integer columns in the range 0 - 600

I use around 10k samples for training the GBT. Training the learner with the 2nd column (webpage) works, but the GBT Predictor fails immediately with the ArrayIndexOutOfBoundsException.

Some new info: it seems to work with 1k samples and fails with 2k samples of training data.

Regards,

Ramon


Ramon_Ankersmit · April 18, 2017, 11:25pm

Hi Nemad,

I may have found a clue through some good old trial and error. It seems that, for me, the string used for training the GBT (in my case the webpage) makes a difference. With a maximum of 53 characters all seems to go well... with 54 characters the predictor crashes. Maybe this helps you reproduce the problem?
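For anyone trying to reproduce this without the original data, a small generator that mimics the minimal setup described earlier in the thread (a Unix-millisecond start date as Double, a String page identifier, three Integer columns in 0..600) could produce two otherwise identical training sets whose labels are exactly 53 and 54 characters long. This is a hypothetical helper: the column names, the label pattern, the number of distinct pages and the seed are all assumptions, and it has not been confirmed to trigger the bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

public class GbtReproData {
    // Builds a page identifier of exactly `length` characters (hypothetical pattern).
    // The page number comes first; very short lengths would truncate it and merge labels.
    static String pageId(int page, int length) {
        StringBuilder sb = new StringBuilder("page." + page + ".");
        while (sb.length() < length) {
            sb.append('x');
        }
        return sb.substring(0, length);
    }

    // Generates CSV lines "startDate,page,v1,v2,v3", header first.
    static List<String> rows(int n, int pages, int labelLength, long seed) {
        Random rnd = new Random(seed);
        long baseMillis = 1480055077889L; // example timestamp quoted in the thread
        List<String> out = new ArrayList<>();
        out.add("startDate,page,v1,v2,v3");
        for (int i = 0; i < n; i++) {
            // Start dates within one day of the base, stored as a double like the original column.
            double startDate = baseMillis + rnd.nextInt(86_400_000);
            String page = pageId(rnd.nextInt(pages), labelLength);
            out.add(String.format(Locale.ROOT, "%.0f,%s,%d,%d,%d", startDate, page,
                    rnd.nextInt(601), rnd.nextInt(601), rnd.nextInt(601)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed and size for both files; only the label length differs.
        for (int len : new int[] {53, 54}) {
            List<String> data = rows(2000, 20, len, 42L);
            System.out.println(len + " chars: " + data.size() + " lines, first row " + data.get(1));
        }
    }
}
```

The lines can be written to a file with java.nio.file.Files.write and read into KNIME with a CSV reader; 2000 rows matches the sample size at which the failure was reported.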

Regards,

Ramon

nemad · April 23, 2017, 3:32am

Hi Ramon,

To be honest, your last posts really make me scratch my head.
I literally can't imagine what I could have messed up so that the prediction depends on the length of the class names.

However, maybe there is a possible explanation. Do you limit the length of the class names by converting the names, i.e. keeping all rows but limiting the class name to 53 characters? Or do you filter out all rows which have a class that exceeds the 53 characters?

In the first case, I would suggest enumerating all classes and using the number of the class as the new target for learning and predicting with the model. Afterwards, you can just replace the number with the actual class name again.
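The enumeration workaround can be sketched as a pair of lookups: encode each class name as a short numeric code before the learner, and decode the predicted code back into the original name afterwards. The Java below only shows the idea; the class and method names are hypothetical, and the codes are kept as strings on the assumption that the target should stay nominal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassEnumerator {
    private final Map<String, String> toCode = new HashMap<>();
    private final List<String> toName = new ArrayList<>();

    // Assigns consecutive numbers ("0", "1", ...) to class names in order of first appearance.
    String encode(String className) {
        return toCode.computeIfAbsent(className, name -> {
            toName.add(name);
            return String.valueOf(toName.size() - 1);
        });
    }

    // Maps a predicted code back to the original (possibly long) class name.
    String decode(String code) {
        return toName.get(Integer.parseInt(code));
    }

    // Number of distinct classes seen so far.
    int size() {
        return toName.size();
    }

    public static void main(String[] args) {
        ClassEnumerator enumerator = new ClassEnumerator();
        String longName = "start.homePage." + "x".repeat(45); // 60 characters
        List<String> targets = List.of("start.homePage", longName, "start.homePage");
        List<String> encoded = new ArrayList<>();
        for (String t : targets) {
            encoded.add(enumerator.encode(t));
        }
        System.out.println(encoded);                          // [0, 1, 0]
        System.out.println(enumerator.decode("1").length());  // 60
    }
}
```

The same mapping has to be used for training and prediction, so it would need to be saved alongside the model if the two run in separate workflows.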

In the second case, it would be really interesting to know how many different classes you have in total. A really high number of classes would also explain why the GBT scales so badly in your use case, because I have never experienced such problems with datasets that are considerably larger and contain more features.
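Answering the question about the number of classes only needs a quick count of the distinct class names, plus how many of them are longer than 53 characters. A minimal sketch, assuming the target column has already been read into a list of strings (reading the data is left out, and the names are hypothetical):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClassStats {
    // Number of distinct class names in the target column.
    static int distinctClasses(List<String> targets) {
        return new HashSet<>(targets).size();
    }

    // Distinct class names longer than maxLength characters (53 in Ramon's tests), sorted.
    static Set<String> longClasses(List<String> targets, int maxLength) {
        Set<String> result = new TreeSet<>();
        for (String t : targets) {
            if (t.length() > maxLength) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> targets = List.of("start.homePage", "checkout.page", "start.homePage");
        System.out.println(distinctClasses(targets));        // 2
        System.out.println(longClasses(targets, 53).size()); // 0
    }
}
```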

To be honest I hope that the second option is the case because this issue literally deprives me of my sleep =D

Thanks for your help in figuring out this problem,

nemad