Random Forest Learner (Regression) fails with java.lang.StackOverflowError

JPE · February 28, 2019, 10:06pm

My Random Forest regression learner fails at 10% with this error:
ERROR Random Forest Learner (Regression) 2:117 Execute failed: java.lang.StackOverflowError

I believe this only happens with one particular file: another table runs successfully, even though both are derived from the same dataset and have the same 22 features and a similar number of rows (~60,000). This one file always produces the error UNLESS I cut it down to 3 or 4 features, and I can't find anything about this error that explains why it happens with this file. Any ideas? Attached is a screenshot of the two basic workflows. They are not the neatest, but I'm curious what is causing this, even if the cause is upstream.
In this case the successful RFL (top) has almost twice as many rows as the unsuccessful RFL.

I had not seen this error before updating to 3.7 today, but admittedly I hadn't tried the problematic table before the update.

Note: The problematic file is a .tsv, and the successful file is a .csv. Could it really be that simple? I am converting to test and will update.
Update: converting the file to .csv did not fix the error.

JPE · February 28, 2019, 10:20pm

Second update: starting from the base files produces the same error pattern. What I really want to know is what the stack overflow error means and why this one file triggers it!

mlauber71 · February 28, 2019, 10:27pm

You could try treating the missing values and see how that goes, and also check whether there are any strange NaN values. Then the question is whether the tree fails with any 3-4 variables, or whether you can nail down the column that is causing it. You could also check the Domain Calculator and see whether the domain covers every value.
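A quick way to follow this suggestion outside KNIME is to count, per column, how many cells are missing, NaN, or infinite; a column with a non-zero count is the first one to investigate. The following is a minimal sketch, assuming the table has already been loaded into a `Double[][]` with `null` standing in for a missing cell. `ColumnQualityCheck` and `countProblemCells` are hypothetical names for illustration, not part of the KNIME API.

```java
import java.util.Arrays;

// Hypothetical helper (not a KNIME API): counts, per column, cells that are
// missing (null), NaN, or infinite in a table held in memory as Double[][].
final class ColumnQualityCheck {

    /** Returns {missingCounts, nanCounts, infiniteCounts}, one entry per column. */
    static int[][] countProblemCells(Double[][] rows, int numCols) {
        int[] missing = new int[numCols];
        int[] nan = new int[numCols];
        int[] infinite = new int[numCols];
        for (Double[] row : rows) {
            for (int c = 0; c < numCols; c++) {
                Double v = row[c];
                if (v == null) {
                    missing[c]++;      // cell is missing entirely
                } else if (v.isNaN()) {
                    nan[c]++;          // cell is present but "not a number"
                } else if (v.isInfinite()) {
                    infinite[c]++;     // +Infinity or -Infinity
                }
            }
        }
        return new int[][] {missing, nan, infinite};
    }

    public static void main(String[] args) {
        Double[][] rows = {
            {1.0, 2.0, null},
            {3.0, Double.NaN, 4.0},
            {5.0, Double.POSITIVE_INFINITY, null},
        };
        int[][] counts = countProblemCells(rows, 3);
        System.out.println("missing:  " + Arrays.toString(counts[0])); // [0, 0, 2]
        System.out.println("NaN:      " + Arrays.toString(counts[1])); // [0, 1, 0]
        System.out.println("infinite: " + Arrays.toString(counts[2])); // [0, 1, 0]
    }
}
```

Counting per column rather than per table is what makes it possible to nail down the offending column instead of dropping features blindly.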

And which normalization function do you use?

JPE · February 28, 2019, 10:32pm

You were spot on, thank you! I had accidentally used a table that still contained missing values; with those handled, everything works.

Any thoughts on why missing values would cause this specific error? I can't find anyone else reporting it when I search.

s.roughley · February 28, 2019, 10:34pm

Stack overflow errors are thrown when “an application recurses too deeply.”

https://docs.oracle.com/javase/7/docs/api/java/lang/StackOverflowError.html

My guess is that something in the Random Forest algorithm gets caught in an infinite recursive loop with the missing values.
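To illustrate the Javadoc's point: recursion does not have to be infinite to overflow, it only has to be deeper than the thread's stack allows. The Java sketch below computes the same sum recursively and iteratively; only the recursive version fails for a large input. The class and method names are made up for illustration and say nothing about how the Random Forest Learner is actually implemented.

```java
// Illustration only: deep (not necessarily infinite) recursion exhausts the
// thread stack, while an iterative version of the same computation does not.
final class StackDepthDemo {

    // Each call adds one stack frame, so stack depth grows linearly with n.
    static long sumRecursive(int n) {
        return n == 0 ? 0 : n + sumRecursive(n - 1);
    }

    // Same result, constant stack usage regardless of n.
    static long sumIterative(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Returns true if the recursive version throws StackOverflowError for this n.
    static boolean overflowsRecursively(int n) {
        try {
            sumRecursive(n);
            return false;
        } catch (StackOverflowError e) {
            // Caught here only to demonstrate the failure; normally this signals a bug.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumRecursive(10));                 // 55
        System.out.println(overflowsRecursively(10_000_000)); // true with default stack sizes
        System.out.println(sumIterative(10_000_000));         // 50000005000000
    }
}
```

The usual fix for this kind of error is either to bound the recursion depth or to replace the recursion with a loop or an explicit stack, as `sumIterative` does here.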

Steve

JPE · February 28, 2019, 10:36pm

Thank you, that makes sense, and now I know what to look for if I get this error again. I'm so happy to have this solved; I was pulling my hair out for a few hours.

ScottF · February 28, 2019, 10:37pm

Hi @JPE, can you mark @mlauber71 's post as the solution? This will help when people search for similar problems in the future.

JPE · February 28, 2019, 10:37pm

I use min-max normalization to the range 0 to 1.
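For reference, min-max normalization to [0, 1] maps each value x to (x - min) / (max - min). One case worth checking with this formula is a constant column: if max equals min, a naive implementation computes 0.0 / 0.0, which is NaN in Java floating-point arithmetic. This thread does not say whether KNIME's Normalizer node guards against that, so treat the following as a generic sketch rather than a description of that node.

```java
import java.util.Arrays;

// Generic sketch of min-max scaling to [0, 1]; not the code of KNIME's Normalizer node.
final class MinMaxNormalize {

    // x' = (x - min) / (max - min), with no guard for a constant column.
    static double[] naive(double[] x) {
        double min = Arrays.stream(x).min().orElse(Double.NaN);
        double max = Arrays.stream(x).max().orElse(Double.NaN);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (x[i] - min) / (max - min); // 0.0 / 0.0 = NaN if max == min
        }
        return out;
    }

    // Same scaling, but a constant column is mapped to 0.0 instead of NaN.
    static double[] guarded(double[] x) {
        double min = Arrays.stream(x).min().orElse(Double.NaN);
        double max = Arrays.stream(x).max().orElse(Double.NaN);
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (x[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naive(new double[] {2, 4, 6})));   // [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(naive(new double[] {3, 3, 3})));   // [NaN, NaN, NaN]
        System.out.println(Arrays.toString(guarded(new double[] {3, 3, 3}))); // [0.0, 0.0, 0.0]
    }
}
```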

JPE · February 28, 2019, 10:38pm

Yes, thank you for pointing that out, very helpful!

nemad · March 1, 2019, 7:50am

Hello @JPE,

would it be possible for you to share the workflow with me, so I can debug the code and find out what is happening?
If that is not an option, could you please post the entry in the KNIME log that relates to the error (it should contain a stack trace)?
Neither normalization nor missing values should pose a problem for the RF Learner, as it is invariant to normalization and has built-in means to deal with missing values.
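Why a tree-based learner is unaffected by min-max normalization: a split on a single numeric feature depends only on the order of that feature's values, and min-max scaling (with max > min) preserves that order. The toy Java sketch below assumes a CART-style regression split that minimizes the sum of squared errors; it is an illustration, not the RF Learner's implementation. It finds the best threshold on a raw feature and on its normalized copy and checks which rows go to the left child.

```java
import java.util.*;

// Toy single-feature regression split (minimizes sum of squared errors).
// Illustration only; not KNIME's Random Forest code.
final class SplitInvarianceDemo {

    /** Sorted row indices sent to the left child by the best threshold on x. */
    static List<Integer> bestLeftPartition(double[] x, double[] y) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < x.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> x[i])); // only the order of x matters
        double bestSse = Double.POSITIVE_INFINITY;
        int bestCut = -1; // number of rows (in sorted order) sent left
        for (int cut = 1; cut < x.length; cut++) {
            if (x[order[cut - 1]] == x[order[cut]]) continue; // cut only between distinct values
            double sse = sse(y, order, 0, cut) + sse(y, order, cut, x.length);
            if (sse < bestSse) {
                bestSse = sse;
                bestCut = cut;
            }
        }
        List<Integer> left = new ArrayList<>();
        for (int k = 0; k < bestCut; k++) left.add(order[k]);
        Collections.sort(left);
        return left;
    }

    // Sum of squared deviations from the mean of y over sorted positions [from, to).
    private static double sse(double[] y, Integer[] order, int from, int to) {
        double mean = 0;
        for (int k = from; k < to; k++) mean += y[order[k]];
        mean /= (to - from);
        double s = 0;
        for (int k = from; k < to; k++) {
            double d = y[order[k]] - mean;
            s += d * d;
        }
        return s;
    }

    // Min-max scaling to [0, 1]; assumes max > min.
    static double[] minMax(double[] x) {
        double min = Arrays.stream(x).min().getAsDouble();
        double max = Arrays.stream(x).max().getAsDouble();
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - min) / (max - min);
        return out;
    }

    public static void main(String[] args) {
        double[] x = {10, 250, 40, 900, 120, 600};
        double[] y = {1.0, 5.0, 1.5, 9.0, 2.0, 8.0};
        System.out.println(bestLeftPartition(x, y));         // [0, 2, 4]
        System.out.println(bestLeftPartition(minMax(x), y)); // [0, 2, 4]
    }
}
```

Only the numeric threshold differs between the two runs; the rows sent to each child, and therefore the tree's predictions, are the same.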

Thank you for your help and kind regards,

Adrian

JPE · March 1, 2019, 2:43pm

Yes, how would you like me to share this?

nemad · March 1, 2019, 3:01pm

If possible, in this thread, although I fear the workflow might exceed the upload limit.
Otherwise, could you send me a private message with a link to a Dropbox/Google Drive folder containing the workflow?

system · March 8, 2019, 3:01pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.