My Random Forest regression learner is failing at 10% because of this
ERROR Random Forest Learner (Regression) 2:117 Execute failed: java.lang.StackOverflowError
I believe this is happening with one particular file, as another table is successful, even though they are derivatives of the same dataset and contain the same features (22) and similar number of rows (~60,000). I’m attempting to understand why this one file seems to always produce this error UNLESS I take it down to 3 or 4 features, but I can’t find anything on this error to understand why it’s happening on this file. Any ideas? Attached is a screenshot of the two basic workflows, which are not the neatest, but I’m curious what is causing this, even if it’s upstream related.
In this case the successful RFL (top) has almost 2x more rows than the unsuccessful RFL.
I have not seen this error until after I updated to 3.7 (today), but admittedly I hadn’t tried it on the problematic table before updating.
Note: The problematic file is a .tsv, and the successful file is a .csv. Could it really be that simple? I am converting to test and will update.
Update: a .csv of the file did not fix the error.
Second update: Attempting from the base files also produces the same error pattern. I’m really looking for what the stack overflow error could mean and why my one file is causing it!
You could try to apply missing values and see how that goes, and also check if there are not strange NaN values, then the question is if it is with every 3-4 variables that the tree fails or can you nail down the column that is causing it. And you could check the domain calculator and see if it covers every value.
And which normalization function do you use?
You were spot on thank you! I have accidentally called a table with missing values and all is fixed.
Any thoughts why would missing values cause this specific error? I don’t even see it from other people when I search for it.
Stack overflow errors are thrown when “an application recurses too deeply.”
My guess is that something in the Random Forest algorithm gets caught in an infinite recursive loop with the missing values.
Thank you, I feel like this makes sense and now I know what to look for if I receive this error again. So happy to have this solved, I was pulling my hair out for a few hours
Hi @JPE, can you mark @mlauber71 's post as the solution? This will help when people search for similar problems in the future.
I use min-max 0 to 1 normalization
Yes thank you for pointing out to do that, very helpful!
would it be possible for you to share the workflow with me, so I can debug the code and find out what is happening.
If that is not an option, could you please post the entry in the KNIME log that relates to the error (it should contain a stacktrace).
Both normalization and missing values should not pose a problem to the RF Learner, as it is invariant to normalization and has built-in means to deal with missing values.
Thank you for your help and kind regards,
Yes, how would you like me to share this?
If possible in this thread although I fear that the workflow might exceed the upload limit.
Could you otherwise send me a private message with a link to a dropbox/google drive folder containing the workflow?
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.