Random forest performs worse than single decision tree

Hi everyone,

For some reason, when I run a parameter optimization loop for both a random forest and a single decision tree, the best result for the random forest is significantly worse than the one for the decision tree (AUC = 0.506 vs. 0.789). Is it possible that there are so few predictive variables in my dataset that the random forest mostly selects unpredictive variables at each split?
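To illustrate what I mean, here is a quick scikit-learn sketch (not my actual workflow; the data here is synthetic) showing how per-split feature subsampling can hurt when only a couple of variables are informative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: only 2 of 100 features are informative, mimicking
# a dataset with very few predictive variables.
X, y = make_classification(n_samples=2000, n_features=100,
                           n_informative=2, n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "sqrt" samples ~10 of the 100 features at each split, so most splits
# never even see an informative variable; None considers all features.
for max_features in ["sqrt", None]:
    rf = RandomForestClassifier(n_estimators=200, max_features=max_features,
                                random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    print(f"max_features={max_features}: AUC = {auc:.3f}")
```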

Kind regards,

Hi,
if you say the random forest is worse, do you mean it has a worse AUC when predicting the validation set? Which parameters are you optimizing for both algorithms? Maybe you are just overfitting?
Kind regards
Alexander

Hi @AlexanderFillbrunn,

I’ve optimised the number of trees and the minimum node size for the random forest, and the minimum node size for the decision tree, on a validation set. The AUCs I mentioned are on a separate test set, using the optimal parameters for each technique.
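Roughly, the optimisation loop I run looks like this sketch (a scikit-learn stand-in for my actual workflow; the parameter grids and synthetic data are only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data split into train / validation / test.
X, y = make_classification(n_samples=3000, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def val_auc(model):
    """Fit on the training set, score AUC on the validation set."""
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# Random forest: tune number of trees and minimum node size.
best_rf = max((RandomForestClassifier(n_estimators=n, min_samples_leaf=m,
                                      random_state=0)
               for n in [100, 300, 500] for m in [1, 5, 20]),
              key=val_auc)

# Decision tree: tune minimum node size only.
best_dt = max((DecisionTreeClassifier(min_samples_leaf=m, random_state=0)
               for m in [1, 5, 20, 50]),
              key=val_auc)

# Compare the tuned models on the held-out test set.
for name, model in [("random forest", best_rf), ("decision tree", best_dt)]:
    print(name, roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```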

Kind regards,

Hi,
and what AUCs do you see in the validation set? Is the decision tree also better there?
Kind regards
Alexander

Hi @AlexanderFillbrunn,

In the validation set I observe the same phenomenon: AUC of random forest around 0.50 and decision tree around 0.78-0.79.

Kind regards,

This sounds pretty strange. One guess is that you have some restriction in the random forest settings, or some overfitting in the decision tree. You should also check that you really use the same data files in both cases.

Maybe you could try to benchmark your case against this AutoML workflow:

You could force H2O to consider only tree-based algorithms and see what the result is.
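In the H2O Python API that would look roughly like the sketch below (the file path and the "target" column name are placeholders for your own data):

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Placeholder path and column name; adapt to your own data.
df = h2o.import_file("your_data.csv")
df["target"] = df["target"].asfactor()  # binary classification target
train, test = df.split_frame(ratios=[0.8], seed=1)

# Restrict AutoML to tree-based algorithms only.
aml = H2OAutoML(max_models=20, include_algos=["DRF", "XGBoost", "GBM"],
                sort_metric="AUC", seed=1)
aml.train(y="target", training_frame=train)
print(aml.leaderboard)
```

DRF is H2O's (distributed) random forest, so the leaderboard would show directly whether a forest can beat a single tree on your data.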

I’ve checked the workflow again and I had indeed made a specification error, my bad! Thanks a lot for responding to the topic 🙂

Kind regards,
