Random Forest

Good morning,

I am using RF for classification and I have some doubts:

  • How do I use the Loop nodes to optimize the hyperparameter “number of variables used at each split”?
  • How do I obtain the variable importance table/graph? I came across this link (How to get the variable importance from the Random Forest model?), but it’s from 2016, uses the Tree Ensemble Learner instead of the Random Forest Learner, and it is a bit confusing to me.
  • How do I get 95% confidence intervals for the final AUC?
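For the last question, one common approach is a bootstrap confidence interval. The sketch below illustrates the idea with simulated NumPy data and scikit-learn (an assumption for illustration only; in KNIME you would export the label and probability columns from the predictor node instead):

```python
# Hypothetical sketch: 95% bootstrap confidence interval for AUC.
# y_true and y_prob are SIMULATED stand-ins for the label column and
# the positive-class probability column from a predictor node.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)
y_prob = np.clip(y_true * 0.3 + rng.normal(0.5, 0.25, size=200), 0.0, 1.0)

aucs = []
n = len(y_true)
for _ in range(2000):
    idx = rng.integers(0, n, size=n)          # resample rows with replacement
    if len(np.unique(y_true[idx])) < 2:       # AUC needs both classes present
        continue
    aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])     # percentile bootstrap interval
print(f"AUC = {roc_auc_score(y_true, y_prob):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```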

Example RF.knwf (49.1 KB)

Thank you,
Marc

Hi Marc

Concerning the optimization loop nodes, you might check this workflow: https://hub.knime.com/knime/spaces/Examples/latest/04_Analytics/11_Optimization/07_Cross_Validation_with_SVM_and_Parameter_Optimization . It uses an SVM, but gives a nice overview of how to combine cross-validation and parameter optimization.
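The same combination of cross-validation and a parameter loop can be sketched in scikit-learn terms (an illustration, not the KNIME workflow itself): an inner grid search over max_features, the “number of variables used at each split”, wrapped in an outer cross-validation for an unbiased score.

```python
# Sketch (scikit-learn, not KNIME) of nested CV with parameter optimization.
# Toy data stands in for the real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Inner loop: try several values for "number of variables per split".
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8]},
    cv=3, scoring="roc_auc",
)

# Outer loop: cross-validate the whole search for an honest estimate.
outer_scores = cross_val_score(search, X, y, cv=3, scoring="roc_auc")
print(outer_scores.mean())
```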

The outputs of the Random Forest and Tree Ensemble nodes are identical, as the RF nodes are just a simplified version of the TE nodes. Given that, you can use the solution in the forum link you posted (How to get the variable importance from the Random Forest model?; I also updated the link to the slides there).
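As a rough illustration of what such a variable-importance table contains (sketched here with scikit-learn on a built-in dataset, not the KNIME nodes): a fitted random forest exposes one importance score per input variable, which you can sort into a table or plot.

```python
# Sketch (scikit-learn, not KNIME): impurity-based variable importance
# from a fitted random forest, sorted into a table.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(data.data, data.target)

# One score per feature; the scores sum to 1.
importance = (
    pd.Series(rf.feature_importances_, index=data.feature_names)
      .sort_values(ascending=False)
)
print(importance.head(10))   # the most important variables
```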

Hope that helps!

Best
Alice


Thank you very much, @Alice_Krebs. In the case of high-dimensional data (p>n), do you know of an example that incorporates Feature Selection in the cross-validation process?

Thank you very much,
Marc

Hi @MarcB

The example workflow you provided earlier didn’t include the data, so I couldn’t run it; it’s always a bit challenging to give concrete advice without that, sorry. Here are some examples that might help you:
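The key principle for p > n data, whichever example you follow, is that feature selection must happen inside each cross-validation fold, otherwise information leaks from the held-out folds. A minimal scikit-learn sketch of that idea (an illustration with toy data, not a KNIME workflow):

```python
# Sketch (scikit-learn, not KNIME): feature selection nested inside CV
# via a Pipeline, so the selector is refitted on each training fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# High-dimensional toy data: more features than samples (p > n).
X, y = make_classification(n_samples=80, n_features=500,
                           n_informative=10, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),   # fitted per fold: no leakage
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```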

Maybe that helps you proceed.
Best
Alice


Thank you, @Alice_Krebs!

Hi @Alice_Krebs (and anyone!),

I attach a workflow to predict class A in a binary classification using Random Forests. Leaving aside the poor performance, which may be due to errors in the data, I do not understand why the AUC is so poor (<0.15!) while the Scorer shows good confusion-matrix metrics (accuracy 0.80, etc.). Is anything incorrectly specified from the X-Aggregator onwards?

Z_Example RF2.knwf (47.4 KB)

Thank you,
Marc

Since there isn’t any data included in the workflow, it’s a little tricky to diagnose. But I used one of your previously posted small datasets to investigate.

In short, I think the problem is that you’re not plotting the ROC curve correctly. First, check the option to append individual class probabilities in your predictor node. Then make sure you select the appropriate class column (not the prediction column) and plot the probabilities (not the confidence, and not the predictions themselves). Something like this:
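The point above can be sketched in scikit-learn terms (an illustration with toy data, not the KNIME configuration itself): an ROC curve built from hard 0/1 predictions has only one operating point and gives a misleading AUC, while the class probabilities give the real curve.

```python
# Sketch (scikit-learn, not KNIME): AUC from class probabilities vs.
# AUC computed from hard 0/1 predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

proba = rf.predict_proba(Xte)[:, 1]   # P(class = 1): the right input for ROC
hard = rf.predict(Xte)                # 0/1 labels: the wrong input for ROC

auc_proba = roc_auc_score(yte, proba)
auc_hard = roc_auc_score(yte, hard)   # collapses the curve to one point
print("AUC from probabilities:", auc_proba)
print("AUC from hard labels:  ", auc_hard)
```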

I’m not sure what the results will look like since I don’t have your data. But hopefully much more reasonable.


Thank you very much, @ScottF. I was not checking the “Append individual class probabilities” option; now the results look much more reasonable, and the ROC curve and Scorer nodes agree.

Best regards,
Marc

