Determining the most important variables for Random Forest

Hi! I’m working on an assignment that asks me to interpret the variables of a trained random forest model and name the most important ones. I’ve been searching for a good way to do this, but so far I haven’t found any good explanations. In fact, I keep reading mixed things about how to do this in KNIME. Asking my professor didn’t really help either: he told me to just search KNIME Hub, or to manually look through the tree view for the first tree in the model (which doesn’t make sense to me with a random forest?). I feel like I should use the attribute statistics, but the professor disagreed when he saw the candidate columns. Am I supposed to just sum up the values for each split level per variable? Does anyone know of any good sources that would help me figure this out?
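For readers with the same question, one heuristic sometimes applied to the attribute statistics is to divide, per variable and per tree level, the number of times the variable was chosen for a split by the number of times it was a candidate, and then sum those ratios over the levels. The Python sketch below only illustrates that arithmetic. It assumes the table holds split and candidate counts for three levels; the variable names and all counts are made up, and this is a rule of thumb, not an official KNIME formula.

```python
# Hypothetical attribute statistics: per variable, (splits, candidates)
# for tree levels 0, 1 and 2. All numbers are invented for illustration.
stats = {
    "age":    [(30, 40), (25, 50), (20, 60)],
    "income": [(10, 40), (15, 50), (20, 60)],
    "gender": [(0, 35), (5, 45), (10, 55)],
}


def split_ratio_score(levels):
    """Sum splits/candidates over the levels, skipping levels with no candidates."""
    return sum(splits / candidates for splits, candidates in levels if candidates > 0)


# Score every variable, then rank from highest to lowest score.
scores = {name: split_ratio_score(levels) for name, levels in stats.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

Dividing by the candidate count matters because each split only considers a random subset of columns, so raw split counts alone would favour variables that happened to be sampled more often.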

I’m just following a basic workflow like this:

Maybe look into using a feature selection loop to work out which combination of features delivers the best result. That’s a simple way.
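To make the idea of such a loop concrete, here is a minimal Python sketch of backward feature elimination: start with all features, repeatedly drop the one whose removal hurts the score least, and remember the best subset seen. It is not the KNIME implementation. The `toy_score` function is a hypothetical stand-in for "train a random forest on these columns and return its validation accuracy", and the feature names and numbers are invented.

```python
def backward_elimination(features, score):
    """Greedily drop one feature per round, keeping the best-scoring subset seen."""
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # Try removing each remaining feature once and score the reduced set.
        candidates = [[f for f in current if f != drop] for drop in current]
        scored = [(score(subset), subset) for subset in candidates]
        top_score, top_subset = max(scored, key=lambda pair: pair[0])
        current = top_subset
        if top_score >= best_score:
            best_score, best_subset = top_score, list(top_subset)
    return best_subset, best_score


# Hypothetical scoring function: two features carry signal, the rest add noise.
USEFUL = {"age": 0.2, "income": 0.1}


def toy_score(subset):
    signal = sum(USEFUL.get(f, 0.0) for f in subset)
    noise = sum(1 for f in subset if f not in USEFUL)
    return 0.5 + signal - 0.02 * noise


best_subset, best_score = backward_elimination(
    ["age", "income", "gender", "zip"], toy_score
)
print(best_subset, round(best_score, 2))
```

Note that this answers "which combination of features works best", which is related to, but not the same as, a per-variable importance ranking.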

There’s also a component you may want to check out (including some example workflows)

I think this also helps to work out the importance of each feature.
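One common model-agnostic way to get an importance per feature is permutation importance: shuffle a single column, re-score the already trained model, and take the drop in accuracy as that feature's importance. The sketch below shows the mechanism in plain Python. It is an illustration under assumptions, not a description of what any particular KNIME component does internally; `predict` is a hypothetical stand-in for a trained model and the data is synthetic.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, n_features, repeats=5, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances


# Synthetic data: the label depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[i % 2, data_rng.random()] for i in range(200)]
labels = [r[0] for r in rows]


def predict(row):
    """Hypothetical stand-in for a trained model that learned to use feature 0."""
    return row[0]


importances = permutation_importance(predict, rows, labels, n_features=2)
print(importances)
```

In practice the score should be computed on held-out data rather than the training set, so that the ranking reflects what the model generalises from.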


I’ve been trying to make the Global Feature Importance component work with my workflow, but have not been able to figure it out. Do you happen to have any tips?

@OerjanVS you can take a look at this entry

Here is an example you could try to adapt.

If this is part of the assignment, you might also want to read up on how these mechanisms work.


I think @mlauber71 got you covered.

Maybe a general piece of advice:

Whenever you come across a node or component that you do not know how to use, it is worthwhile clicking on the “related workflows” link on the Hub page of the node / component:

There you see a list of all the publicly available workflows on the Hub that use that node / component. Those with the KNIME logo were developed by the KNIME team, often as examples or educational material:
