How can I get the variable importance from the Random Forest algorithm? Being able to calculate variable importance is one of the main merits of Random Forest, but this critical function seems to be missing in KNIME.
The KNIME Tree Ensemble Learner node offers a second outport which gives you details about the variable importance. Here you can see how often a variable was used for a split when building a decision tree at the first, second, or third level. As a measure of variable importance, divide the number of splits by the number of times the variable was a candidate at each level, and sum the three ratios.
You can also take a look at the white paper Seven Techniques for Data Dimensionality Reduction (https://www.knime.org/white-papers) where this technique is explained as well.
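To make the calculation concrete, here is a minimal sketch in Python (outside KNIME) of the split/candidate ratio described above. The counts are made up for illustration, and the function name `split_ratio_importance` and the data layout are my own assumptions, not something the node produces in this form. In a workflow you would read the split and candidate counts from the learner's second outport instead.

```python
def split_ratio_importance(stats):
    """Sum splits/candidates over the tree levels for each variable.

    stats maps a variable name to a list of (splits, candidates) pairs,
    one pair per tree level (first, second, third).
    """
    scores = {}
    for name, levels in stats.items():
        # Skip levels where the variable was never a candidate
        # to avoid dividing by zero.
        scores[name] = sum(s / c for s, c in levels if c > 0)
    return scores


# Hypothetical counts, invented for this example only.
example_stats = {
    "age": [(30, 60), (40, 100), (50, 200)],
    "income": [(10, 50), (20, 100), (30, 150)],
    "noise": [(0, 40), (5, 100), (10, 200)],
}

scores = split_ratio_importance(example_stats)
# Variables ordered from most to least important.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A variable that is picked for a split almost every time it is offered gets a score close to 3, while one that is rarely picked stays close to 0.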
I am in the same situation regarding this issue. I tried to extract the most valuable variables from the "Random Forest Learner" node, but there seems to be no trivial way to do so. As suggested by Iris, I tried the "Tree Ensemble Model Extract" node and the "Tree Ensemble Statistics" node, but neither of them shows a list of variables ranked by their weight in the model.
I would appreciate any help.
I believe Iris explained it properly above. The importance of the variables can be derived directly from the output of the Tree Ensemble Learner node. There is no need to use other nodes like the ones you mention.
The attached workflow should help clarify how this can be done.
I guess you are referring to the variable importance measure suggested by the authors of the Random Forest algorithm. You are right: the Random Forest implementation in the KNIME AP currently does not support this feature.
But one of the beautiful sides of the KNIME AP is that you can quite easily build a workflow that does the same. At this year's KNIME Summit Dean Abbott gave a great talk about how to do exactly that using a KNIME workflow (
The randomization he speaks of is very similar to what the authors of the Random Forest use in their variable importance measure.
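For readers who want to see the idea behind such a randomization-based measure, here is a rough Python sketch of permutation importance: shuffle one input column at a time and record how much the accuracy drops. This is not the workflow from the talk and not a KNIME API. It also differs from the original Random Forest measure, which uses the out-of-bag samples of each tree, whereas this sketch simply scores on whatever data you pass in. The names `permutation_importance` and `toy_predict` are hypothetical and only serve the illustration.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column of X is shuffled.

    predict: function mapping one row (a list) to a predicted label.
    X: list of rows, each row a list of feature values.
    y: list of true labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for a trained model: it only ever looks at column 0.
def toy_predict(row):
    return int(row[0] > 0.5)


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

imp = permutation_importance(toy_predict, X, y)
print(imp)
```

Shuffling the column the model depends on should hurt accuracy noticeably, while shuffling the unused column changes nothing. In KNIME the same loop can be built by shuffling one column, applying the predictor, and scoring, once per input column.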