# Random Forest - Random Forest column split and candidate counts

Hi,

In a trained Random Forest, the Attribute Statistics include #splits (level 0), #splits (level 1), #splits (level 2), #candidates (level 0), #candidates (level 1), #candidates (level 2).

It is easy to understand #splits (level 0) and #candidates (level 0).

Does anybody know how the following numbers are determined/calculated?
#splits (level 1) and #candidates (level 1)
#splits (level 2) and #candidates (level 2)

Thank you!

Hi,
Conceptually, those numbers are calculated in the same way as those for the root split, just for the second and third levels of the tree.
The candidate number indicates how often the attribute was in the attribute sample used to find a split at that level, and the number of splits is the number of times the attribute won the split.
Note that the numbers roughly double with each level because the second level contains two splits and the third level four.
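
The per-level growth can be checked with a little arithmetic. The sketch below assumes every split is binary and no branch stops early, so it gives an upper bound rather than the exact counts a trained model will report. The function name and the forest size of 100 trees are made up for illustration.

```python
def max_splits_per_level(n_trees: int, level: int) -> int:
    """Upper bound on the number of splits at a given depth.

    Assumes binary splits: each tree has at most 2**level split
    nodes at depth `level` (1 at the root, 2 below it, then 4).
    """
    return n_trees * 2 ** level


# Hypothetical forest of 100 trees.
for level in range(3):
    print(f"level {level}: at most {max_splits_per_level(100, level)} splits")
```

Summed over all attributes, #splits at a level should not exceed this bound. It can be lower when some branches end in a leaf before reaching that level.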

A follow-up of the same question. With the Random Forest Learner node, we can derive variable importance by using the # of times an attribute “was a candidate” vs how many times it “won the split”, but we can’t see where the split is.
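
One way to turn those counts into a score is the share of candidacies that ended in a win. This is only a sketch of that idea, not a built-in KNIME measure. The column names follow the Attribute Statistics output mentioned earlier in the thread, while the function name and the numbers are hypothetical.

```python
LEVELS = (0, 1, 2)


def win_ratio(stats: dict) -> float:
    """Fraction of candidacies in which the attribute won the split,
    pooled over levels 0 to 2."""
    splits = sum(stats[f"#splits (level {k})"] for k in LEVELS)
    candidates = sum(stats[f"#candidates (level {k})"] for k in LEVELS)
    return splits / candidates if candidates else 0.0


# Hypothetical attribute statistics for one column.
price = {
    "#splits (level 0)": 10, "#candidates (level 0)": 20,
    "#splits (level 1)": 12, "#candidates (level 1)": 40,
    "#splits (level 2)": 20, "#candidates (level 2)": 80,
}

print(f"price win ratio: {win_ratio(price):.2f}")
```

Pooling all three levels weights the deeper levels more heavily because they contain more splits. Computing the ratio per level instead is an equally reasonable choice.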

For example, if a continuous variable “price” is, at level 0, the most important attribute, how can I know at which price the split is happening (if at all)? I know Random Forests are ensembles of trees, so even if price “wins” 10/10 splits, odds are it “wins” the splits at different price points; are these points of split also averaged?

Best,
Joel

Hello @JoelMenendez,

I don’t understand how the split point would affect the variable importance, but if you want to see the split points, you can open the node’s view, where the splits are displayed.
You can also extract the individual decision trees as PMML, which is an XML-based format, and process it further to get this kind of information.
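
As a rough sketch of that second route, the snippet below collects the numeric thresholds per field from a PMML tree. It assumes the splits are stored as `SimplePredicate` elements with `field` and `value` attributes, as in the PMML `TreeModel` format. The sample document and the function names are made up for illustration, so check them against the PMML you actually export.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}Node' becomes 'Node'."""
    return tag.rsplit("}", 1)[-1]


def split_points(pmml_text: str) -> dict:
    """Map each field to the sorted numeric thresholds it is split on."""
    points = defaultdict(set)
    for elem in ET.fromstring(pmml_text).iter():
        if local_name(elem.tag) != "SimplePredicate":
            continue
        try:
            threshold = float(elem.get("value"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value (e.g. nominal split)
        points[elem.get("field")].add(threshold)
    return {field: sorted(values) for field, values in points.items()}


# Hypothetical, heavily trimmed PMML for a single tree.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <TreeModel functionName="classification">
    <Node id="0"><True/>
      <Node id="1">
        <SimplePredicate field="price" operator="lessOrEqual" value="19.5"/>
      </Node>
      <Node id="2">
        <SimplePredicate field="price" operator="greaterThan" value="19.5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

print(split_points(SAMPLE))
```

Each tree keeps its own thresholds, so running this over every extracted tree shows how the split points for a field such as price are spread across the forest.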

Best,