Understanding Tree Ensemble Learner Node

Hi everyone!

I've been trying to understand the "View: Tree Views" output of the Tree Ensemble Learner node, but I can't. Let me explain:

When I use the Decision Tree Learner node, I can see clearly how the tree classifies my samples: each node shows the class it assigns (I'm using the decision tree for classification, by the way) and the percentages (picture 1 attached). But when I use the Tree Ensemble Learner node (with the same dataset), the view looks like picture 2: each branch shows only the class, the number of samples in that class (1/1?), and 100% for the assigned class.


Could anyone help me understand what's happening?


Thank you! Have a nice day! :)

In a tree ensemble there is not one single tree but several different trees, each grown from a subset of the features and/or examples. That's why you see multiple trees in the tree views. The idea is that each tree contributes its class vote to a majority vote.
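To make the voting idea concrete, here is a minimal sketch in plain Python. It is not how KNIME implements the node internally; the function name `majority_vote` and the per-tree votes are made up purely for illustration.

```python
from collections import Counter

def majority_vote(votes):
    """Return the class that received the most votes.

    If several classes tie, the one that appeared first in `votes` wins.
    """
    return Counter(votes).most_common(1)[0][0]

# Hypothetical predictions of five trees for one and the same sample
tree_votes = ["A", "B", "A", "A", "B"]

predicted = majority_vote(tree_votes)

# Share of trees that agree with the winning class
agreement = tree_votes.count(predicted) / len(tree_votes)
```

With these made-up votes, three of the five trees say "A", so the ensemble predicts "A" with an agreement of 0.6.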

The drawback of this method is that it is something of a black box. Therefore, tree ensembles are probably better suited to applications where predictive performance matters more than intelligibility. Nonetheless, you can draw some interesting information about variable selection from a tree ensemble by analyzing which features are most often used to split the root in each tree (see Attribute statistics).
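As a rough illustration of that kind of analysis, the sketch below counts how often each feature is used for the root split. The feature names and the list `root_splits` are invented for the example; in practice the numbers would come from the node's attribute statistics output.

```python
from collections import Counter

# Hypothetical root split attribute of each tree in a 6-tree ensemble
root_splits = [
    "petal_length",
    "petal_width",
    "petal_length",
    "sepal_length",
    "petal_length",
    "petal_width",
]

# Count how often each feature was chosen for the root split
root_counts = Counter(root_splits)

# Features sorted from most to least frequently used at the root
ranking = root_counts.most_common()
```

A feature that shows up at the root of many trees is a reasonable candidate for being informative, although this is only a heuristic.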

OK, thank you for replying, Geo, but what I don't understand is this: when I open the tree views, all my trees are there, and each one looks like picture 2: every node shows its predicted class, 1/1, and assigns the class with 100% (and some nodes further down may change the class). I would understand the class changing if the parent node had not chosen a class with 100% confidence, but if it says 100%, it should be a terminal node, shouldn't it? Why does the tree keep splitting?

Thank you for making the effort to help me with this :)

Could you provide more details about how you've configured the Tree Ensemble?

Here is the ensemble configuration (picture 3). I've also used Information Gain Ratio as the split criterion.

OK, I have just tested this on some data of my own to understand your question. Sorry for answering in completely the wrong direction; I appear to have been misled by the title.

And I agree: the Tree View seems weird with its constant indication of 100% with n = 1. It only appears to reproduce the structure of each tree and not the class frequencies.

OK, thank you Geo. Sorry for not explaining clearly what my question was about. Well, at least I see that I'm not the only one who doesn't understand the behavior of this Tree Ensemble view XD.

I've tried to find an example where someone has used the Tree Ensemble node, but I haven't found one, so if someone could help us, it would be great!


Thank you for your time Geo, have a nice day! :)

Hello enribueno, hello Geo,

Please check the checkbox "Save target distribution in tree nodes" on the "Attribute Selection" tab of the Tree Ensemble Learner dialog.

As this is memory-expensive, it is deactivated by default, which causes the empty distributions in the Tree View. This behaviour is not straightforward and will probably change with the summer release.



Yes! You're right, Ferry!

Thank you so much!!

Have a nice day! :)

Tipptopp, thanks Ferry :-)