Question about Random Forest

Lawson · April 9, 2018, 3:46pm

I would like to use the Ensemble Tree node for variable selection. May I know the difference between the following options?

Attribute Selection

Use same set of attributes for each tree: the attributes are sampled once for each tree, and this sample is then used to construct the tree.
Use different set of attributes for each tree node: a different set of candidate attributes is sampled in each of the tree nodes, and the optimal one from that set is chosen to perform the split.

What are the implications of these two options? Thanks.

nemad · April 9, 2018, 4:12pm

Hello Lawson,

Maybe a little example can illustrate the difference.
Let's assume we want to decide whether the weather is suitable for playing tennis, and we have the three variables temperature, sunny, and windy.

The first option draws one sample from those (e.g. [temperature, windy]) per tree and uses only this sample to create that decision tree, ignoring all other variables.

The second option draws such a sample for each split inside of an individual tree, so the first split may be calculated using temperature and windy, while the second split may be calculated using sunny and windy.
This technique is used in random forests to increase the diversity of the individual trees.
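
To make the difference concrete, here is a minimal sketch in plain Python. It is not how the KNIME node is implemented; it only imitates the sampling step. The function names, the subset size k=2 and the number of splits are made up for illustration.

```python
import random

# The three variables from the tennis example above.
ATTRIBUTES = ["temperature", "sunny", "windy"]


def sample_per_tree(attributes, k, n_splits, rng):
    """Option 1 (hypothetical helper): draw one subset for the whole tree
    and reuse it as the candidate set at every split."""
    subset = rng.sample(attributes, k)
    return [list(subset) for _ in range(n_splits)]


def sample_per_node(attributes, k, n_splits, rng):
    """Option 2 (hypothetical helper): draw a fresh candidate subset
    at every split of the tree."""
    return [rng.sample(attributes, k) for _ in range(n_splits)]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
per_tree = sample_per_tree(ATTRIBUTES, k=2, n_splits=3, rng=rng)
per_node = sample_per_node(ATTRIBUTES, k=2, n_splits=3, rng=rng)

for i, (a, b) in enumerate(zip(per_tree, per_node), start=1):
    print(f"split {i}: per-tree candidates={a} | per-node candidates={b}")
```

With the first option every split of a tree sees the same candidates, so a variable that was not drawn can never appear in that tree. With the second option the candidates can change from split to split, so a single tree may end up using all three variables.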

Cheers,

nemad

Lawson · April 10, 2018, 12:31am

Thanks for the clarification.

However, I am still not clear about the predictive power of these two methods, i.e. how well each one selects the best variables. Are there any rules for picking option 1 or 2, or do I need to try both to see which one gives better results?

Many thanks

nemad · April 10, 2018, 2:17pm

Option 2 usually gives the better results in terms of predictive power (e.g. accuracy) because it results in more diverse trees. In a way, this is the secret ingredient that makes random forests work so well.

Lawson · April 10, 2018, 2:33pm

Many thanks for your explanation.

Does Option 2 have any drawbacks? Does it require more resources?

Are there any advantages to using Option 1? Is it good for small sample sizes?

Thanks in advance.

nemad · April 15, 2018, 6:43pm

No, there are no drawbacks, or at least no recognizable ones.
Not that I am aware of.

Lawson · April 16, 2018, 1:02am

Many thanks