I am using the Decision Tree Learner and Predictor nodes. I notice that the same splitting attribute can appear more than once in the tree, and the attribute is a categorical variable.
e.g. Tier: A, B and C
Branch 1: Tier A
Branch 2: Tier B & C
On subsequent branches of branch 2, I see splitting by Tier A, B, and C again.
In the attached example, the Gold tier appears again on a subsequent level (the bottom-most one). How is this possible?
I have been searching for a few hours, but have not managed to find an answer on the forum. Any clarification would be greatly appreciated.
This is caused by the option "Binary nominal splits". If you deactivate it, you get one child per nominal value.
PS: from the node description:
- Binary nominal splits
- If checked, nominal attributes are split in a binary fashion. Binary splits are more difficult to calculate but result also in more accurate trees. The nominal values are divided in two subsets (one for each child). If unchecked, for each nominal value one child is created
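To make the two modes concrete, here is a small Python sketch. It is not KNIME's code, and the function names are made up for illustration. It only lists the children each mode could create for a nominal attribute with the values A, B and C from the question.

```python
from itertools import combinations

def multiway_children(values):
    """Option unchecked: one child per nominal value."""
    return [{v} for v in values]

def binary_partitions(values):
    """Option checked: every way to divide the values into two non-empty subsets."""
    values = list(values)
    first, rest = values[0], values[1:]
    partitions = []
    # Keep the first value in the left subset so each partition is listed once.
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            left = {first, *combo}
            right = set(values) - left
            if right:  # skip the case where one side would be empty
                partitions.append((left, right))
    return partitions

tiers = ["A", "B", "C"]
print(multiway_children(tiers))
for left, right in binary_partitions(tiers):
    print(sorted(left), "|", sorted(right))
```

For n values there are 2^(n-1) - 1 such two-way partitions, so an exhaustive search over them gets expensive quickly. That fits the description's remark that binary splits are more difficult to calculate, although how the node actually searches is not stated here.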
Thank you for the reply. However, I still don't understand why Gold appears in the lower branch when it already appeared on a higher branch, with the option "Binary nominal splits" checked. Shouldn't the lower right leaf be "Green" only, instead of "Gold, Green"?
The said option causes the Decision Tree to divide all possible values into two subsets, independent of the number of occurrences of each value. You are right that the bottom right branch could also be "Tier isIn [Green]", because Gold has already been excluded. However, if I understand it correctly, the tree divides all possible values of the attribute, not only the ones actually appearing at that node.
Hope this clarifies it.
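The explanation above can be illustrated with a short Python sketch. This is only a model of the described behaviour, not KNIME's implementation. The tier list (including "Silver") and the helper names are assumptions made up for the example.

```python
def label_split(domain, left_subset):
    """Label both children using the full domain of the attribute,
    not only the values that still reach this node."""
    left = [v for v in domain if v in left_subset]
    right = [v for v in domain if v not in left_subset]
    return left, right

def effective_values(label, reachable):
    """Values in a branch label that can actually occur at this node."""
    return [v for v in label if v in reachable]

# Hypothetical full set of tier values.
domain = ["Gold", "Silver", "Green"]

# Assume an upper split already sent all Gold rows elsewhere,
# so only these values can reach the lower node.
reachable = {"Silver", "Green"}

left, right = label_split(domain, left_subset={"Silver"})
print(right)                               # ['Gold', 'Green']
print(effective_values(right, reachable))  # ['Green']
```

In this model the label "Gold, Green" and the label "Green" select exactly the same rows at the lower node, which is why the extra "Gold" in the label is harmless.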
Thanks for the explanation. Got it: the split considers all possible values independently of which ones actually occur, which explains why Gold reappears on the deeper branch. Thanks again.