Variable selection in Decision tree learner

sunkyung · September 7, 2016, 8:23am

'Variable selection in Decision tree learner'

Hi :)

My name is Sunkyung.

I work in finance as a junior consultant in Korea.

What I am curious about is 'Variable selection in Decision tree learner'.

How is the variable chosen when a node is split?

I don't know how to choose the variables I want.

Sorry, my English is not very good. Thank you for reading!

marco_ghislanzoni · September 7, 2016, 9:26am

Hi Sunkyung,

I understand from your post that you are looking for more information on how Decision Trees are built, in particular how the variable used to split each node is chosen. I suggest reading the following references as a starting point:

Decision Trees (Classification): http://www.saedsayad.com/decision_tree.htm

Decision Trees (Regression): http://www.saedsayad.com/decision_tree_reg.htm

Both are introductory articles on the subject. There are many other ways of building decision trees, but ID3, mentioned in the articles, is often regarded as the foundation.
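To make the idea concrete, here is a minimal Python sketch of how ID3 picks the split variable: it computes the information gain of each candidate attribute and takes the one with the highest value. The toy data and the names `entropy`, `information_gain` and `best_attribute` are my own illustration, not anything taken from KNIME.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3-style choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

print(best_attribute(rows, ["outlook", "windy"], "play"))
```

The same selection is then repeated inside each child node on the rows that reached it, until the nodes are pure or no attributes are left.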

KNIME's base implementation of a Decision Tree (Classification) is in the Decision Tree Learner node and is based on the C4.5 algorithm, the successor of ID3 by the same author. You can read about it here: https://en.wikipedia.org/wiki/C4.5_algorithm
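One practical difference from ID3 is that C4.5 ranks variables by gain ratio rather than raw information gain, which penalises variables with many distinct values (an ID column, for instance). The sketch below only illustrates that criterion; the function names and the toy columns are hypothetical, and it is not KNIME's actual code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of the split divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical columns: both variables separate the classes perfectly,
# but customer_id does so only because every row has its own value.
labels = ["no", "no", "yes", "yes"]
customer_id = ["a", "b", "c", "d"]
outlook = ["sunny", "sunny", "rainy", "rainy"]

print(gain_ratio(customer_id, labels))
print(gain_ratio(outlook, labels))
```

With plain information gain the two variables would tie; with gain ratio the variable that splits the data into fewer, larger groups scores higher.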

Note: there is a terminology mismatch here. Classification Trees and Regression Trees are both types of Decision Trees: the first predicts a categorical target variable, the second a numerical (continuous) one. In KNIME and in some other sources, however, a Classification Tree is simply called a Decision Tree.

KNIME's base implementation of a Decision Tree (Regression) is in the Simple Regression Tree Learner node and is based on the CART algorithm. You can read about it here: http://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/
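For a regression tree the target is numeric, so entropy does not apply. A common CART-style criterion is to try every variable and every threshold and keep the split that minimises the sum of squared errors in the two child nodes. Below is a rough Python sketch of that search; the data, the column names and the helper names `sse` and `best_split` are made up for illustration and are not the KNIME node's internals.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, features, target):
    """Return (cost, feature, threshold) with the lowest total SSE after the split."""
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, feature, threshold)
    return best


# Hypothetical data: spending depends on income, hardly at all on age.
rows = [
    {"age": 25, "income": 30, "spend": 10},
    {"age": 30, "income": 80, "spend": 50},
    {"age": 45, "income": 35, "spend": 12},
    {"age": 50, "income": 90, "spend": 55},
]

cost, feature, threshold = best_split(rows, ["age", "income"], "spend")
print(feature, threshold, cost)
```

The chosen variable is simply the one whose best threshold leaves the least unexplained variation in the target; nothing is selected by hand.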

Hope this helps.

Cheers,
Marco.