Normalized Class Distribution - Decision Tree Predictor Node

Hello,

I want to plot a ROC curve comparing different learning algorithms, which works well so far. However, I do not really understand how exactly the "P" value is calculated that the Decision Tree Predictor node generates when "Append Columns with Normalized Class Distribution" is enabled.

Can anybody please tell me the formula or name a reference?

Help is much appreciated!

Hello mtest,

The appended probability values tell you the probability that a row belongs to the specified class. Let's consider a small example: say you have two classes, A and B. Then P(class=A) is the probability that the row is of class A, and P(class=B) is the probability that the row is of class B.

In the case of the decision tree model, those probabilities are determined by the leaf into which the row falls. For each leaf, the probabilities are estimated during training as P(class=A) = |rows of class A in leaf| / |total rows in leaf| (and likewise for B).
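
To make the formula concrete, here is a minimal Python sketch of the per-leaf estimate. It only follows the ratio given above and is not taken from the KNIME implementation; the function name `leaf_class_distribution` and the example leaf contents are made up for illustration.

```python
from collections import Counter


def leaf_class_distribution(labels_in_leaf):
    """Return {class: share of the leaf's training rows having that class}.

    Hypothetical helper: assumes the leaf holds at least one training row.
    """
    counts = Counter(labels_in_leaf)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Assumed example: a leaf that received 8 training rows, 6 of class A and 2 of class B.
leaf_labels = ["A"] * 6 + ["B"] * 2
dist = leaf_class_distribution(leaf_labels)
print(dist)  # {'A': 0.75, 'B': 0.25}
```

Every row that ends up in this leaf at prediction time would then get the same two values, P(class=A) = 0.75 and P(class=B) = 0.25, which always sum to 1.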

Hope that helps. If you need further explanation, feel free to ask again ;)

Cheers,

nemad

This was really helpful, thanks a lot nemad!