Hi there, I am trying to plot an ROC curve for a decision tree with multiple classes (in this case, four). I did some calculations in order to get it done. One was to append a column saying whether the model predicted the correct class, with "Yes" or "No" depending on the output.
Secondly, I added a P(p=yes) column that selects the maximum probability among the predicted classes.
After those calculations I plotted the ROC curve, but I do not know if it is correct.
I would be grateful if someone could check whether the plotted ROC curve is correct.
Haven't looked at your workflow, but ROC curves generally make sense for binary classification only (how would you vary a classification threshold for a four-class problem?). What might make sense is to produce a single ROC curve for each of your four classes, evaluating whether each input is classified as class X, or not class X (one vs. rest).
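If it helps to see the idea outside of KNIME, here is a minimal one-vs.-rest sketch using scikit-learn; y_true and y_score are made-up placeholder arrays (true labels plus one predicted-probability column per class), not anything taken from your workflow:

```
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

# Hypothetical data: true labels and one predicted-probability column per class.
classes = ["A", "B", "C", "D"]
y_true = np.array(["A", "B", "C", "D", "A", "C"])
y_score = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.2, 0.6, 0.1],
                    [0.1, 0.1, 0.2, 0.6],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.2, 0.2, 0.5, 0.1]])

# One indicator column per class: "is this row class X, or not class X?"
y_bin = label_binarize(y_true, classes=classes)

# One ROC curve (and AUC) per class, scored by that class's own probability.
for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
    print(cls, "AUC =", auc(fpr, tpr))
```

That gives you four curves (one per class) rather than a single curve for the whole four-class problem.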
This article is often cited when it comes to multiclass ROC curves:
David J. Hand and Robert J. Till (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning 45(2), p. 171–186.
The described methodology has been implemented in the R package pROC. You may want to take a look at it.
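If R is not convenient, scikit-learn's roc_auc_score offers a pairwise (one-vs.-one) multiclass AUC in the spirit of Hand & Till (2001); a minimal sketch with made-up arrays (same shape of inputs: true labels plus one probability column per class, rows summing to 1):

```
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical data, just to show the call.
classes = ["A", "B", "C", "D"]
y_true = np.array(["A", "B", "C", "D", "A", "C"])
y_score = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.2, 0.6, 0.1],
                    [0.1, 0.1, 0.2, 0.6],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.2, 0.2, 0.5, 0.1]])

# Average AUC over all pairs of classes (one-vs.-one, macro-averaged).
print(roc_auc_score(y_true, y_score, multi_class="ovo", labels=classes))
```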
Your implementation seems to be a sort of one-vs.-rest approach, but it is unclear to me how you are using the scores from the original multi-class prediction to build the derived binary prediction your ROC curve is based on.
Other techniques I have seen used in this kind of application are micro- and macro-averaging of the per-class one-vs.-rest ROC curves, but their validity is debated (e.g. sensitivity to class skew).
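To make the difference concrete (again with made-up arrays, and not an endorsement of either averaging scheme): binarize the labels one-vs.-rest, then either average the per-class AUCs (macro) or pool all class/sample decisions into one binary problem (micro):

```
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

# Hypothetical data: true labels and one probability column per class.
classes = ["A", "B", "C", "D"]
y_true = np.array(["A", "B", "C", "D", "A", "C"])
y_score = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.2, 0.6, 0.1],
                    [0.1, 0.1, 0.2, 0.6],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.2, 0.2, 0.5, 0.1]])

y_bin = label_binarize(y_true, classes=classes)

# Macro: mean of the four one-vs.-rest AUCs; micro: all decisions pooled together.
print("macro:", roc_auc_score(y_bin, y_score, average="macro"))
print("micro:", roc_auc_score(y_bin, y_score, average="micro"))
```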
I would proceed as qqilihq suggested above. Simpler and cleaner too.
Thank you Marco, I will check the documentation properly. As a quick and dirty summary, what I did was the following: I asked the predictor node to append the probabilities for each class, then added a column selecting the MAX value of those probabilities (assuming that the class with the MAX probability was the predicted class), used it as the P(p=yes) column, and plotted that in the ROC curve.
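In code terms (outside KNIME, with hypothetical arrays standing in for my table), what I did amounts roughly to this:

```
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical stand-ins for my table: true labels and the four probability columns.
classes = np.array(["A", "B", "C", "D"])
y_true = np.array(["A", "B", "C", "D", "A", "C"])
y_score = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.2, 0.6, 0.1],
                    [0.1, 0.1, 0.2, 0.6],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.2, 0.2, 0.5, 0.1]])

pred = classes[np.argmax(y_score, axis=1)]   # predicted class = class with MAX probability
hit = (pred == y_true).astype(int)           # my "Yes"/"No" column (correct vs. incorrect)
p_yes = y_score.max(axis=1)                  # my P(p=yes) column

# The curve I plotted: correct-vs.-incorrect as the binary target, MAX probability as the score.
fpr, tpr, thresholds = roc_curve(hit, p_yes)
```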
Thanks again as this matter is currently important for me.
Yes, I understand what you did, but I cannot directly relate it back to the meaning of a ROC curve, which is basically a plot of the True Positive Rate against the False Positive Rate at different threshold levels.
What are the FPR and TPR for your derived binary classifier based on the original multi-class problem? And what is the meaning of the original scores once assigned via the MAX function to the derived classifier?
I am trying to understand the rationale behind your approach. You obtained a curve for your derived classifier which looks a lot like a ROC curve, but is it really a ROC curve? (For example, what is the meaning of its AUC?) And what relation does it have to the original multi-class problem?
Just a criterion for selecting a decision tree (besides the confusion matrix), in case I have more than one. Probably I have to fill some knowledge gaps before trying things.
You might alternatively want to look into the log-likelihood, which, like the ROC curve, helps evaluate the predicted probabilities. The decision tree whose value on this indicator is closest to zero would be considered superior.
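In case a concrete check is useful: scikit-learn's log_loss is the negative mean log-likelihood of the predicted probabilities, so comparing trees on it amounts to the same criterion (the smaller the log loss, the closer the log-likelihood is to zero); a minimal sketch with made-up arrays:

```
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical data: true labels and one predicted-probability column per class.
classes = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "A", "C"]
y_score = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.2, 0.6, 0.1],
                    [0.1, 0.1, 0.2, 0.6],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.2, 0.2, 0.5, 0.1]])

# Lower log loss = log-likelihood closer to zero = better-calibrated probabilities.
print(log_loss(y_true, y_score, labels=classes))
```

You would compute this for each candidate tree on the same validation data and keep the one with the smallest value.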