Feature suggestion: R-square, C-stat, and Pval for regressions

eecoxjr · February 15, 2008, 10:11pm

KNIME is a great tool! However, for regressions it would be very desirable to include more than just the coefficient estimates in the output. Standard errors, z-values, p-values, and significance flags (e.g. ***) would be extremely valuable. For linear models, adding R-squared, and for logistic models, adding the c-statistic (area under the ROC curve), would also be very nice. (Actually, that last one could be applied to any classifier.) If any of these are already there and I have missed them, just let me know.
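To make the request concrete, here is a minimal sketch in plain Python of the kind of per-coefficient output meant above, for a one-predictor linear model. It is not KNIME code: the function names `simple_ols_summary` and `stars` are made up for illustration, the p-values use a normal (z) approximation rather than the exact t distribution, and the star thresholds follow the common 0.001 / 0.01 / 0.05 / 0.1 convention, which is an assumption.

```python
import math


def stars(p):
    """Map a p-value to a significance flag (assumed thresholds)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""


def simple_ols_summary(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns estimate, standard error, z-value, two-sided p-value
    (normal approximation) and flag per coefficient, plus R-squared.
    Needs at least 3 points and residuals that are not all zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual and total sums of squares give R-squared.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - sse / sst

    # Residual variance with n - 2 degrees of freedom.
    sigma2 = sse / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    def row(estimate, std_error):
        z = estimate / std_error
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approx.
        return {"estimate": estimate, "std_error": std_error,
                "z": z, "p": p, "flag": stars(p)}

    return {"intercept": row(intercept, se_intercept),
            "slope": row(slope, se_slope),
            "r_squared": r_squared}


summary = simple_ols_summary([1, 2, 3, 4, 5, 6],
                             [2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
for name in ("intercept", "slope"):
    print(name, summary[name])
print("R-squared:", summary["r_squared"])
```

With small samples the normal approximation is optimistic, so a real implementation would more likely use the t distribution for linear models.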

I figure the above should be relatively easy to implement. A bit harder to implement, but also very valuable, would be stepwise regression with options for forward, backward, or both directions.

unknown_user · February 18, 2008, 10:53am

eecoxjr wrote:

... for logistic models adding c-statistic (area under ROC) would also be very nice. (Actually that last one could be applied to any classifier.)

Regarding ROC, we have just written a node that plots ROC curves and computes the area under the curve. However, it works only for two-class problems (multi-class ROC is still an open issue) and needs a classifier that outputs class probabilities. We will try to add class probabilities to all our classifiers, where this is possible, by the next KNIME release.
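For readers who want to see why class probabilities are needed, here is a minimal sketch in plain Python of the two-class area under the ROC curve, computed as the c-statistic (the fraction of positive/negative pairs ranked correctly, with ties counted as half). This is not the node's code; the function name `c_statistic` and the 0/1 label encoding are assumptions made for the example.

```python
def c_statistic(labels, scores):
    """Two-class AUC via pairwise concordance.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class per row.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one row of each class")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # positive ranked above negative
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(pos) * len(neg))


print(c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of rows; a rank-based version would scale better, but this form shows the definition most directly.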

Thorsten

unknown_user · April 18, 2008, 5:28pm

Multi-class ROC and class probabilities for all relevant classifiers will both be great enhancements, especially the probabilities.

Forward, backward, and stepwise facilities for all models would be very useful as well. Could this not be built something like the other meta learners? That is, instead of applying the learner to a new CV split, you would apply the learner to a new variable set?
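The meta-learner idea can be sketched roughly as follows in plain Python: a wrapper that knows nothing about the model and only calls a scoring function on different variable sets. Everything here is hypothetical (the names `forward_selection`, `toy_score`, and the made-up weights); in practice `score_fn` would train the learner on the given columns and return something like cross-validated accuracy, and it must accept an empty variable set.

```python
def forward_selection(candidates, score_fn, min_gain=1e-9):
    """Greedy forward selection around an arbitrary learner.

    candidates: list of variable names.
    score_fn:   callable taking a list of names, returning a score
                where higher is better.
    Stops when the best single addition improves the score by less
    than min_gain.
    """
    selected = []
    best_score = score_fn(selected)
    remaining = list(candidates)

    while remaining:
        # Re-run the learner once per candidate variable set.
        trials = [(score_fn(selected + [f]), f) for f in remaining]
        trial_score, trial_feature = max(trials, key=lambda t: t[0])
        if trial_score - best_score < min_gain:
            break
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score

    return selected, best_score


# Stand-in for "train and evaluate the learner" (made-up numbers).
TOY_WEIGHTS = {"a": 0.3, "b": 0.2, "c": 0.0, "d": -0.1}


def toy_score(subset):
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)


print(forward_selection(["a", "b", "c", "d"], toy_score))
```

Because the wrapper only sees `score_fn`, the same loop would work for any learner, which is the point of the question above.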

unknown_user · April 19, 2008, 2:23pm

At the risk of outing myself as a dummy: we did look into multi-class ROC and could neither come up with a good way to do it nor find anything in the literature. Can you point us to something useful? Class probabilities will start showing up for more and more classifiers.

Backward feature elimination is already running in the development version of KNIME. This is done via a special set of looping nodes, which allow parts of a pipeline (between the begin and end nodes) to be selectively re-executed. Other types of stepwise "do something" can either be done easily using this concept, or we'll provide additional nodes anyway...
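As a rough illustration of the loop idea (not the actual KNIME nodes, whose internals are not described here), backward elimination can be sketched in plain Python as a loop that re-runs the same "pipeline body" once per candidate column to drop. The names `backward_elimination` and `demo_score` and the demo numbers are assumptions; `score_fn` stands in for whatever sits between the loop's begin and end.

```python
def backward_elimination(features, score_fn):
    """Greedy backward elimination.

    Starts from all features and, at each level, drops the feature
    whose removal gives the highest score. Returns the full trace of
    (feature tuple, score) pairs so a subset can be chosen afterwards.
    """
    current = list(features)
    trace = [(tuple(current), score_fn(current))]

    while len(current) > 1:
        trials = []
        for f in current:
            reduced = [g for g in current if g != f]
            # One re-execution of the loop body per reduced set.
            trials.append((score_fn(reduced), reduced))
        best_score, best_subset = max(trials, key=lambda t: t[0])
        current = best_subset
        trace.append((tuple(current), best_score))

    return trace


# Stand-in for the learner + scorer inside the loop (made-up numbers).
DEMO_WEIGHTS = {"w": 0.3, "x": 0.2, "y": 0.0, "z": -0.1}


def demo_score(subset):
    return 0.5 + sum(DEMO_WEIGHTS[f] for f in subset)


for subset, score in backward_elimination(["w", "x", "y", "z"], demo_score):
    print(subset, round(score, 3))
```

Returning the whole trace rather than a single winner leaves the trade-off between model size and score to the user.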