ROC Based Feature Ranking/Selection

Hello everyone,

  1. Is there a KNIME node that ranks features by the area under the ROC curve (AUC)?

  2. If none exists yet, I would like to write one for KNIME for research purposes. I would appreciate your suggestions on whether that can be done, and how.

  3. I have not written a node before, so I would also appreciate your estimate of how long it would take a beginner to code it.

PS: I have good programming skills in languages such as C# and VB.NET

Thanks All!

1) Others may know of a node like this; I do not.

2) The code for the ROC Curve node might be a good starting point (I guess the GNU GPL is fine for research purposes), but it is probably easiest to start from scratch.

3) If you know the algorithm you want to implement and it does not need much configuration or complex views, it can be really fast. Even with all the ceremony, I think it can be done in less than a day. (If you want a graphical UI for selecting the features from the ROC curve, it can take much longer, though. Java does not make it easy to test and design complex UIs. JRebel might help to shorten the experimenting.)

If you are familiar with C#, you might miss some of its features in Java, although I guess it is mostly OK. An alternative might be Scala; I have a project that can generate a node template for it, although it is not too user-friendly yet. If you want to do regular KNIME node development, you might consider the KNIME Developer Training coming up soon (February 17-18).

It depends on what exactly you want to do. If you want to measure the predictive power of a single column based on AUC, then the attached workflow will probably help you. It trains a model on a single column and computes the resulting AUC. The loop iterates over all columns except the class column and collects all AUCs, together with the column names, in the output table.
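For anyone who wants to see the idea outside KNIME, here is a rough Python sketch of per-column AUC ranking. It is not the attached workflow: instead of training a model on each column, it simply uses the raw column values as scores and computes the AUC from the rank-sum (Mann-Whitney U) statistic. The function names `auc` and `rank_features`, the 0/1 label encoding, and the choice to report max(AUC, 1 - AUC) so that inversely related columns also rank highly are all assumptions made for illustration.

```python
def auc(scores, labels):
    """AUC from the rank-sum (Mann-Whitney U) statistic.

    Assumes labels are 1 (positive) or 0 (negative) and scores are numeric.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def rank_features(columns, labels):
    """Rank columns by AUC, best first.

    columns: dict mapping a column name to its list of numeric values.
    Returns a list of (name, auc) tuples sorted by descending AUC.
    """
    result = []
    for name, values in columns.items():
        a = auc(values, labels)
        # Assumption: treat a column that separates the classes in the
        # opposite direction as equally informative.
        result.append((name, max(a, 1 - a)))
    return sorted(result, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    columns = {
        "a": [1, 2, 3, 4],
        "b": [1, 3, 2, 4],
        "c": [4, 1, 3, 2],
    }
    for name, value in rank_features(columns, labels):
        print(name, value)
```

This only works directly for numeric columns and a binary class. For nominal columns, or to capture non-monotonic relationships, training a small model per column as in the workflow is the more general approach.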

Thanks for your help!