Scorer node redundancy

Hello all,

I was just creating a workflow using a decision tree, and I push the results into a Scorer node to look up the accuracy of my model. I noticed that the accuracy statistics are somewhat redundant when it comes to the distribution of correctly and wrongly predicted classes. I think it is confusing (especially for users who have worked with KXEN or RapidMiner) to display both TruePositives and FalsePositives AND TrueNegatives and FalseNegatives in this table. In the two-class case they show exactly the same results, just mirrored relative to each other.

I do see that this is probably meant to show the direct connection of all the results to the F-measure, but maybe an additional view with a simple accuracy table would be nice. Something along these lines:

predicted \ class    Positive    Negative    precision
pred_Positive           47           6         88.68%
pred_Negative            3          44         93.62%
recall                94.00%      88.00%    acc = 91.00%
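Just to make the relationships in that table explicit, here is a quick sketch (nothing to do with the Scorer's internals, just the same counts) showing how precision, recall, and accuracy all fall out of the four cells:

```python
# Recomputing the example table's statistics from the raw 2x2 counts.
tp, fp = 47, 6   # row "pred_Positive": true positives, false positives
fn, tn = 3, 44   # row "pred_Negative": false negatives, true negatives

precision_pos = tp / (tp + fp)                   # row-wise: 47 / 53
precision_neg = tn / (tn + fn)                   # row-wise: 44 / 47
recall_pos    = tp / (tp + fn)                   # column-wise: 47 / 50
recall_neg    = tn / (tn + fp)                   # column-wise: 44 / 50
accuracy      = (tp + tn) / (tp + fp + fn + tn)  # diagonal / total

print(f"precision: {precision_pos:.2%} / {precision_neg:.2%}")  # 88.68% / 93.62%
print(f"recall:    {recall_pos:.2%} / {recall_neg:.2%}")        # 94.00% / 88.00%
print(f"accuracy:  {accuracy:.2%}")                             # 91.00%
```

So precision is read along the rows (per predicted class), recall along the columns (per actual class), and accuracy along the diagonal.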

I believe it would make the connection between the results a bit clearer in terms of understanding recall and precision.




Hi Robert,

Thanks for the feedback. The current implementation of the data in the output port of the Scorer is designed to make it easy to use the data programmatically in downstream nodes. Also, for multi-class problems (n > 2) these numbers will not be redundant.
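To illustrate why the redundancy disappears for n > 2, here is a small sketch with a made-up 3-class confusion matrix (these counts are purely hypothetical, not output from any node): each class gets its own TP/FP/FN/TN, and they no longer mirror each other the way the two-class case does.

```python
# Hypothetical 3-class confusion matrix: rows = predicted, columns = actual.
labels = ["A", "B", "C"]
cm = [
    [10, 2, 0],   # predicted A
    [ 1, 8, 3],   # predicted B
    [ 0, 1, 9],   # predicted C
]
n = len(labels)
total = sum(sum(row) for row in cm)

for i, lab in enumerate(labels):
    tp = cm[i][i]                                    # predicted i, actually i
    fp = sum(cm[i][j] for j in range(n) if j != i)   # predicted i, actually other
    fn = sum(cm[j][i] for j in range(n) if j != i)   # actually i, predicted other
    tn = total - tp - fp - fn                        # everything else
    print(f"class {lab}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Running this, class A gets TP=10, FP=2, FN=1 while class B gets TP=8, FP=4, FN=3: the per-class rows are genuinely different pieces of information, not one table flipped upside down.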

If you right-click on the node and open the view associated with it, you will find a more nicely formatted view of both the accuracy statistics and the confusion matrix. Does that help at all?