Confusion Matrix?

Docminus · November 14, 2014, 12:39pm

Does anyone know what to use to make a confusion matrix from existing data? I think I saw something in connection with some of the machine learning nodes, but there the matrix comes from the models. I simply want to compare measured data with data that was predicted elsewhere (class model).

http://en.wikipedia.org/wiki/Confusion_matrix

I'm not an expert in statistics and math; I usually know what I need but not always how to get there (doing it manually in Excel works but sucks). Sorry if I'm asking something obvious.

richards99 · November 14, 2014, 1:08pm

Hi,

You simply need a measured column and a predicted column in which the data is categorical/class data (e.g. Active or Inactive), and not numerical/regression data (e.g. 34 or 1351). If your data is in a numerical/regression format, then bin the data first with one of the Binner nodes.
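Outside KNIME, the binning step described above could be sketched roughly like this in Python. The cut-off of 100, the label names, and the choice that higher values mean "Active" are all made-up assumptions for illustration, not something from this thread; the helper `to_class` is hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-off and labels, chosen only for illustration:
# values below 100 become "Inactive", values of 100 or more become "Active".
THRESHOLDS = [100.0]
LABELS = ["Inactive", "Active"]

def to_class(value, thresholds=THRESHOLDS, labels=LABELS):
    """Map a numeric value to the label of the bin it falls into."""
    return labels[bisect_right(thresholds, value)]

# Made-up numeric measurements turned into class labels.
measured_numeric = [34.0, 1351.0, 99.9, 100.0]
measured_class = [to_class(v) for v in measured_numeric]
print(measured_class)
```

For three classes you would add a second cut-off to `THRESHOLDS` and a third name to `LABELS`. The same binning has to be applied to both the measured and the predicted column so that the labels match.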

That's all you need. So if you have two classes, active and inactive, you will get a confusion matrix that shows predicted active and predicted inactive vs measured active and measured inactive.

Values which fall into p.inactive vs m.inactive and p.active vs m.active are correctly predicted datapoints. Values falling into p.inactive vs m.active are false negatives, whilst values falling into p.active vs m.inactive are false positives. Dividing the number of correctly predicted datapoints by the total number of datapoints gives you a percentage accuracy. For a 2-class prediction like I just described, purely random guessing between the two classes will give about 50% accuracy on average, so of course you will be hoping for significantly higher than that.
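To make the counting concrete, here is a minimal Python sketch that builds the matrix from a measured and a predicted column and derives accuracy, false negatives and false positives. The data is made up, and the function names `confusion_matrix` and `accuracy` are hypothetical helpers, not part of KNIME.

```python
from collections import Counter

def confusion_matrix(measured, predicted):
    """Count (measured, predicted) label pairs for two equal-length columns."""
    if len(measured) != len(predicted):
        raise ValueError("columns must have the same number of rows")
    return Counter(zip(measured, predicted))

def accuracy(matrix):
    """Fraction of rows where the predicted label equals the measured label."""
    total = sum(matrix.values())
    correct = sum(n for (m, p), n in matrix.items() if m == p)
    return correct / total if total else 0.0

# Made-up example data, one entry per row.
measured  = ["Active", "Active", "Inactive", "Inactive", "Active", "Inactive"]
predicted = ["Active", "Inactive", "Inactive", "Active", "Active", "Inactive"]

cm = confusion_matrix(measured, predicted)
false_negatives = cm[("Active", "Inactive")]   # measured active, predicted inactive
false_positives = cm[("Inactive", "Active")]   # measured inactive, predicted active

print(f"accuracy: {accuracy(cm):.1%}")
print(f"false negatives: {false_negatives}, false positives: {false_positives}")
```

Because the matrix is keyed by label pairs, the same code works unchanged for three or more classes.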

Simon.

Docminus · November 14, 2014, 2:53pm

Thanks. For a 2-class model that seems pretty straightforward, even for me, and doable in Excel.

But for a 3-class model it doesn't seem that simple, so I wondered if a KNIME node or workflow could do the work.

richards99 · November 14, 2014, 2:55pm

It should still work with a 3-class model too, giving a 3x3 cell matrix, e.g. with classes of Very Active, Active and Inactive. Again, the data can be split into 3 bins with the Binner nodes.

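This time, random guessing will give about 33% (a one-in-three chance of picking the right class; the correct predictions are the 3 diagonal cells of the 9), so you would be aiming to be much better than 33%.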
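As a rough sketch of what "better than random" means here: uniform guessing over k classes is right 1/k of the time on average, but if one class dominates the data, always predicting that class can score higher, so it is worth checking both. The class counts below are made up, and both helper functions are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def uniform_guess_accuracy(n_classes):
    """Expected accuracy when each class is guessed with equal probability."""
    return Fraction(1, n_classes)

def majority_class_accuracy(measured):
    """Accuracy of always predicting the most frequent measured class."""
    counts = Counter(measured)
    return Fraction(max(counts.values()), len(measured))

# Made-up, deliberately unbalanced measured column.
measured = ["Very Active"] * 1 + ["Active"] * 2 + ["Inactive"] * 7

print(uniform_guess_accuracy(3))          # expected accuracy of uniform guessing
print(majority_class_accuracy(measured))  # accuracy of always saying "Inactive"
```

With balanced classes the two baselines coincide, which is the 33% figure above.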

Simon.

Aaron_Hart · November 17, 2014, 11:13am

Just to chime in, there is automatic confusion matrix calculation available in the Scorer node.

https://www.knime.org/files/nodedetails/_mining_modeleval_Scorer.html

Docminus · November 17, 2014, 2:05pm

Ah, thanks Aaron, that's exactly what I was looking for.

I was closing in via a binning method, but this is much simpler (in terms of the result, not in terms of learning something...).

richards99 · November 17, 2014, 8:39pm

Whoops, I assumed you were already talking about the Scorer node and wanted to know how to use and interpret it.

Thanks for stepping in Aaron!

Simon.

Aaron_Hart · November 18, 2014, 2:46pm

It was a good answer Simon, thanks to you for also chipping in.

Cheers,

Aaron

chhotu_kumar · December 12, 2014, 9:31am

Sir, I am using the Auto-Binner node to get rid of the column spec compatibility problem, but now there is another problem: the confusion matrix shows an accuracy of zero percent. What should I do?

chhotu_kumar · December 12, 2014, 9:51am

Here is a screenshot of the problem:

problem.png

chhotu_kumar · December 12, 2014, 9:55am

Here is a screenshot of the problem, with the confusion matrix view open:

problem2.png

chhotu_kumar · December 12, 2014, 10:39am

Here is a new problem with the MLP. A screenshot is attached, and the error is: number of input neurons must be greater than zero.

problem_3.png