Does anyone know what to use to make a confusion matrix from existing data? I think I saw something in connection with some of the machine learning nodes, but there the matrix comes from the models. I simply want to compare measured data with data predicted elsewhere (a class model).

http://en.wikipedia.org/wiki/Confusion_matrix

I'm not an expert in statistics and math; I usually know what I need but not always how to get there (doing it manually in Excel works but sucks). Sorry if I'm asking something obvious.

Hi,

You simply need a measured column and a predicted column in which the data is categorical/class data (e.g. Active or Inactive), and not numerical/regression data (e.g. 34 or 1351). If your data is in a numerical/regression format, then bin the data first with one of the binning nodes.
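Outside KNIME, the binning step can be sketched in plain Python as follows. This is only an illustration: the cut point of 100, the rule that low values count as Active, and the helper name `bin_values` are all assumptions made up for the example, not anything taken from the binning nodes.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Map each number to a class label using ascending cut points.

    A value equal to a cut point falls into the bin above it.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(edges, v)] for v in values]

# Hypothetical threshold: below 100 is Active, 100 and above is Inactive.
edges = [100]
labels = ["Active", "Inactive"]

measured_values = [34, 1351, 80, 100]
predicted_values = [50, 900, 250, 120]

# Use the same cut points and labels for both columns,
# otherwise the class names will not line up in the matrix.
measured_class = bin_values(measured_values, edges, labels)
predicted_class = bin_values(predicted_values, edges, labels)

print(measured_class)
print(predicted_class)
```

The important part is that both columns end up with exactly the same set of class labels.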

That's all you need. So if you have two classes, active and inactive, you will get a confusion matrix that shows predicted active and predicted inactive vs. measured active and measured inactive.

Values which fall into p.inactive vs m.inactive and p.active vs m.active are correctly predicted data points; values falling into p.inactive vs m.active are false negatives, whilst values falling into p.active vs m.inactive are false positives. Dividing the number of correctly predicted data points by the total number of data points gives you a percentage accuracy. For a 2-class prediction like I just described, totally random guessing will give about 50% accuracy, so of course you will be hoping for significantly higher than that.
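To make the counting concrete, here is a minimal sketch in plain Python. The six data points are invented for the example, and the function names `confusion_counts` and `accuracy` are just illustrative.

```python
from collections import Counter

def confusion_counts(measured, predicted):
    """Count how often each (measured, predicted) class pair occurs."""
    if len(measured) != len(predicted):
        raise ValueError("both columns must have the same number of rows")
    return Counter(zip(measured, predicted))

def accuracy(counts):
    """Fraction of rows where the predicted class equals the measured class."""
    total = sum(counts.values())
    correct = sum(n for (m, p), n in counts.items() if m == p)
    return correct / total if total else 0.0

# Made-up example data.
measured = ["Active", "Active", "Inactive", "Inactive", "Active", "Inactive"]
predicted = ["Active", "Inactive", "Inactive", "Active", "Active", "Inactive"]

counts = confusion_counts(measured, predicted)

# Treating Active as the positive class:
false_negatives = counts[("Active", "Inactive")]  # m.active, p.inactive
false_positives = counts[("Inactive", "Active")]  # m.inactive, p.active

print("false negatives:", false_negatives)
print("false positives:", false_positives)
print("accuracy: {:.0%}".format(accuracy(counts)))
```

Here 4 of the 6 rows sit on the diagonal (measured and predicted agree), so the accuracy comes out at roughly 67%.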

Simon.

Thanks. For a 2-class model that seems pretty straightforward, even for me, and doable in Excel.

But for a 3-class model it doesn't seem that simple, so I wondered if a KNIME node or workflow could do the work.

It should still work with a 3-class model too, giving a 3x3 cell matrix, e.g. with classes of Very Active, Active and Inactive. Again, the data can be set up in 3 bins with the Binner nodes.

This time, random guessing will give about 33% (3 out of the 9 cells are correct predictions), so you would be aiming to be much better than 33%.
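The same counting extends to three classes. The sketch below builds the 3x3 matrix and compares the accuracy with the 1-in-3 random baseline; the data and the helper name `matrix_rows` are invented for the example.

```python
from collections import Counter

def matrix_rows(measured, predicted, classes):
    """Return the confusion matrix as a list of rows.

    Rows are measured classes, columns are predicted classes,
    both in the order given by `classes`.
    """
    counts = Counter(zip(measured, predicted))
    return [[counts[(m, p)] for p in classes] for m in classes]

classes = ["Very Active", "Active", "Inactive"]

# Made-up example data.
measured = ["Very Active", "Active", "Inactive", "Active", "Inactive", "Very Active"]
predicted = ["Very Active", "Inactive", "Inactive", "Active", "Active", "Active"]

rows = matrix_rows(measured, predicted, classes)

total = sum(sum(row) for row in rows)
correct = sum(rows[i][i] for i in range(len(classes)))  # diagonal cells
acc = correct / total
random_baseline = 1 / len(classes)  # uniform random guessing

for name, row in zip(classes, rows):
    print("{:<12}".format(name), row)
print("accuracy: {:.0%}, random baseline: {:.0%}".format(acc, random_baseline))
```

Only the 3 diagonal cells count as correct. One caveat: if one class is much more common than the others, always guessing that class can beat 33%, so it is worth checking against that as well.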

Simon.

Just to chime in, there is automatic confusion matrix calculation available in the Scorer node.

https://www.knime.org/files/nodedetails/_mining_modeleval_Scorer.html

Ah, thanks Aaron, that's exactly what I was looking for.

I was closing in via a binning method, but this is much simpler (in terms of result, not in terms of learning something....)

Whoops, I assumed you were already talking about the Scorer node and wanted to know how to use and interpret it.

Thanks for stepping in Aaron!

Simon.

It was a good answer Simon, thanks to you for also chipping in.

Cheers,

Aaron

Sir, I am using the Auto-Binner node to get around the column spec compatibility problem, but now there is another problem: the confusion matrix shows an accuracy of zero percent. What should I do?

Here is a screenshot of the problem.

Here is a screenshot of the problem when viewing the confusion matrix.

Here is a new problem with the MLP. A screenshot of the problem is in the attachment, and the error is: "number of input neurons must be greater than zero".