I don't know how to implement the solution below in KNIME.
Let’s assume it looks like this:
Trip Nr | Timestamp | ID
1 | 13584512 | 1
2 | 13589986 | 2
3 | 13576544 | 2
4 | 13844555 | 3
Now you will run your classifier which might yield the following output:
Trip Nr | ID (estimated)
Comparing the two tables, you can compute the accuracy of the classification. 1 out of 4 trips has been falsely classified, resulting in a classification accuracy of 3/4, or 75%.
Of course, you have to use different data sets for training and verification. E.g. you could use trips 5-8 for training in this example and then trips 1-4 for validation…
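The comparison can be sketched in a few lines of Python. The trip numbers and true IDs come from the example table above; the predicted IDs are invented to illustrate one misclassified trip:

```python
# True IDs per trip (from the example table) and a hypothetical
# classifier output; trip 2 is assumed to be misclassified.
true_ids      = {1: 1, 2: 2, 3: 2, 4: 3}
predicted_ids = {1: 1, 2: 3, 3: 2, 4: 3}

# Count trips where prediction matches the true ID.
correct = sum(true_ids[t] == predicted_ids[t] for t in true_ids)
accuracy = correct / len(true_ids)
print(f"{correct} of {len(true_ids)} correct -> accuracy {accuracy:.0%}")
# -> 3 of 4 correct -> accuracy 75%
```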
Are you building a multi class classifier? The scorer node can probably do what you are after, I got a bit confused by your example though.
Compares two columns by their attribute value pairs and shows the confusion matrix, i.e. how many rows of which attribute and their classification match. Additionally, it is possible to hilite cells of this matrix to determine the underlying rows. The dialog allows you to select two columns for comparison; the values from the first selected column are represented in the confusion matrix's rows and the values from the second column by the confusion matrix's columns. The output of the node is the confusion matrix with the number of matches in each cell. Additionally, the second out-port reports a number of accuracy statistics such as True-Positives, False-Positives, True-Negatives, False-Negatives, Recall, Precision, Sensitivity, Specificity, F-measure, as well as the overall accuracy.
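Outside of KNIME, the same kind of confusion matrix and overall accuracy can be sketched in plain Python. The labels below are placeholders, not output from the Scorer node:

```python
from collections import Counter

# Hypothetical actual/predicted label pairs for a multi-class problem.
actual    = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]

# Confusion matrix: rows = actual class, columns = predicted class.
matrix = Counter(zip(actual, predicted))
labels = sorted(set(actual) | set(predicted))
for row in labels:
    print(row, [matrix[(row, col)] for col in labels])

# Overall accuracy = matches on the diagonal / total rows.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(f"accuracy = {accuracy:.3f}")
# -> accuracy = 0.667
```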
Thank you, Sam.
No, I am just estimating by myself. But if I use your decision tree as the classifier for the example above, then I need to check what percentage of the predictions is correct: 2 out of 4 correct means 50% accuracy for the classifier, and 3 out of 4 correct means 75% accuracy.
Did you get my point? If not, please let me know what was unclear.
Can you upload a CSV/spreadsheet with some example data? I'm pretty sure the Scorer node will do what you want. Did you see the file attachments I provided?
In my example I have an experimental result which can be a, b, c or d, and a classification of a, b, c or d. I want to know how many of these are correctly predicted and get a percentage representing how many are correct. I'm doing multi-class classification; I've not used a model to do this, I've just represented my data as if I had.
In this example I don't want a breakdown of the correctness of predicting a, b, c or d separately. I'm just interested in the overall accuracy.
Manually, I would count how many times the prediction and experimental columns contain the same value. In my example this count is 9, and the other 4 rows do not match. I have 9 correct predictions and 4 incorrect predictions. My accuracy is therefore (true count / total count) * 100 = 69.2%.
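That manual count can be sketched like this. The 13 labels below are invented to match the stated counts (9 matches, 4 mismatches), not your actual spreadsheet:

```python
# Invented experimental vs. predicted labels: 9 matches, 4 mismatches.
experimental = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d"]
prediction   = ["a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d", "d", "a"]

# Accuracy as a percentage: matching rows over total rows.
true_count = sum(e == p for e, p in zip(experimental, prediction))
accuracy = true_count / len(experimental) * 100
print(f"{true_count} correct of {len(experimental)} -> {accuracy:.1f}%")
# -> 9 correct of 13 -> 69.2%
```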
If you look at the accuracy column in scorer_results.png, it shows 0.692 (the accuracy as a decimal).
Is this not doing what you want?