Hi everybody!
I built a decision tree with the "Decision Tree Learner" node. The tree should split categorical data so that, in the end, every record is assigned 100% correctly to a leaf of the tree...
However, what I am getting is a tree whose leaves contain more than one type of record:
97% records that are correct for that leaf and 3% errors. (The errors are there on purpose.) Now I'd like to extract a table or something similar that lists exactly these 3% (and, of course, the "errors" from the other leaves).
I am relatively new to KNIME and to decision trees, so please forgive any incorrect naming conventions, etc. :)
After some messing around, I figured out a way to solve this issue:
The structure is like this:
You can add a Java Edit Variable node to the second Rule-based Row Filter to automate it a little.
I am still working on a solution for complete automation, but this works so far.
Cheers, Fabian ;)
If I understand your question correctly, you want to know which rows of your training data were classified incorrectly.
See the attached workflow for an idea how you can get those rows.
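The attached workflow is not reproduced here, but the underlying idea is to score the training data with the tree and keep only the rows whose predicted class differs from the actual class. A minimal Python sketch of that filter, assuming each scored row is a dict holding both the true class and the prediction; the column names "Class" and "Prediction (Class)" are hypothetical placeholders for whatever your table uses:

```python
# Minimal sketch: keep only the rows whose predicted class differs from the actual class.
# The column names "Class" and "Prediction (Class)" are hypothetical placeholders.


def misclassified(rows, actual_col="Class", predicted_col="Prediction (Class)"):
    """Return the rows where the predicted class does not match the actual class."""
    return [row for row in rows if row[actual_col] != row[predicted_col]]


if __name__ == "__main__":
    # Hypothetical scored training rows (features, true class, predicted class).
    scored = [
        {"Color": "red", "Class": "A", "Prediction (Class)": "A"},
        {"Color": "red", "Class": "B", "Prediction (Class)": "A"},
        {"Color": "blue", "Class": "B", "Prediction (Class)": "B"},
    ]
    for row in misclassified(scored):
        print(row)
```

The same comparison of the class column against the prediction column is what a row filter inside the workflow would have to express.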
If you want to have pure leaves in your decision tree, you have to disable pruning in the decision tree learner dialog.
However, doing so will cause your decision tree to overfit on the table you trained it on, which generally reduces its accuracy on new data. Note that even with pruning disabled, you can still get impure leaves if your data contains contradictory rows, i.e. rows that are identical except for their class.
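To check whether impure leaves are caused by contradictory rows, you can look for groups of rows that share identical feature values but carry different classes. A minimal Python sketch, assuming rows are dicts and that the feature column names and the class column name (all hypothetical here) are known:

```python
from collections import defaultdict


def contradictory_groups(rows, feature_cols, class_col="Class"):
    """Group rows by their feature values; return only groups whose rows disagree on the class."""
    groups = defaultdict(list)
    for row in rows:
        # Rows with the same feature tuple always end up in the same leaf.
        key = tuple(row[col] for col in feature_cols)
        groups[key].append(row)
    return {
        key: members
        for key, members in groups.items()
        if len({member[class_col] for member in members}) > 1
    }


if __name__ == "__main__":
    # Hypothetical data: the two ("red", "S") rows differ only in their class.
    data = [
        {"Color": "red", "Size": "S", "Class": "A"},
        {"Color": "red", "Size": "S", "Class": "B"},
        {"Color": "blue", "Size": "L", "Class": "A"},
    ]
    for key, members in contradictory_groups(data, ["Color", "Size"]).items():
        print(key, [m["Class"] for m in members])
```

No split can separate the rows in such a group, so each group forces at least one misclassified training row, whatever the pruning setting.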
Thanks very much.
This was most helpful :)
Also thanks for the explanation...