Decision Tree Error-Extraction

faebl · January 10, 2017, 8:50am

Hi to everybody!

I built a decision tree with the "Decision Tree Learner" node. The tree should split categorical data so that, in the end, there is a 100% allocation of every record to a leaf of the tree...
However, what I am getting is a tree whose leaves contain more than one type of record:
97% of the records are correct for the leaf and 3% are errors. (The errors are there on purpose.) Now I'd like to extract a table or something similar that lists exactly these 3% (and, of course, the "errors" from the other leaves).

I am relatively new to KNIME and also to decision trees, so please forgive any wrong terminology, etc. :)

Cheers, Fabian

faebl · January 11, 2017, 10:34am

Hi again!

After some messing around, I figured out a way to solve this issue:

The structure is like this:

Decision Tree Learner > JavaScript Decision Tree View (with root data) > Select the leaves in the JavaScript Tree View and hit Apply > Rule-based Row Filter (Selection Column = TRUE) > Rule-based Row Filter ("3% Error Property") > Interactive Table

You can add a Java Edit Variable node to the second Rule-based Row Filter to automate it a little.
I am still working on a solution for complete automation, but this works so far.

Cheers, Fabian ;)

nemad · January 11, 2017, 6:22pm

Hello Fabian,

if I get your question right, you want to know which rows (of your training data) were classified incorrectly.
See the attached workflow for an idea of how you can get those rows.
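
Outside KNIME, the general idea can be sketched in a few lines of Python. This is only an illustration and not necessarily what the attached workflow does. It assumes each training row already carries the leaf it lands in; the "leaf" and "class" keys, the function name and the sample rows are all made up for the example. A leaf predicts its most frequent class, so the "errors" are the rows whose class differs from that majority.

```python
from collections import Counter, defaultdict

def misclassified_rows(rows, leaf_key="leaf", class_key="class"):
    """Return the rows whose class differs from the majority class of their leaf."""
    # Count how often each class occurs in each leaf.
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[leaf_key]][row[class_key]] += 1

    # Assumption: a leaf predicts its most frequent class.
    prediction = {leaf: c.most_common(1)[0][0] for leaf, c in counts.items()}

    # Keep only the rows that disagree with their leaf's prediction.
    return [row for row in rows if row[class_key] != prediction[row[leaf_key]]]

# Made-up training rows; "leaf" is a hypothetical column naming the leaf a row ends up in.
rows = [
    {"id": 1, "leaf": "A", "class": "yes"},
    {"id": 2, "leaf": "A", "class": "yes"},
    {"id": 3, "leaf": "A", "class": "no"},
    {"id": 4, "leaf": "B", "class": "no"},
    {"id": 5, "leaf": "B", "class": "no"},
]

errors = misclassified_rows(rows)
print([row["id"] for row in errors])
```

With these sample rows, only row 3 disagrees with the majority of its leaf, so it is the only one returned.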

If you want to have pure leaves in your decision tree, you have to disable pruning in the Decision Tree Learner dialog.
However, doing so will cause your decision tree to overfit on the table you trained it on, which in general reduces its accuracy on new data. Note that even if you disable pruning, it is still possible to get impure leaves if your data contains contradictory rows, i.e. rows that are identical except for their class.
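
To check whether a table contains such contradictory rows, one option is to group the rows by their feature values and look for groups with more than one class. The following Python sketch shows the idea; the column names ("color", "size", "class"), the function name and the data are hypothetical and only serve as an example.

```python
from collections import defaultdict

def contradictory_groups(rows, feature_keys, class_key="class"):
    """Group rows by their feature values and keep the groups with more than one class."""
    groups = defaultdict(list)
    for row in rows:
        # Rows with the same feature values share the same signature.
        signature = tuple(row[key] for key in feature_keys)
        groups[signature].append(row)

    # A group is contradictory if its rows do not all share one class.
    return {
        signature: group
        for signature, group in groups.items()
        if len({row[class_key] for row in group}) > 1
    }

# Made-up data: the first two rows are identical except for their class.
rows = [
    {"color": "red", "size": "small", "class": "yes"},
    {"color": "red", "size": "small", "class": "no"},
    {"color": "blue", "size": "large", "class": "no"},
]

conflicts = contradictory_groups(rows, ["color", "size"])
print(list(conflicts))
```

No split on "color" or "size" can separate the two red/small rows, so the leaf they end up in stays impure even without pruning.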

Cheers, nemad

extract_incorrectly_classified_rows.knwf

faebl · January 19, 2017, 11:09am

Thanks very much.
This was most helpful :)
Also thanks for the explanation...