In the Decision Tree Learner node, there is an option for Reduced Error Pruning (REP). As I understand it, REP is a post-pruning technique that systematically generates candidate subtrees and evaluates the change in misclassification error for each. The literature on REP indicates that a separate data set (sometimes called a pruning set) is used to evaluate the misclassification error of every subtree. I have not been able to find out how the KNIME Decision Tree Learner creates the pruning set or sets. Any help would be appreciated.
Really good question. I wondered about this at some point as well.
The Decision Tree Learner node uses the training dataset as the pruning dataset for the reduced error pruning option.
Thanks for the response. When I tried Reduced Error Pruning (REP) in KNIME’s Decision Tree Learner, it changed neither the final number of nodes nor the accuracy. It seemed to have no effect: the final number of nodes was 145 both with and without REP.
I re-ran the same data using Weka’s REPTree, which performs reduced error pruning. This worked fine, reducing the number of nodes from 145 to 15 while achieving slightly better accuracy.
I am working on a textbook using KNIME, and I think my recommendation will be not to use reduced error pruning with the KNIME Decision Tree Learner. Is there something I am missing?
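One possible explanation for the unchanged node count, offered as a rough sketch and not as a description of KNIME's or Weka's actual code: if a tree is grown until its leaves are pure and is then pruned against the same rows it was trained on, every subtree already has zero error on those rows, so replacing it with a leaf can only look worse. The toy example below uses a hand-written one-feature tree, synthetic data with 20% label noise, and hypothetical helper names (`grow`, `rep`, `size`) to compare pruning against the training rows with pruning against a separate holdout set.

```python
import random
from collections import Counter


class Node:
    """Binary tree node over one numeric feature; threshold None marks a leaf."""
    def __init__(self, rows):
        # Majority class of the training rows that reach this node.
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]
        self.threshold = None
        self.left = self.right = None


def gini(rows):
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in Counter(y for _, y in rows).values())


def grow(rows):
    """Grow an unpruned tree until every leaf is pure."""
    node = Node(rows)
    if len({y for _, y in rows}) == 1:
        return node
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    if best is None:  # identical x values, cannot split further
        return node
    _, node.threshold, left, right = best
    node.left, node.right = grow(left), grow(right)
    return node


def predict(node, x):
    while node.threshold is not None:
        node = node.left if x <= node.threshold else node.right
    return node.label


def rep(node, rows):
    """Reduced error pruning, bottom-up: collapse a subtree into a leaf
    whenever the leaf makes no more errors on the pruning rows."""
    if node.threshold is None:
        return
    rep(node.left, [r for r in rows if r[0] <= node.threshold])
    rep(node.right, [r for r in rows if r[0] > node.threshold])
    subtree_err = sum(predict(node, x) != y for x, y in rows)
    leaf_err = sum(node.label != y for _, y in rows)
    if leaf_err <= subtree_err:
        node.threshold = node.left = node.right = None


def size(node):
    if node.threshold is None:
        return 1
    return 1 + size(node.left) + size(node.right)


def make_data(n, rng):
    """Synthetic rows (x, y): y = 1 if x > 0.5, with 20% of labels flipped."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        rows.append((x, y))
    return rows


rng = random.Random(0)
train, holdout = make_data(200, rng), make_data(100, rng)

full_size = size(grow(train))

tree_a = grow(train)
rep(tree_a, train)            # pruning set = training set
size_after_train_rep = size(tree_a)

tree_b = grow(train)
rep(tree_b, holdout)          # pruning set = separate holdout rows
size_after_holdout_rep = size(tree_b)

print("unpruned nodes:          ", full_size)
print("REP with training rows:  ", size_after_train_rep)
print("REP with holdout rows:   ", size_after_holdout_rep)
```

In this sketch the tree pruned against its own training rows keeps every node, while the tree pruned against the holdout rows shrinks. Whether this fully accounts for the 145-node result depends on the other learner settings, such as the minimum number of records per node, which the sketch does not model.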
Thanks for your feedback! I will forward it to our developers.
Have you tried to use the minimum description length (MDL) pruning option instead of REP?
Yes, I have used the MDL pruning and it works very well.