Reduced error pruning

acito · January 21, 2022, 8:26pm

Hello,
In the Decision Tree Learner node, there is an option for Reduced Error Pruning (REP). As I understand it, REP is a post-pruning technique that evaluates the change in misclassification error when subtrees are systematically replaced by leaves. The literature on REP indicates that a separate data set (sometimes called a pruning set) is used to evaluate the misclassification error of every subtree. I have not been able to find out how the KNIME Decision Tree Learner creates the pruning set (or sets). Any help would be appreciated.
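For readers unfamiliar with the textbook procedure, here is a minimal sketch of reduced error pruning on a binary tree, assuming a separate pruning set is available. It is not KNIME's implementation; the `Node` class, `predict`, and `reduced_error_prune` are hypothetical names chosen for illustration. Each node stores the majority class of the training examples that reached it, and a subtree is collapsed into a leaf whenever that leaf makes no more errors on the pruning set than the subtree does.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    # Hypothetical tree node. Internal node: tests row[feature] <= threshold.
    # Leaf: feature is None.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Majority class of the *training* examples that reached this node;
    # used as the prediction if the node is (or becomes) a leaf.
    label: Any = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, row: List[float]) -> Any:
    # Walk down the tree until a leaf is reached.
    while not node.is_leaf():
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def reduced_error_prune(node: Node, rows: List[List[float]], labels: List[Any]) -> Node:
    """Prune bottom-up, using (rows, labels) as the separate pruning set."""
    if node.is_leaf():
        return node
    go_left = [row[node.feature] <= node.threshold for row in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    # Prune the children first, each with the pruning examples that reach it.
    node.left = reduced_error_prune(node.left, left_rows, left_labels)
    node.right = reduced_error_prune(node.right, right_rows, right_labels)
    # Pruning-set errors of the (already pruned) subtree vs. a single leaf.
    subtree_errors = sum(predict(node, r) != y for r, y in zip(rows, labels))
    leaf_errors = sum(y != node.label for y in labels)
    # Collapse when the leaf is no worse; ties favour the smaller tree, so a
    # node reached by no pruning examples (0 <= 0) is also collapsed.
    if leaf_errors <= subtree_errors:
        return Node(label=node.label)
    return node
```

For example, a tree that splits on `x <= 0.5` and then again on `x <= 0.7` is kept intact when the pruning set `[[0.2], [0.6], [0.9]]` has labels `["a", "b", "a"]`, because both splits reduce pruning-set error, but it collapses to a single leaf `"a"` when all three pruning labels are `"a"`.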
Thanks.
Frank

Kathrin · January 24, 2022, 12:44pm

Hi @acito,

Really good question. I wondered about this at some point as well.

The Decision Tree Learner node uses the training dataset as pruning dataset for the reduced error pruning option.
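One consequence worth noting, offered as a general property of majority-vote leaves rather than a description of KNIME's code: when the pruning set is the training set itself, collapsing a subtree into a leaf can never reduce the error, because each child leaf's majority class already minimizes that child's own training errors. At best the leaf ties with the subtree, so REP evaluated on training data can only remove ties. The hypothetical helpers below check this on toy label lists.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def majority_errors(labels: Sequence[Hashable]) -> int:
    """Errors made by predicting the group's majority class for every member."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def leaf_vs_subtree(child_labels: List[List[Hashable]]) -> Tuple[int, int]:
    """Return (errors if collapsed to one leaf, errors of the split) on the same data.

    child_labels holds the training labels that reach each child leaf; each
    child predicts its own majority class, as the leaves of a grown tree do.
    """
    merged = [y for child in child_labels for y in child]
    return majority_errors(merged), sum(majority_errors(c) for c in child_labels)
```

For instance, `leaf_vs_subtree([["a", "a", "b"], ["b", "b"]])` gives `(2, 1)`, so the leaf is strictly worse and the split survives, while `leaf_vs_subtree([["a", "a"], ["a", "b"]])` gives `(1, 1)`, a tie.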

Cheers
Kathrin

acito · January 24, 2022, 2:52pm

Hi,

Thanks for the response. When I tried Reduced Error Pruning (REP) in KNIME's Decision Tree Learner, it did not change the number of final nodes or the accuracy. It seemed to have no effect: the final number of nodes was 145 with and without REP.

I re-ran the same data using Weka's REPTree, which performs reduced error pruning. It worked as expected, reducing the number of nodes from 145 to 15 while achieving slightly better accuracy.
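A likely difference is where the pruning data comes from. As far as we understand Weka's documentation, REPTree holds out one of `numFolds` folds (default 3) as a pruning set and grows the tree on the remaining folds. The sketch below shows such a grow/prune split in its simplest form; `split_grow_prune` is a hypothetical helper, and this version is unstratified, whereas Weka's own split may stratify by class.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_grow_prune(rows: Sequence[T], num_folds: int = 3, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hold out roughly one fold of the rows as a pruning set; grow on the rest."""
    if num_folds < 2:
        raise ValueError("num_folds must be at least 2")
    shuffled = list(rows)
    # Shuffle with a seeded generator so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    prune_size = len(shuffled) // num_folds
    grow, prune = shuffled[prune_size:], shuffled[:prune_size]
    return grow, prune
```

With 9 rows and the default of 3 folds, 6 rows are used to grow the tree and 3 are held back, so every pruning decision is judged on data the tree never saw while it was being grown.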

I am working on a textbook using KNIME, and I think I will recommend not using reduced error pruning with the KNIME Decision Tree Learner. Is there something I am missing?

Frank

Kathrin · January 26, 2022, 10:11am

Hi Frank,

Thanks for your feedback! I will forward it to our developers.

Have you tried to use the minimum description length (MDL) pruning option instead of REP?

Cheers
Kathrin

acito · January 30, 2022, 7:26pm

Hi Kathrin,
Yes, I have used the MDL pruning and it works very well.
Frank

system · April 30, 2022, 7:26pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.