I have a decision tree, but it is overfitting. I have tried using the Parameter Optimization Loop Start and Loop End nodes, but this is not helping at all. Is there any way to deal with this?
Take a look at this workflow from the KNIME Hub. It shows how to do parameter optimization. Maybe you can use it as an example for your workflow.
I have shared the workflow screenshot above. @HansS I would like to optimize the parameters of the decision tree. I have tried optimizing the number of records. Can you help me optimize the other parameters, if there are any?
Hi,
What accuracy do you achieve with your current model? What is it on the test set and on the training set? Can you produce a line plot of accuracy vs. different values of the minimum leaf size? Without knowing the data there is little we can say here, but of course we understand that confidentiality is of utmost importance. Have you enabled error pruning? The Decision Tree Learner does not offer any other options that help with overfitting, but the minimum leaf size should be sufficient.
Kind regards
Alexander
I am getting 83% accuracy on the training set and 82% on the test set, so accuracy seems fine.
The problem is with specificity and sensitivity: both are very low. Also, Cohen's kappa is 0.121 on training and 0.048 on testing, so the model is clearly overfitting.
You mentioned producing a line plot of accuracy vs. different values of the leaf size; how can I do this? Can I do it for Cohen's kappa as well? Please share the workflow.
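As a side note on why accuracy can look fine while kappa and sensitivity are poor: with imbalanced classes, a model that leans on the majority class gets high accuracy "for free". A minimal pure-Python illustration (the 85/15 split and the always-majority predictor are made up for the example, not taken from this thread's data):

```python
# Illustrative numbers (not the data discussed here): with an 85/15 class
# split, a model that always predicts the majority class scores 85%
# accuracy while Cohen's kappa is exactly 0 -- accuracy alone is misleading.

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for class labels, computed from first principles."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n   # observed agreement
    pe = sum(                                              # chance agreement
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in set(y_true)
    )
    return (po - pe) / (1 - pe)

y_true = [0] * 85 + [1] * 15
y_pred = [0] * 100                      # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.85
print(cohen_kappa(y_true, y_pred))       # 0.0
```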
Hi,
I don't have a workflow for the line plot, but it should be easy to do. You already have the parameter optimization loop, so you can simply collect the hyperparameter value and the kappa at the Loop End node, then attach a Line Plot node. Your data seems heavily imbalanced, is that true? Maybe some over- or undersampling would help? Your Cohen's kappa is low even on the training set. Have you tried other classifiers?
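The loop described above can be sketched in scikit-learn as a stand-in for the KNIME parameter optimization loop (the synthetic imbalanced dataset and the parameter range are assumptions for the example): sweep the minimum leaf size, record Cohen's kappa per value, and keep the best.

```python
# Sketch of the parameter sweep: vary the minimum leaf size, collect
# (value, kappa) pairs -- the table you would plot with a Line Plot node.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import cohen_kappa_score

# Synthetic imbalanced data (assumption, ~90/10 split)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = []                                   # (min leaf size, test kappa)
for min_leaf in range(2, 101, 2):              # the swept hyperparameter
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
    tree.fit(X_tr, y_tr)
    results.append((min_leaf, cohen_kappa_score(y_te, tree.predict(X_te))))

best_leaf, best_kappa = max(results, key=lambda r: r[1])
print(best_leaf, round(best_kappa, 3))
```

In KNIME, the collected table goes into a Line Plot node (x = min leaf size, y = kappa); here you could pass `results` to a plotting library instead.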
Kind regards
Alexander
Sharing a screenshot of the Parameter Optimization Loop End, but when I select that option, I am unable to execute the node. Yes, you are right, I have highly imbalanced data. Here, the number of records is the parameter that I have set.
I would like to try a random forest, but I have no idea how to run or evaluate it.
Hi,
that is the wrong configuration. The maximize option just determines whether you maximize or minimize the objective; giving it the value of the Cohen's kappa flow variable won't work. You actually don't have to use the "Flow Variables" tab at all: in the "Options" tab, select "Cohen's kappa" from the combo box at the top and then check "Maximize" at the bottom.
Kind regards
Alexander
Okay, I got that. But if I select that, then I won't be able to tune the parameter "Number of records". How can I do this? When I select the maximum Cohen's kappa and run the decision tree again, it is overfitting: many terminal nodes contain only 1, 2, or 3 records. Is there any way I can look at both Cohen's kappa and the minimum number of records?
Hi,
have you configured your Decision Tree Learner to actually use the value of the flow variable as the setting for the minimum number of records per node? You have to do that in the "Flow Variables" tab of the Learner.
Kind regards
Alexander
@AlexanderFillbrunn Yes, I have. When I select maximum Cohen's kappa, the minimum number of records becomes redundant. I have set the number of records using the Parameter Optimization Loop Start.
Why does it become redundant? In the loop you adjust "Number of records" in every iteration, measure Cohen's kappa, and at the end you choose the setting that maximized it. But from the numbers you gave earlier, it seems like no amount of hyperparameter optimization will help here. What does your data look like? How many data points, features, and classes do you have? How skewed is the class distribution exactly? What is the problem with using a random forest? You just have to insert the Random Forest Learner node in place of the Decision Tree Learner.
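The swap is just as direct in code. A minimal scikit-learn sketch (assumption: KNIME's Random Forest Learner and Predictor are used analogously, and the synthetic imbalanced dataset is made up for the example) trains both models on the same data and compares Cohen's kappa:

```python
# Compare a single decision tree against a random forest on the same
# imbalanced data, evaluating both with Cohen's kappa on a held-out set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score

# Synthetic imbalanced data (assumption, ~90/10 split)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

kappas = {}
for model in (DecisionTreeClassifier(random_state=1),
              RandomForestClassifier(n_estimators=200, random_state=1)):
    model.fit(X_tr, y_tr)
    kappas[type(model).__name__] = cohen_kappa_score(y_te, model.predict(X_te))

for name, kappa in kappas.items():
    print(name, round(kappa, 3))
```

Evaluation is unchanged: feed the Random Forest Predictor's output into the same Scorer node you already use for the decision tree.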
Kind regards
Alexander