Classification tree

Hi all. I am new to KNIME, but I have noticed that when I run a classification tree on the iris data (selecting the Gini index), I obtain different results from R's rpart routine.
In R, the first variable entering the tree is Petal.Length and the second is Petal.Width, while in KNIME only Petal.Width enters, producing lower performance.
Can anyone explain this difference? Thanks in advance.

Hi @laurina56 and welcome to the forum.

It’s hard to say without seeing what you’re doing - maybe you have a sample workflow you could upload? - but my first guess is that you’re training on different subsets of the data. Do you get better results if you hold the training dataset constant?


Yes, I kept the dataset constant (I did not split it into training and test sets). I chose the Gini index as the splitting criterion and looked only at the first variable entering the tree.
The fact is that both splitting criteria, “Petal.Width < 0.8” and “Petal.Length < 2.5”, perfectly identify Species = ‘Setosa’.
So it seems that R and KNIME choose different root-node splits that produce the same partition of the data.
If I remove Petal.Length, R and KNIME give the same results with the same splitting values for the numeric predictors; the same happens if I remove Petal.Width.
In conclusion, the iris dataset is not appropriate for such a comparison.
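For anyone who wants to check the tie themselves, here is a minimal sketch (in Python with scikit-learn’s bundled copy of the iris data, rather than R or KNIME, and with thresholds chosen as in the discussion above) that computes the weighted Gini impurity of the two candidate root splits:

```python
# Sketch: show that the two candidate root splits on iris are exactly tied
# under the Gini criterion, so the choice between them is a tie-break that
# different implementations may resolve differently.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # columns: sepal len, sepal wid, petal len, petal wid

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini_after_split(values, labels, threshold):
    """Weighted Gini impurity of the two children produced by `values < threshold`."""
    left, right = labels[values < threshold], labels[values >= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Candidate root splits discussed above
print("Petal.Length < 2.5:", weighted_gini_after_split(X[:, 2], y, 2.5))
print("Petal.Width  < 0.8:", weighted_gini_after_split(X[:, 3], y, 0.8))
```

Both splits give the same weighted impurity of 1/3 (a pure Setosa child plus a 50/50 versicolor/virginica child), so the Gini gain is identical and the root-node variable comes down to how each tool breaks the tie.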
Thank you for your reply.
Laurina

