Documentation of quality measures in Decision Tree Learner

In the Decision Tree Learner node, in the description of the “quality measure” option, you can read:

“Gain ratio: The Gini ratio is simply another name for the Gini index, expressing the same inequality measure as a ratio derived from the area between the Lorenz curve and the line of perfect equality.”

My understanding is that Gain ratio is based entirely on entropy: it normalizes information gain, which itself is computed from entropy, using a second entropy-based term (split information). It is therefore not related to the Gini index, which is a different impurity measure used in other decision tree methods.
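
To make the point concrete, here is a minimal Python sketch of gain ratio as I understand it from the C4.5 definition: information gain divided by split information, both computed from entropy. The function names (`entropy`, `gain_ratio`) and the toy data are my own, for illustration only; this is not taken from the node's implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a nominal split, normalized by split information."""
    n = len(labels)
    # Group the class labels by the value of the splitting feature.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature_values)
    # A feature with a single value cannot split the data; return 0 (an assumption).
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the feature separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
ratio = gain_ratio(outlook, play)
```

No Gini term appears anywhere in this calculation, only entropy.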

Thanks,

Luis

Here’s Gemini’s answer:

The Gain Ratio is not the same as the Gini Index. While both are used to select splits in decision tree models (Gain Ratio in C4.5, the Gini Index in CART), they are distinct measures with different calculations and purposes.

Gain Ratio: An improvement on Information Gain (based on entropy), this metric aims to reduce the bias towards features with many distinct values by dividing the gain by the “intrinsic information” of the split (also called split information).

Gini Index: Measures the impurity of a data partition based on the probability distribution of classes. It is generally cheaper to compute than entropy because it needs no logarithm, and it is the criterion typically used in CART, which builds binary splits.
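
For comparison, a minimal sketch of the Gini index as an impurity measure, assuming the usual definition of one minus the sum of squared class proportions. The helper names (`gini_impurity`, `weighted_gini`) and the toy labels are hypothetical and only illustrate the formula; they are not the node's code.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted Gini impurity of a binary split, as a CART-style criterion."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


# Hypothetical toy labels: an evenly mixed node, then a split that separates the classes.
mixed = gini_impurity(["yes", "yes", "no", "no"])
pure_split = weighted_gini(["yes", "yes"], ["no", "no"])
```

Here the evenly mixed node has impurity 0.5 and the perfectly separating split has impurity 0. No entropy or logarithm is involved, which is why the two measures should not be described as the same thing.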
