Overfitting control in a Gradient Boosting Regression model

Hi,

I have a gradient boosting regression model and I want to learn how I can control overfitting for my model.

Best Regards,
gökhan

Hello @gokhan_sir,

I assume you are using the Gradient Boosted Trees Learner (Regression) node?
This node essentially provides you with three parameters that can help to control overfitting:

  • Tree depth: The deeper a tree, the more it overfits the training data, but a tree that is too shallow might not allow for enough feature interactions.
  • Number of models: This is the number of trees to learn. Generally, more trees mean more overfitting.
  • Learning rate: The learning rate defines the influence a single tree has on the overall prediction; a lower learning rate can be used to counter overfitting.

My recommendation is to keep the tree depth in the interval 3 to 5 and play with different configurations of learning rate and number of models. Note, however, that there are no general best settings (otherwise, they would be our defaults :wink: ) and you will have to try and see what works for your data.
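The KNIME node is configured through its dialog, but if it helps to see the same three knobs in code, scikit-learn's `GradientBoostingRegressor` exposes equivalent parameters (`max_depth`, `n_estimators`, `learning_rate`). A minimal sketch on synthetic data (the dataset here is just for illustration, not your data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Illustrative synthetic regression data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The three overfitting knobs discussed above:
model = GradientBoostingRegressor(
    max_depth=4,        # tree depth: 3-5 is a reasonable range
    n_estimators=100,   # number of models (trees)
    learning_rate=0.1,  # influence of each tree on the prediction
    random_state=0,
)
model.fit(X_train, y_train)
print(f"Held-out R^2: {model.score(X_test, y_test):.3f}")
```

The important part is that `score` is evaluated on held-out data, so it reflects generalization rather than how well the trees memorized the training set.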

Best,

Adrian


Hi Adrian,

Thank you for your reply.
Tree depth: I am using a tree depth of 4, as you recommended :slight_smile:
Number of models: The default setting is 100. Is that ok, or can I use a different number? With 100 models our model's R^2 is 0.913; with 1000 models our R^2 is 0.93, and there is still some root mean squared error.
Learning rate: I am also using the default value (0.1).

When I changed the learning rate between 0.05 and 0.5, R^2 varied between 0.94 and 0.90. Does this mean our data is normal? :slight_smile:
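To compare such configurations fairly, the R^2 values should come from held-out data rather than the training set. As a stand-in for repeating this in the KNIME node dialog, a cross-validated learning-rate sweep in scikit-learn (on synthetic data, just as a sketch) might look like:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Illustrative synthetic regression data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Compare a few learning rates by 5-fold cross-validated R^2
for lr in (0.05, 0.1, 0.2, 0.5):
    scores = cross_val_score(
        GradientBoostingRegressor(max_depth=4, n_estimators=100,
                                  learning_rate=lr, random_state=0),
        X, y, cv=5, scoring="r2",
    )
    print(f"learning_rate={lr}: mean R^2 = {scores.mean():.3f}")
```

Whichever rate scores best under cross-validation is the one most likely to generalize; a rate that only looks good on the training data may simply be overfitting.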

I am new to these machine learning algorithms :slight_smile: . If I ask a very simple or dumb question, I am very sorry :slight_smile:

Best Regards,
Gökhan

Hello Gökhan,

I can’t tell you whether your data is normal as I am not sure how normal would be defined in this case.
However, a larger R^2 means that your model explains the variation in the data better, so a smaller learning rate seems to benefit your model, perhaps because it allows the algorithm to make finer distinctions.
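For reference, R^2 is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. A small sketch with made-up numbers (not your data):

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and model predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)            # model's squared error
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # squared error of the mean
print(1 - ss_res / ss_tot)       # manual computation, approx. 0.9925
print(r2_score(y_true, y_pred))  # same value via scikit-learn
```

A model no better than predicting the mean gets R^2 = 0, and a perfect model gets R^2 = 1, which is why the values around 0.9 in this thread indicate a good fit.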

Kind regards,

Adrian