Bootstrapping for a regression model

Hi,
I managed to improve the R2 of my regression model by adding the bootstrap sampling node with default settings. R2 was 0.47 and, with this node, it is now 0.79.
When I validated the model on external data, R2 dropped back to 0.4.
How can I use Bootstrap properly on my unbalanced data?
So far, bootstrapping looks to me like a way of cheating: samples are duplicated, and the duplicates can then fall into the test set, which obviously inflates the score.

Thanks

Hi @zizoo,
Bootstrapping is mainly used for ensemble models, to increase the variance between the individual models. Applying bootstrapping to your data before splitting it into training and test sets, and then training a single model on it, is not the correct use case: copies of the same row can end up in both partitions, so the test score looks better than it really is, as you have noticed when using the model on external data. Why do you want to use bootstrapping here in the first place?
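To make the duplicate problem concrete, here is a minimal sketch in plain Python. It is not the workflow from this thread: the row IDs, the 30% test fraction, and the helper names `bootstrap`, `split`, and `leaked_fraction` are assumptions made up for illustration. It compares bootstrapping before the split with bootstrapping only the training part after the split, and measures how many test rows also occur in the training set.

```python
import random

def bootstrap(rows, rng):
    """Draw len(rows) rows with replacement (a plain bootstrap sample)."""
    return [rows[rng.randrange(len(rows))] for _ in range(len(rows))]

def split(rows, test_fraction, rng):
    """Shuffle the rows and return (train, test)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def leaked_fraction(train, test):
    """Fraction of test rows whose original row also appears in train."""
    train_ids = set(train)
    return sum(row in train_ids for row in test) / len(test)

rng = random.Random(0)
original = list(range(1000))  # hypothetical data: each row is just its ID

# Problematic order: bootstrap first, then split.
train_a, test_a = split(bootstrap(original, rng), 0.3, rng)
leak_before_split = leaked_fraction(train_a, test_a)

# Safer order: split first, then bootstrap only the training rows.
train_b, test_b = split(original, 0.3, rng)
train_b = bootstrap(train_b, rng)
leak_after_split = leaked_fraction(train_b, test_b)

print(f"bootstrap before split: {leak_before_split:.0%} of test rows also in train")
print(f"bootstrap after split:  {leak_after_split:.0%} of test rows also in train")
```

In the second variant the overlap is zero by construction, because the test rows were set aside before any resampling happened.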
Kind regards,
Alexander

Hi @AlexanderFillbrunn,
I read further about bootstrapping and realised that I don’t need it, as I am already using cross-validation to avoid overfitting.
Do you think bootstrap sampling could be used to replace the partitioning node?

Thanks,

Hi,
As I said, bootstrap sampling mostly makes sense in ensemble model learning. Replacing the partitioning node will not help in any way, I think.
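For illustration, this is roughly what bootstrap sampling inside an ensemble (bagging) looks like. It is only a sketch in plain Python under assumptions of my own: the synthetic one-feature data, the least-squares line as base model, the choice of 25 models, and all function names are hypothetical and not taken from any particular node or library. Each model is fitted on its own bootstrap sample of the training rows, the predictions are averaged, and the score is computed on rows that never took part in the resampling.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bagged_fit(xs, ys, n_models, rng):
    """Fit one base model per bootstrap sample of the training rows."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Average the predictions of all base models."""
    return statistics.fmean(a + b * x for a, b in models)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def make_data(n, rng):
    """Hypothetical data: y = 2 + 3x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(42)
x_train, y_train = make_data(200, rng)
x_test, y_test = make_data(100, rng)  # held out, never resampled

models = bagged_fit(x_train, y_train, n_models=25, rng=rng)
predictions = [bagged_predict(models, x) for x in x_test]
r2 = r2_score(y_test, predictions)
print(f"R2 on held-out rows: {r2:.3f}")
```

The important detail is that the resampling only ever touches the training rows, so the held-out score is not inflated by duplicates.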
Kind regards,
Alexander
