Adding weight to gradient boosted trees (regression)

Is it possible to add weights to gradient boosted trees and other similar ML models in KNIME?

I’m trying to predict claims frequency from insurance data, and some instances in my dataset have been policyholders for longer than others (so their rows carry more information). Is there a way to factor this into models in KNIME?


Could you create an additional feature for this?

Hi @harry2021 and welcome to the KNIME forum

Since most machine learning methods minimize a loss function, oversampling examples according to their importance is a technique that can bias training toward the more important examples.

For instance, if you consider one example to be 4 times more important than the others, you could replicate it 4 times in your training set. The error induced by this sample will then carry 4 times the weight of the others in the loss function, as in the sketch below.
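A minimal sketch of this oversampling in Python with pandas, assuming a hypothetical integer `weight` column (e.g. the number of years a policyholder has been insured):

```python
import pandas as pd

# Toy training data; `weight` is a hypothetical importance column,
# e.g. the number of years a policyholder has been insured.
train = pd.DataFrame({
    "age":    [25, 40, 55],
    "region": ["A", "B", "A"],
    "freq":   [0.1, 0.0, 0.3],
    "weight": [1, 4, 2],
})

# Replicate each row `weight` times: a row with weight 4 now
# contributes 4 error terms to the loss instead of 1.
oversampled = train.loc[train.index.repeat(train["weight"])].reset_index(drop=True)
print(oversampled)
```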

Keep in mind that this should normally be done only on the training set, not on the cross-validation or test sets; the CV and test sets should remain unchanged, as in the sketch below.
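To make that concrete, here is a sketch (with hypothetical column names) where the data is split first, so only the training portion is oversampled:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10),
                   "y": [0.1] * 10,
                   "weight": [1, 4, 2, 1, 3, 1, 2, 5, 1, 2]})

# Split first, then oversample only the training portion;
# the test set keeps its original, unreplicated rows.
train, test = train_test_split(df, test_size=0.3, random_state=0)
train = train.loc[train.index.repeat(train["weight"])]
```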

On top of replicating the important samples in the training set, you could also jitter their variables (add slight noise to the descriptors) so that the replicated rows are not exact copies of the originals.
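A minimal jittering sketch; the noise scale (1% of each column's standard deviation) is an arbitrary assumption you would want to tune:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# An already-oversampled toy frame: the three middle rows are replicas.
oversampled = pd.DataFrame({"age":  [25.0, 40.0, 40.0, 40.0, 55.0],
                            "freq": [0.1, 0.0, 0.0, 0.0, 0.3]})

# Add small Gaussian noise to each numeric descriptor so the
# replicated rows differ slightly from their originals.
for col in ["age", "freq"]:
    scale = 0.01 * oversampled[col].std()
    oversampled[col] += rng.normal(0.0, scale, size=len(oversampled))
```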

Some references about Jittering (or adding noise):

[1] Koistinen, Petri, and Lasse Holmstrom. "Kernel regression and backpropagation training with noise." Proceedings of the 1991 IEEE International Joint Conference on Neural Networks. IEEE, 1991.
[2] Holmstrom, Lasse, and Petri Koistinen. "Using additive noise in back-propagation training." IEEE Transactions on Neural Networks 3.1 (1992): 24-38.
[3] Bishop, Chris M. "Training with noise is equivalent to Tikhonov regularization." Neural Computation 7.1 (1995): 108-116.
[4] An, Guozhong. "The effects of adding noise during backpropagation training on a generalization performance." Neural Computation 8.3 (1996): 643-674.
[5] Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." Journal of Machine Learning Research 11 (2010): 3371-3408.

In the particular case of deep learning, some libraries such as TensorFlow and Keras (including the KNIME implementation in the -Keras Network Learner- node) allow you to implement your own loss function. In this case, the loss function equation could integrate the weighting of your more important samples; in the -Keras Network Learner- node, this is configured in the node's loss function settings.
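Outside of KNIME, a minimal sketch of the same idea in plain Keras: rather than writing a fully custom loss, per-sample weights can be passed through the standard `sample_weight` argument of `fit`, which scales each sample's contribution to the loss (the data and architecture below are purely illustrative):

```python
import numpy as np
from tensorflow import keras

# Tiny regression network; the architecture is illustrative only.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(100, 3)
y = np.random.rand(100)
w = np.random.randint(1, 5, size=100)  # e.g. years of exposure per row

# `sample_weight` multiplies each sample's loss term, achieving the
# same bias as oversampling without duplicating any rows.
model.fit(X, y, sample_weight=w, epochs=5, batch_size=16, verbose=0)
```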

Hope these hints are of help.




@aworker are the KNIME Keras / TF nodes now supporting newer versions? Last time I checked they did not.
@harry2021 If you are willing to “outsource” your ML model to a Python snippet, then you have various options, such as oversampling (as @aworker mentioned), class weights, and so on.

Simply put: no. The XGBoost node in KNIME has no option for instance weights. Oversampling, in my opinion, is then just a crutch for the lack of full support of XGBoost's features in KNIME.

I suggest simply “dropping down” to Python and using XGBoost from there, with all its options, including instance weights, along the lines of the sketch below.
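A hedged sketch of that route using the xgboost Python package; the Poisson objective is a common choice for claims frequency, and the column layout plus the use of exposure years as weights are assumptions:

```python
import numpy as np
import xgboost as xgb

# Hypothetical data: X holds rating factors, `exposure` the years each
# policy was active, and y the observed claims frequency per year.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
exposure = rng.integers(1, 11, size=1000)
y = rng.poisson(0.2 * exposure) / exposure  # claims per exposure year

# Instance weights scale each row's loss contribution; weighting
# frequencies by exposure is a standard Poisson-GLM-style setup.
model = xgb.XGBRegressor(objective="count:poisson", n_estimators=200)
model.fit(X, y, sample_weight=exposure)
```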

Hello there,

there is a ticket (internal reference: AP-11401) to support weights in XGBoost, and I will add a +1 to it. Once there is news, someone will update this topic.

Welcome to the KNIME Community, @harry2021!



This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.