XGBoost regression hyperparameter optimization

Hi dear forum users,

I'm new to the KNIME platform and I have a question. I want to make predictions using XGBoost.

import xgboost as xg

# hyperparameter grid to search
parameters = {
    "learning_rate": [0.1, 0.01, 0.001],
    "n_estimators": range(100, 1200, 100),
}

xgb_model = xg.XGBRegressor(random_state=0)

This is my Python code, and I search the hyperparameters with GridSearch.

How can I do this in KNIME?
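For reference, the GridSearch step I mean looks roughly like this in scikit-learn, continuing from the snippet above (X_train and y_train stand for my training data, generated here as dummy data so the sketch runs):

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

# dummy data standing in for the real training set
X_train, y_train = make_regression(n_samples=500, n_features=10, noise=10,
                                    random_state=0)

# exhaustively tries every combination in `parameters` with 5-fold CV,
# scored here by RMSE (one possible choice of metric)
grid = GridSearchCV(xgb_model, parameters, cv=5,
                    scoring="neg_root_mean_squared_error")
grid.fit(X_train, y_train)
print(grid.best_params_)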

Hi @BerkayAkar, the Parameter Optimization Loop allows you to find the best values of the parameters you choose to explore.

Hope this helps


Could you send me a workflow, please? Because I have a lot of hyperparameters and this method seems to use only one parameter.

Is that true?

Thanks for the advice, sir.

Sure. This might help you get on your way: XGBoost Optimized.knwf (52.3 KB)

Thanks for the answer, sir. I have one question: I want my learning rate values to run from 0.001 to 1.

[screenshot of my loop configuration]

Is this the correct configuration?

And another question: why do we use the Table Row to Variable node?

Could you help me, sir?

Thanks for the advice.

E.g. learning rate = [0.001, 0.01, 0.1, 1]

only this variable.

Hi. No, the workflow is just to get you started. I suggest that you check the parametrization you have in Python against the parametrization in KNIME's node, then make the adjustments you need.
The Numeric Scorer gives multiple numeric criteria that you may use to optimize your hyperparameters; you have to be able to pass the parameter to the loop end, so you'd need to tweak the last two nodes to get what you need. Hope I'm clear.

I'm sorry, because of my bad English I didn't understand.

I want to check these parameters:

learning rate [0.1, 0.001, 1]
n_estimators [100, 200, 300, 400 … 1200]

Is it possible with this node?


The learning rate in KNIME is the eta parameter. I guess that you want to check learning rates between 0.001 and 1 with a step increment of 0.1. What are the n-estimators?

XGBoost's n_estimators varies between 100 and 1200 and the increment value is 100.

E.g. 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200

Hi, check this workflow: XGBoost Optimized.knwf (98.9 KB)

The optimization loop minimizes the root mean square error by trying different learning rates between 0.001 and 1 with a 0.1 step and varying the number of trees between 100 and 1200 with a 100 step.

Hope this will get you on your way to tweak this flow to your needs
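For reference, the search space described above corresponds to something like the following in Python terms (a sketch of the ranges only, not the actual content of the .knwf file; the KNIME loop's boundary handling may differ slightly):

import numpy as np

# learning rate (eta): start 0.001, stop 1, step 0.1 -> 0.001, 0.101, ..., 0.901
learning_rates = np.round(np.arange(0.001, 1.0, 0.1), 3)

# number of trees: start 100, stop 1200, step 100 -> 100, 200, ..., 1200
n_estimators = list(range(100, 1300, 100))

print(len(learning_rates) * len(n_estimators), "combinations to evaluate")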

Thanks for your answer, sir.
Have a nice day :slight_smile:

@BerkayAkar

Please do not use the workflow “as-is”. Any type of parameter optimization like this will lead to a completely overfit model, because you are optimizing for one single train/test split. The model needs to be evaluated by cross-validation, for example by taking the median of your target metric over all cross-validation loops. So inside the parameter optimization loop you need a CV loop.
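A minimal sketch of that idea in Python terms (assumed names and dummy data, not part of the workflow): every parameter combination is scored with a cross-validation loop, and the metric is aggregated over the folds instead of coming from one single split.

import numpy as np
import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.model_selection import ParameterGrid, cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# a small illustrative grid; the real search space would be larger
search_space = ParameterGrid({
    "learning_rate": [0.001, 0.101, 0.201],
    "n_estimators": [100, 300, 500],
})

best_params, best_rmse = None, np.inf
for params in search_space:
    scores = cross_val_score(
        xg.XGBRegressor(random_state=0, **params),
        X, y, cv=5, scoring="neg_root_mean_squared_error",
    )
    rmse = np.median(-scores)  # median RMSE over the 5 folds
    if rmse < best_rmse:
        best_params, best_rmse = params, rmse

print(best_params, round(best_rmse, 3))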


Absolutely! It was meant as a starting point. Regarding @kienerj's suggestion, you can check the EXAMPLES Server: 04_Analytics>11_Optimization>07_Cross_Validation_with_SVM_and_Parameter_Optimization

Sir, should I use cross-validation instead of the Partitioning node? Is that right?

And I have another question, sir:

can I use a Table Row to Variable Loop Start?

Because the Parameter Optimization Loop method takes too long to find the best parameters, I want to use this approach instead, and I want to put cross-validation in front of the Learner and Predictor nodes. Is that possible? Because I couldn't manage it.

[two screenshots of my workflow attempt]

Is it possible?

And why do I use the Table Row to Variable node? Why?

How can I use cross-validation in front of the Learner node? I can't make it work, sir.

As @iperez already answered beforehand, see the examples workflow:

[screenshot of the example workflow]

And here the content of the workflow:

This of course needs to be further extended, as all feature selection steps must also happen only on the training set.
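In scikit-learn terms, that point is usually handled by putting the selection step inside a Pipeline so it is re-fitted on the training folds only (a sketch with an assumed SelectKBest step, not part of the example workflow):

import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=500, n_features=30, noise=10, random_state=0)

# the selector is fitted inside each CV fold, never on the full data set
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=10)),
    ("model", xg.XGBRegressor(random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_root_mean_squared_error")
print(-scores)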

Advanced topic:

I must also add that in the case of parameter optimization I generate the cross-validation splits just once and apply them via reference row filtering and a group loop start. The reason is that the X-Partitioner is very slow, as it repeats the row splitting for every iteration, which significantly impacts runtime.
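In code terms the trick looks roughly like this (a sketch with assumed names): generate the fold assignment once, then reuse exactly the same indices for every parameter combination instead of re-splitting on every iteration.

import numpy as np
import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# generate the cross-validation splits only once ...
splits = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# ... and reuse the very same folds for every parameter combination
for lr in [0.001, 0.101, 0.201]:
    fold_rmse = []
    for train_idx, test_idx in splits:
        model = xg.XGBRegressor(random_state=0, learning_rate=lr, n_estimators=300)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_rmse.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    print(lr, round(np.median(fold_rmse), 3))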

Thanks for the advice.

I have a problem.

I am developing my project both in sklearn and in KNIME, but my values in KNIME are much higher than my values in sklearn.
What do you think is the reason?

The preprocessing and the data are the same.

The R squared in KNIME is 0.35, but sklearn gives 0.22.

cv = 5 and k = 5

You probably do not use the same seeds (random states). Also, there might be a slight difference in how the default parameters are set in scikit-learn vs. KNIME.
If a loop takes too long, use random search instead of grid search.
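A possible sketch of that, assuming scikit-learn's RandomizedSearchCV (dummy data stands in for the real set; the distributions roughly cover the ranges discussed above):

import xgboost as xg
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# sample 30 random combinations instead of evaluating the full grid
search = RandomizedSearchCV(
    xg.XGBRegressor(random_state=0),
    param_distributions={
        "learning_rate": uniform(0.001, 0.999),  # 0.001 .. 1.0
        "n_estimators": randint(100, 1300),      # 100 .. 1299
    },
    n_iter=30,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)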
br