XGBoost regression hyperparameter optimization

Hi dear forum users,

I'm new to the KNIME platform and I have a question. I want to make predictions using XGBoost.

import xgboost as xg

# hyperparameter grid to search
parameters = {
    "learning_rate": [0.1, 0.01, 0.001],
    "n_estimators": range(100, 1200, 100),
}

xgb_model = xg.XGBRegressor(random_state=0)

This is my Python code, and I search the hyperparameters with GridSearch.

How can I do this in KNIME?
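For reference, the GridSearch step I mean looks roughly like this in scikit-learn, continuing from the snippet above (X_train and y_train stand for my training data, generated here as dummy data so the sketch runs):

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

# dummy data standing in for the real training set
X_train, y_train = make_regression(n_samples=500, n_features=10, noise=10,
                                    random_state=0)

# exhaustively tries every combination in `parameters` with 5-fold CV,
# scored here by RMSE (one possible choice of metric)
grid = GridSearchCV(xgb_model, parameters, cv=5,
                    scoring="neg_root_mean_squared_error")
grid.fit(X_train, y_train)
print(grid.best_params_)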

Hi @BerkayAkar, the Parameter Optimization Loop allows you to find the best values of the parameters you choose to explore.

Hope this helps


Could you send me a workflow, please? Because I have a lot of hyperparameters and this method seems to use only one parameter.

Is that true?

Thanks for the advice, sir.

Sure. This might help you get on your way: XGBoost Optimized.knwf (52.3 KB)

Thanks for the answer, sir. I have one question: I want my learning rate values to run from 0.001 to 1.

[screenshot of my loop configuration]

Is this the correct configuration?

And another question: why do we use the Table Row to Variable node?

Could you help me, sir?

Thanks for the advice.

E.g. learning rate = [0.001, 0.01, 0.1, 1]

only this variable.

Hi. No, the workflow is just to get you started. I suggest that you check the parametrization you have in Python against the parametrization in KNIME's node, then make the adjustments you need.
The Numeric Scorer gives multiple numeric criteria that you may use to optimize your hyperparameters; you have to be able to pass the parameter to the loop end, so you'd need to tweak the last two nodes to get what you need. Hope I'm clear.

I'm sorry, because of my bad English I didn't understand.

I want to check these parameters:

learning rate [0.1, 0.001, 1]
n_estimators [100, 200, 300, 400 … 1200]

Is it possible with this node?


The learning rate in KNIME is the eta parameter. I guess that you want to check learning rates between 0.001 and 1 with a step increment of 0.1. What are the n-estimators?

XGBoost's n_estimators varies between 100 and 1200 and the increment value is 100.

E.g. 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200

Hi, check this workflow: XGBoost Optimized.knwf (98.9 KB)

The optimization loop minimizes the root mean square error by trying different learning rates between 0.001 and 1 with a 0.1 step and varying the number of trees between 100 and 1200 with a 100 step.

Hope this will get you on your way to tweak this flow to your needs
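For reference, the search space described above corresponds to something like the following in Python terms (a sketch of the ranges only, not the actual content of the .knwf file; the KNIME loop's boundary handling may differ slightly):

import numpy as np

# learning rate (eta): start 0.001, stop 1, step 0.1 -> 0.001, 0.101, ..., 0.901
learning_rates = np.round(np.arange(0.001, 1.0, 0.1), 3)

# number of trees: start 100, stop 1200, step 100 -> 100, 200, ..., 1200
n_estimators = list(range(100, 1300, 100))

print(len(learning_rates) * len(n_estimators), "combinations to evaluate")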

Thanks for your answer, sir.
Have a nice day :slight_smile:

@BerkayAkar

Please do not use the workflow “as-is”. Any type of parameter optimization like this will lead to a completely overfit model, because you are optimizing for one single train/test split. The model needs to be evaluated by cross-validation, for example by taking the median of your target metric over all cross-validation loops. So inside the parameter optimization loop you need a CV loop.
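A minimal sketch of that idea in Python terms (assumed names and dummy data, not part of the workflow): every parameter combination is scored with a cross-validation loop, and the metric is aggregated over the folds instead of coming from one single split.

import numpy as np
import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.model_selection import ParameterGrid, cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# a small illustrative grid; the real search space would be larger
search_space = ParameterGrid({
    "learning_rate": [0.001, 0.101, 0.201],
    "n_estimators": [100, 300, 500],
})

best_params, best_rmse = None, np.inf
for params in search_space:
    scores = cross_val_score(
        xg.XGBRegressor(random_state=0, **params),
        X, y, cv=5, scoring="neg_root_mean_squared_error",
    )
    rmse = np.median(-scores)  # median RMSE over the 5 folds
    if rmse < best_rmse:
        best_params, best_rmse = params, rmse

print(best_params, round(best_rmse, 3))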


Absolutely! It was meant as a starting point. Regarding @kienerj's suggestion, you can check the EXAMPLES Server: 04_Analytics>11_Optimization>07_Cross_Validation_with_SVM_and_Parameter_Optimization

Sir, should I use cross-validation instead of the Partitioning node? Is that right?

And I have another question, sir:

can I use a Table Row to Variable Loop Start?

Because the Parameter Optimization Loop method takes too long to find the best parameters, I want to use this approach instead, and I want to put cross-validation in front of the Learner and Predictor nodes. Is that possible? Because I couldn't manage it.

[two screenshots of my workflow attempt]

Is it possible?

And why do I use the Table Row to Variable node? Why?

How can I use cross-validation in front of the Learner node? I can't make it work, sir.

As @iperez already answered beforehand, see the examples workflow:

[screenshot of the example workflow]

And here the content of the workflow:

This of course needs to be further extended, as all feature selection steps must also happen only on the training set.
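In scikit-learn terms, that point is usually handled by putting the selection step inside a Pipeline so it is re-fitted on the training folds only (a sketch with an assumed SelectKBest step, not part of the example workflow):

import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=500, n_features=30, noise=10, random_state=0)

# the selector is fitted inside each CV fold, never on the full data set
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=10)),
    ("model", xg.XGBRegressor(random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_root_mean_squared_error")
print(-scores)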

Advanced topic:

I must also add that in the case of parameter optimization I generate the cross-validation splits just once and apply them via reference row filtering and a group loop start. The reason is that the X-Partitioner is very slow, as it repeats the row splitting for every iteration, which significantly impacts runtime.
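In code terms the trick looks roughly like this (a sketch with assumed names): generate the fold assignment once, then reuse exactly the same indices for every parameter combination instead of re-splitting on every iteration.

import numpy as np
import xgboost as xg
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# generate the cross-validation splits only once ...
splits = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# ... and reuse the very same folds for every parameter combination
for lr in [0.001, 0.101, 0.201]:
    fold_rmse = []
    for train_idx, test_idx in splits:
        model = xg.XGBRegressor(random_state=0, learning_rate=lr, n_estimators=300)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_rmse.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    print(lr, round(np.median(fold_rmse), 3))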

Thanks for the advice.

I have a problem.

I am developing my project both in sklearn and in KNIME, but my values in KNIME are much higher than my values in sklearn.
What do you think is the reason?

The preprocessing and the data are the same.

The R squared in KNIME is 0.35, but sklearn gives 0.22.

cv = 5 and k = 5

You probably do not use the same seeds (random states). Also, there might be a slight difference in how the default parameters are set in scikit-learn vs. KNIME.
If a loop takes too long, use random search instead of grid search.
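A possible sketch of that, assuming scikit-learn's RandomizedSearchCV (dummy data stands in for the real set; the distributions roughly cover the ranges discussed above):

import xgboost as xg
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

# sample 30 random combinations instead of evaluating the full grid
search = RandomizedSearchCV(
    xg.XGBRegressor(random_state=0),
    param_distributions={
        "learning_rate": uniform(0.001, 0.999),  # 0.001 .. 1.0
        "n_estimators": randint(100, 1300),      # 100 .. 1299
    },
    n_iter=30,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)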
br