Gradient Boosting

The XGBoost node requires a lot of hyperparameters. Given the initial guesses that a user inputs, does the node perform automatic hyperparameter tuning?

If it does not perform automatic hyperparameter tuning, does it at least give any direction on how to go about it?

Prof S Chandrasekhar

Hi @Chandra_S and welcome to the forum!

The XGBoost Tree Ensemble Learner - and for that matter, other learner nodes in KNIME, regardless of the algorithm - don’t automatically tune hyperparameters. To do this, you need to use a Parameter Optimization Loop Start node and its associated Parameter Optimization Loop End to set up a loop around your algorithm of choice.

The start node in this case will let you select the combinations of parameters you want to optimize, along with a choice of search strategies (e.g. brute force, Bayesian optimization, and others). In the loop end node you choose which metric you want to optimize (e.g. maximize accuracy, minimize error).
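Outside KNIME, the same idea can be sketched in plain Python. This is a minimal brute-force search over a parameter grid; `evaluate` is a hypothetical stand-in for training a model and scoring it (in KNIME, that would be the body of the optimization loop), and its toy formula exists only so the example runs:

```python
from itertools import product

# Hypothetical stand-in for training and scoring a model with given
# hyperparameters; in KNIME this would be the workflow inside the loop.
def evaluate(learning_rate, max_depth):
    # Toy objective with a known best at (0.1, 6), for illustration only.
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(max_depth - 6)

# Brute force strategy: try every combination, keep the best score.
grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 6, 9],
}
best_score, best_params = float("-inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params)
```

The Parameter Optimization Loop nodes do the equivalent bookkeeping for you, including the other search strategies beyond brute force.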

There is a video on our YouTube channel that explains this process in more detail:

Does that help?


Thanks a lot for your very prompt response.
Is there any example workflow that I can edit and use?

Somewhat off topic, but that video should be taken with a huge grain of salt. If you want to do parameter optimization, you should do it with cross-validation, not a single train-test split.
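To make the cross-validation point concrete, here is a minimal k-fold sketch in plain Python (stdlib only). `train_and_score` is a hypothetical placeholder for fitting a model on the training fold and returning its score on the test fold; in KNIME the X-Partitioner / X-Aggregator nodes play this role:

```python
# Minimal k-fold cross-validation sketch; `train_and_score` is a
# hypothetical placeholder for fitting and scoring a model.
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        yield train, test

def cv_score(data, k, train_and_score):
    """Average the score over all k train/test splits."""
    scores = [
        train_and_score([data[i] for i in tr], [data[i] for i in te])
        for tr, te in k_fold_indices(len(data), k)
    ]
    return sum(scores) / len(scores)
```

Averaging the metric over k folds gives a far less noisy estimate than a single train-test split, which is why optimizing parameters against one split can mislead you.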

Sure, here’s a simple one (that also includes cross validation):

You can also search for ‘parameter optimization’ on the KNIME Hub and find several more that may be useful.


Thanks a lot. It was really useful. But my data is a time series, so for training I will select about 80% from the beginning, and the remaining 20% towards the end will be for testing.

One cannot shuffle the data to build the next tree, as it disturbs the sequence.

Also, rather than the forecast itself, what I am interested in is which variables are important, plus some type of sensitivity analysis.

With a simple CART tree, the forecast accuracy is not good.

Is XGBoost a correct model for multivariate time series analysis, apart from multivariate ARIMA?

Since you have time series data, you could use linear sampling instead of random sampling. Another option would be to bypass CV and just use the regular Partitioning node with the Take from top mode.
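The "Take from top" idea is just an ordered slice: train on the first 80% of rows, test on the last 20%, with no shuffling. A minimal sketch (the toy `series` is hypothetical data, standing in for your time-ordered rows):

```python
# Time-ordered 80/20 split ("Take from top"): train on the first 80%,
# test on the final 20%, never shuffling the rows.
def time_split(rows, train_frac=0.8):
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

series = list(range(100))  # toy series, already sorted in time order
train, test = time_split(series)
```

Because the cut point is a single index, every training row precedes every test row, which is exactly the property a time series split needs to preserve.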

If your primary interest is in interpretability, you may want to take a look at SHAP or LIME. Here’s a workflow that demonstrates those. It’s set up to use a Random Forest, but you could switch it to XGBoost without too much trouble:
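A simpler relative of SHAP that is easy to show in a few lines is permutation importance: shuffle one feature column at a time and measure how much the model's score drops. This sketch uses only the standard library; the model, data, and `acc` scoring function are hypothetical toys for illustration:

```python
import random

# Permutation importance (a simpler relative of SHAP): shuffle one
# feature at a time and record how much the score drops from baseline.
def permutation_importance(score_fn, X, y, n_features, seed=0):
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    importances = []
    for j in range(n_features):
        X_perm = [row[:] for row in X]      # copy so X is untouched
        col = [row[j] for row in X_perm]
        rng.shuffle(col)                    # break feature j's link to y
        for row, v in zip(X_perm, col):
            row[j] = v
        importances.append(baseline - score_fn(X_perm, y))
    return importances

# Toy demo: y equals feature 0, so only feature 0 should matter.
X = [[i, 0] for i in range(20)]
y = list(range(20))
acc = lambda X_, y_: sum(r[0] == t for r, t in zip(X_, y_)) / len(y_)
imp = permutation_importance(acc, X, y, n_features=2)
```

A large drop means the model relies on that feature; a drop near zero means it doesn't. SHAP and LIME give richer, per-row explanations, but this captures the same "which variables matter" question your sensitivity analysis is after.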

