Saving XGBoost Model in PMML

ChetanP · January 15, 2020, 6:37am

I have built an XGBoost model and now I want to save it in PMML format, but it's not allowing me to do so. Please let me know how I can do this, or whether there is any other way to get it done.

mlauber71 · January 15, 2020, 6:40am

There is only a limited number of models that can be converted to PMML. XGBoost is not among them as of now.

ChetanP · January 15, 2020, 7:13am

Hi,
@mlauber71
Is there any way I can get rules for XGBoost, like those of a Decision Tree, so that I can put them into production? Or how else can I put the model into production?

mlauber71 · January 15, 2020, 7:25am

Yes, you can, by saving the model with the Model Writer, which produces a proprietary zip file, and using that.

XGBoost with H2O.ai is not in KNIME yet? Then you could put that on Spark as well.

I would have to check if the MOJO predictor could use XGBoost models produced by h2o automl.

AlexanderFillbrunn · January 15, 2020, 7:32am

Hi,
if you use the Gradient Boosted Trees Learner, you may also be able to utilize the Gradient Boosted Trees to PMML node to generate PMML.
Kind regards
Alexander
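
To illustrate what "rules like a Decision Tree" could look like for a boosted ensemble, here is a minimal sketch. It assumes a hypothetical nested-dict tree format (`feature`, `threshold`, `left`, `right`, `value`), which is neither KNIME's nor XGBoost's actual model format, and the numbers are made up. Each tree flattens into IF/THEN rules, and the ensemble's raw score is the sum of the matching leaf values across all trees.

```python
# Hypothetical tree format: internal nodes have "feature", "threshold",
# "left" (taken when feature < threshold) and "right"; leaves have "value".
def tree_to_rules(node, conditions=()):
    """Return a list of (conditions, leaf_value) pairs, one per leaf."""
    if "value" in node:
        return [(list(conditions), node["value"])]
    left_cond = f'{node["feature"]} < {node["threshold"]}'
    right_cond = f'{node["feature"]} >= {node["threshold"]}'
    return (
        tree_to_rules(node["left"], conditions + (left_cond,))
        + tree_to_rules(node["right"], conditions + (right_cond,))
    )


def predict_tree(node, row):
    """Follow the splits for one record and return the leaf value."""
    while "value" not in node:
        branch = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["value"]


def predict_ensemble(trees, row, base_score=0.0):
    """A boosted ensemble adds up the leaf values of all trees (raw score)."""
    return base_score + sum(predict_tree(t, row) for t in trees)


# Two tiny illustrative trees (made-up numbers, not from a trained model).
trees = [
    {"feature": "age", "threshold": 30,
     "left": {"value": -0.5},
     "right": {"feature": "income", "threshold": 50000,
               "left": {"value": 0.25}, "right": {"value": 0.75}}},
    {"feature": "income", "threshold": 40000,
     "left": {"value": -0.25}, "right": {"value": 0.5}},
]

for i, tree in enumerate(trees):
    for conds, value in tree_to_rules(tree):
        print(f"tree {i}: IF {' AND '.join(conds)} THEN add {value}")

raw_score = predict_ensemble(trees, {"age": 42, "income": 45000})
```

Unlike a single decision tree, no one rule gives the final prediction: every tree contributes one rule per record, which is why a real ensemble with hundreds of trees is hard to read as a rule list even though it can be exported this way.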

ChetanP · January 15, 2020, 9:47am

Hi @AlexanderFillbrunn,

Thank you very much.

Currently I am having trouble reading the hyperparameters in the Flow Variables tab (I am not sure what they mean), so I am struggling to set their values for Gradient Boosting.

Could you please point me to a document that lists all hyperparameters for Gradient Boosting in KNIME and explains their meaning?

AlexanderFillbrunn · January 15, 2020, 10:06am

Hi @ChetanP,
unfortunately there is no such document. Which hyperparameters do you want to control specifically? I could find out for you which flow variable mappings to set.
Kind regards
Alexander

ChetanP · January 15, 2020, 11:46am

Hi,

That would be great.

I am sharing the Excel sheet. Please enter the analogous KNIME name in column “C”.

Thanks in advance!

GBM_Parameters - Knime.xlsx (12.3 KB)

AlexanderFillbrunn · January 15, 2020, 12:39pm

Hi,
here are the names for flow variable configuration. Not all options are applicable for our node. I have indicated that in the table.
Kind regards
Alexander

GBM_Parameters - Knime.xlsx (12.4 KB)

ChetanP · January 15, 2020, 12:56pm

Hi,

Would you please check and let me know?

ChetanP · January 15, 2020, 1:15pm

Hi @AlexanderFillbrunn,

Thank you very much. I would appreciate it if you could do the same for XGBoost (I have already done it myself, but I want to be sure).

I am attaching an Excel sheet of the XGBoost hyperparameters, with the analogous KNIME name for each hyperparameter in column C. Please cross-check it once.

XGBoost Parameters - Knime.xlsx (17.8 KB)

AlexanderFillbrunn · January 15, 2020, 1:28pm

Hi,
we don’t have a specific XGBoost node. For that you need to use the Python Snippet node.
Kind regards
Alexander

ChetanP · January 16, 2020, 9:03am

@AlexanderFillbrunn,

I am trying to tune the hyperparameters of Gradient Boosting. Would you please let me know what the significance of minChildSize is? Does it control the number of records that end up in a terminal node? I want to tune that as well.

AlexanderFillbrunn · January 16, 2020, 9:28am

Hi,
even though this flow variable does not correspond to a setting in the dialog, it seems to work. So minChildSize is, as you correctly assumed, the minimum number of records in a terminal node.
Kind regards
Alexander
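
As a generic illustration of what a minimum child size does during tree building (a sketch only, not KNIME's implementation; the function name `allowed_splits` and the sample data are hypothetical): a candidate split is kept only if both resulting children contain at least that many records.

```python
def allowed_splits(values, min_child_size):
    """Return thresholds whose two children each hold >= min_child_size records."""
    ordered = sorted(values)
    thresholds = []
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # no threshold can separate equal values
        # Splitting between positions i-1 and i sends i records left
        # and the remaining records right.
        left_size, right_size = i, len(ordered) - i
        if left_size >= min_child_size and right_size >= min_child_size:
            thresholds.append((ordered[i - 1] + ordered[i]) / 2)
    return thresholds


ages = [18, 22, 25, 31, 40, 47, 52, 60]
loose = allowed_splits(ages, min_child_size=1)   # every gap is a valid split
strict = allowed_splits(ages, min_child_size=3)  # only the central gaps remain
```

Raising the value leaves fewer admissible splits, so trees become shallower and less prone to fitting individual records, which is why it is a common parameter to tune.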

mlauber71 · January 16, 2020, 6:10pm

I checked, and at the moment it is not possible to load H2O.ai MOJO files generated via their Python (or R) packages into KNIME and reuse them. It is possible, though, to use a Python node to create scores from them. But that may not help if you want to distribute them via KNIME to a server or Big Data cluster, as you can with the other MOJO models.

ChetanP · January 17, 2020, 5:21am

@AlexanderFillbrunn,

Thank you very much

ChetanP · January 17, 2020, 9:20am

@AlexanderFillbrunn , @mlauber71

I am trying to tune the parameters using Bayesian Optimization, but I am not sure what the following settings mean or how to set them. Please let me know how I can use them (a one-line explanation of each would help):
Random seed, Enable step size, Max. number of iterations, Number of warm-up rounds, Gamma, Number of candidates per round.

AlexanderFillbrunn · January 17, 2020, 2:07pm

Hi,
Have you checked the node description? There should be an explanation for every option available.
Kind regards
Alexander
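
For orientation, here is a minimal sketch of how settings with these names typically interact in a TPE-style (Tree-structured Parzen Estimator) Bayesian optimization loop. It is a simplified, one-parameter illustration under my own assumptions (fixed kernel width, hypothetical function names `kde` and `tpe_minimize`), not KNIME's implementation, and it does not model the step size option. The node description remains the authoritative source.

```python
import math
import random


def kde(x, points, bandwidth):
    """Density estimate at x: average of Gaussian bumps centred on points."""
    return sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
    ) / (len(points) * bandwidth * math.sqrt(2 * math.pi))


def tpe_minimize(objective, low, high, seed=0, max_iterations=40,
                 warmup_rounds=10, gamma=0.25, candidates_per_round=20):
    rng = random.Random(seed)        # random seed: makes the run repeatable
    bandwidth = (high - low) / 10    # fixed kernel width (a simplification)
    history = []                     # (score, x) pairs evaluated so far
    for iteration in range(max_iterations):  # max. number of iterations
        if iteration < warmup_rounds or len(history) < 2:
            # Warm-up rounds: pure random sampling to collect first evidence.
            x = rng.uniform(low, high)
        else:
            history.sort()
            # Gamma: fraction of the best results treated as "good".
            n_good = min(len(history) - 1, max(1, int(gamma * len(history))))
            good = [x for _, x in history[:n_good]]
            bad = [x for _, x in history[n_good:]]
            # Candidates per round: draws from the "good" density; only the
            # one with the highest good/bad density ratio gets evaluated.
            candidates = []
            for _ in range(candidates_per_round):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))
            x = max(candidates,
                    key=lambda c: kde(c, good, bandwidth)
                    / (kde(c, bad, bandwidth) + 1e-12))
        history.append((objective(x), x))
    return min(history)              # best (score, x) pair found


best_score, best_x = tpe_minimize(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
```

In this sketch, more warm-up rounds mean more exploration before the model guides the search, a smaller gamma makes the "good" group more selective, and more candidates per round let each guided step pick from a wider pool without costing extra objective evaluations.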

ChetanP · January 18, 2020, 3:24am

Hello,

No, I have not. Would you please let me know how I can check the node description and where it is available?

AlexanderFillbrunn · January 18, 2020, 7:21am

Hi,
You can find the node description either directly in the Analytics Platform (View -> Node Description) or on KNIME Hub.

Kind regards
Alexander