Exporting learning model results from analysis nodes

JaeHwanChoi · December 6, 2023, 4:48am

Hello KNIME Support team.

In Python, a trained model can be exported as a PKL file during machine learning. In KNIME, when using an Analyze node, the model information is available at the blue or gray output port.

However, some models can only be exported with PMML Writer and some only with Model Writer. Is there a common node that can export both?

Additionally, I want to export a model trained with an ML node using PMML Writer or Model Writer, and then import the file into my personal Jupyter notebook to use it for prediction. Or can a model created in KNIME only be used in KNIME?

I would be grateful for your reply.

mlauber71 · December 6, 2023, 5:20am

@JaeHwanChoi PMML should be a standard that can be used in other systems, though it seems to have some limitations.

A lengthy discussion about the exchange of models between knime and python can be found here:

From my experience the best model format to interchange between knime, python, R and even Spark/BigData systems is H2O.ai MOJO:

KNIME has started to support Sklearn nodes. Though I have not tried to interchange their models with python (via pickle for example):

This seems to be the only change since we last had this discussion. The format of a model very much depends on the underlying system and packages. KNIME as a platform is trying to bring them together, but I don’t think there is a universal solution or a format that covers everything. For deep learning there is ONNX, but my experience here is limited.
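
To illustrate why the format depends on the underlying packages, here is a minimal pickle round trip in Python. `ThresholdModel` is a hypothetical stand-in for a trained estimator, not a KNIME or scikit-learn class. The point is only that a pickle stores a reference to the class, so the loading side needs the same code (and a compatible package version) available.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a trained model (not a real library class)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Label each value 1 if it reaches the threshold, else 0.
        return [1 if v >= self.threshold else 0 for v in values]


model = ThresholdModel(threshold=0.5)

# Serialize to bytes; pickle.dump(model, file) would write a .pkl file instead.
payload = pickle.dumps(model)

# Loading only works where the ThresholdModel class can be found again.
restored = pickle.loads(payload)
print(restored.predict([0.2, 0.5, 0.9]))  # [0, 1, 1]
```

A real scikit-learn or XGBoost pickle behaves the same way: the file alone is not enough, the matching library has to be installed where it is loaded.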

Daniel_Weikert · December 6, 2023, 4:45pm

@mlauber71
Did you experience any specific benefits of using an ML model in KNIME rather than using Python directly? Just curious.

mlauber71 · December 6, 2023, 5:57pm

@Daniel_Weikert well there is the low-code thing. KNIME nodes can help you with the configurations. Also KNIME can be your platform to integrate several aspects and approaches and then compare them.

JaeHwanChoi · December 7, 2023, 5:34am

Thank you for your response. @mlauber71 !!

Does this mean that the files exported by PMML Writer can be used standardly in other systems (R, Python), and the files exported by Model Writer cannot be used standardly because it is a KNIME-specific extension?

Also, some models can only be exported to PMML Writer and some models can be exported to Model Writer, but why is this distinction made?

Any answers would be greatly appreciated.

mlauber71 · December 7, 2023, 6:59am

@JaeHwanChoi the format and interoperability very much depends on the issuer of such model systems. PMML is supposed to be some sort of standard for a group of models though I found it might not always work on all platforms and does not cover some advanced models like XGBoost.
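
Because PMML is plain XML, one way to see what a PMML Writer export actually contains is to look at its top-level elements before trying to load it elsewhere. The sketch below uses only the Python standard library. The embedded document is a hand-written, heavily shortened example, not real KNIME output, and `describe_pmml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened PMML for illustration only (not produced by KNIME).
SAMPLE_PMML = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="example" functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="churn" usageType="target"/>
    </MiningSchema>
    <Node score="no"><True/></Node>
  </TreeModel>
</PMML>
"""

# Top-level PMML children that are not model elements.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def local_name(tag):
    # ElementTree prefixes tags with "{namespace}"; keep only the element name.
    return tag.rsplit("}", 1)[-1]


def describe_pmml(xml_text):
    """Return the PMML version, the declared fields and the model element names."""
    root = ET.fromstring(xml_text)
    fields = [el.get("name") for el in root.iter()
              if local_name(el.tag) == "DataField"]
    models = [local_name(child.tag) for child in root
              if local_name(child.tag) not in NON_MODEL]
    return {"version": root.get("version"), "fields": fields, "models": models}


info = describe_pmml(SAMPLE_PMML)
print(info)
# {'version': '4.2', 'fields': ['age', 'churn'], 'models': ['TreeModel']}
```

This only inspects the file. Actually scoring a PMML model in Python requires a third-party PMML engine, and whether that engine supports the model type listed here is exactly the kind of limitation mentioned above.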

There is no such thing as a universal model format, especially not one that would work in all environments and on all operating systems. It will always depend on the environment used. KNIME is a platform that brings them together and makes them easier to use.

As I said. My best experience with interoperability is with H2O.ai MOJO format - you can use them seamlessly with KNIME, R and Python.

In this collection you will find several examples of interaction between KNIME and other systems, namely Python.

JaeHwanChoi · December 12, 2023, 1:36pm

Thank you for your response. @mlauber71

So, to exchange models with R and Python, I would train the model with the H2O nodes in KNIME, convert it with “H2O Model to MOJO”, export it with “H2O MOJO Writer”, and then read the model in Python using h2o-related code?

I’m asking because I saw a related example WF, but it doesn’t have the process I want.

JaeHwanChoi · December 12, 2023, 1:59pm

Even though I exported the model from KNIME to mojo, I can’t seem to access h2o on my personal PC, a Jupyter notebook, unless I have h2o installed. I have installed the package, but do I need to have the actual paid or demo version of h2o to use the h2o model in Jupyter?

mlauber71 · December 12, 2023, 3:15pm

@JaeHwanChoi you can use the free version. You will have to have the h2o package installed and need to start a h2o process in the background.
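
A minimal sketch of what this could look like on the Python side, assuming the model was written by the H2O MOJO Writer node to a file such as `model_from_knime.zip` and the rows to score are in `new_data.csv` (both file names are placeholders). `score_with_mojo` is a hypothetical helper; `h2o.init()`, `h2o.import_mojo()` and `h2o.import_file()` come from the open-source `h2o` package, which needs a Java runtime but no license.

```python
from pathlib import Path


def score_with_mojo(mojo_path, csv_path):
    """Score the rows of a CSV file with a MOJO exported from KNIME."""
    mojo_path, csv_path = Path(mojo_path), Path(csv_path)
    # Fail early with a clear message before starting any H2O process.
    for path in (mojo_path, csv_path):
        if not path.is_file():
            raise FileNotFoundError(f"File not found: {path}")

    import h2o  # imported here so the file check works without h2o installed

    # Connects to a running local H2O instance or starts one; needs Java.
    h2o.init()
    model = h2o.import_mojo(str(mojo_path.resolve()))
    frame = h2o.import_file(str(csv_path.resolve()))
    predictions = model.predict(frame)
    # Converts to a pandas DataFrame when pandas is installed.
    return predictions.as_data_frame()


# Example call (placeholder file names):
# print(score_with_mojo("model_from_knime.zip", "new_data.csv"))
```

The CSV is assumed to have the same column names and types as the data the model was trained on in KNIME.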

Here is a notebook (I am planning to write a blog about it):

https://github.com/ml-score/knime_meets_python/blob/main/machine_learning/binary/notebooks/kn_example_python_h2o_automl.ipynb

Notebook title: “Use Python H2O.ai AutoML to build a model and deploy it with KNIME MOJO nodes”
GitHub repository: https://github.com/ml-score/knime_meets_python/tree/main/machine_learning/binary
KNIME workflow: https://hub.knime.com/-/spaces/-/latest/~GABT_OgeoWxWJW9P/

You can import a MOJO model from KNIME at any point. Or you can import a model you created in a Jupyter notebook into KNIME. Or you can just use the Python node to run a model created in Python inside KNIME and read the results.

If you want sample code in R there is an example in this workflow group

Daniel_Weikert · December 12, 2023, 5:39pm

@JaeHwanChoi
What exactly is the use case for exporting models from KNIME? Why not make predictions in KNIME directly?
Just curious.

JaeHwanChoi · December 13, 2023, 1:17am

Thanks @mlauber71

If I need to start the h2o process in the background, does that mean I need to get and run a free demo of h2o so I can run h2o-related Python code in Jupyter on my local PC?

In other words, what I want to do is to turn the model generated by h2o automl in KNIME into a mojo, export it to mojo writer, and then import the exported mojo via h2o package in my personal Jupyter for forecasting.

The current problem is that I exported the model from KNIME, but when I install the h2o package in my personal Jupyter and call h2o.init(), I get an error, because I don’t have a free version of the actual h2o available.

Please help.

JaeHwanChoi · December 13, 2023, 1:26am

A typical partner project wants to deploy KNIME-generated models to a specific storage, and then import those models into their own Python for forecasting, which means there is a constant demand for model compatibility between KNIME and Python.

To summarize the discussion so far: the only model format that is compatible between KNIME and Python is H2O MOJO, and models exported with “Model Writer” are in a KNIME-specific format that can only be used in KNIME. In addition, only a limited set of models can be exported as PMML.

So, if you exported to h2o mojo, you need to subscribe to the actual h2o commercial version or demo version to use it in your personal jupyter to run h2o-related code.

Please confirm if the above summary is correct. @mlauber71!!

mlauber71 · December 13, 2023, 5:15am

I would suggest installing the h2o Python package so you have access to it. This will be a complete working package, not a demo.

conda install -c h2oai h2o

What kind of error message do you get? In the notebook there is an explanation of how to use KNIME’s on-board Java engine (which h2o needs) if you do not have a current Java version on your machine (see the end of this article: “KNIME Snippets (2): Unearthing Hidden Node Gems — Managing Missing Values, Row Numbers and some Quick Java and Paths” by Markus Lauber, Low Code for Data Science, Medium).
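
Since `h2o.init()` has to launch a Java process, a quick first check is whether Python can see a Java executable at all. The sketch below uses only the standard library. `find_java` is a hypothetical helper, and the assumption that a `java` binary on `PATH` or under `JAVA_HOME/bin` is what h2o picks up should be verified against the h2o version in use.

```python
import os
import shutil
import subprocess


def find_java():
    """Return the path of a Java executable visible to this process, or None."""
    java = shutil.which("java")
    if java:
        return java
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        # shutil.which also resolves "java.exe" on Windows via PATHEXT.
        return shutil.which("java", path=os.path.join(java_home, "bin"))
    return None


java = find_java()
if java is None:
    print("No Java found on PATH or in JAVA_HOME")
else:
    # "java -version" prints its report to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(java)
    print(result.stderr.strip())
```

If no Java is found from inside the Jupyter kernel, setting `JAVA_HOME` or extending `PATH` in the notebook before calling `h2o.init()` may help. Whether the Java runtime bundled with KNIME can be used this way depends on the installation.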

The sample workflow above shows the interaction between knime and several python based modeling packages including XGBoost and Lightgbm. I assume setting this up and working with it will give you additional insights.

If you want to avoid the hassle of installing and managing Python packages and dependencies you could just use KNIME, although KNIME can also help with that task.

JaeHwanChoi · December 13, 2023, 6:05am

When I install it with Conda and run it, I see a message like the one below. It says I can’t access the server, but don’t I need a license for commercial h2o (like the demo version) to access the h2o server after all?

Installing Java, which is OpenJDK, doesn’t seem to solve it.

mlauber71 · December 13, 2023, 6:18am

@JaeHwanChoi I cannot read the whole error message. You will not need a license to use the free h2o version. You could try and use knime’s internal Java engine as said before.

Also you might plan ahead what you want to do in which system. Now you are trying to run h2o within a python node in knime which is also possible.

Using KNIME, Python and h2o (with Java under the hood) will require finding the right setup.

Just to make sure: KNIME also supports the commercial product H2O Driverless AI. This uses different model formats which are not compatible with the open version, and it will indeed need a license and also a powerful server.

https://docs.knime.com/latest/h2o_driverless_ai_guide/#introduction

JaeHwanChoi · December 13, 2023, 10:48am

Can you tell me what version of Java you’re using?

Daniel_Weikert · December 13, 2023, 5:13pm

“Deploy” as using the KNIME specific deployment nodes? Or “just” deploy as a regular workflow which uses those models on the hub?
A regular workflow could also use standard python nodes and then there would not be any compatibility issue I assume?

edit: Have tried h2o in python using colab before. It was free as @mlauber71 said.
br

mlauber71 · December 13, 2023, 8:07pm

With the bundled Python version one can use the sklearn package with the Knime Python nodes. My impression is the first task would be to make a plan what software and system should be used on which environment and also what skill levels will be needed by the people using it.

system · March 12, 2024, 8:07pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.