BUG H2O MOJO Predictor

iiiaaa · July 20, 2018, 1:18pm

Dear all,
there is a bug in Knime H2O MOJO Predictor (in both v 3.5.3 and v 3.6 ). As you can see in the workflow attached “H2O Predictor (Regression)” is able to do prediction on Column names or Column content that contains accents.
But “H2O MOJO Predictor (Regression)” is not able to do prediction on Column names or Column content that contains accents and provide me the following error:

ERROR H2O MOJO Predictor (Regression) 0:52 Execute failed: PredictException in row ‘Row0’: Unknown categorical level (species,vèrsicolor)

Please, could you help me?
Thanks
Regards

H2O Pojo to Mojo_v2.knwf (98.4 KB)

Working · July 20, 2018, 2:22pm

Dear all,

I would be really interested in following the resolution of this bug since I cannot proceed in my work because of that.
Thanks for your prompt feedback!
Kind regards

mlauber71 · July 21, 2018, 9:49am

as far as I can see the error message means that the model found a categorical level it has not seen before. This might be due to the fact that in your sample workflow with the partitioning you just took values from the top so in the 30% test cases this level might no be included. As far as I remember the working of the H2O models you could either let it fail or it would apply a missing value (which I think makes sense).

In the workflow you try to predict the petal_width by providing the species.

H2O Pojo to Mojo_v3.knwf (122.0 KB)

iiiaaa · July 23, 2018, 8:19am

Dear mlauber,
many thanks for your reply. But I think that the error message is misleading and it is not a problem if the model find a categorical level it has not seen before.
It is a problem of accents.

In fact if you disconnect the Rule Engine node that insert accents (connecting directly the “Table Reader” to the “Table to H2O”) everything works well.
Please find attached the workflow.

So I believe it is a BUG.

Knime, could you confirm?

Thanks
Regards

H2O Pojo to Mojo_v4.knwf (31.3 KB)

christian.birkhold · July 23, 2018, 8:42am

Hi @iiiaaa,

I guess actually it is a problem if a categorical level is detected which was not present in the training data as H2O automatically encodes categorical data into numerical data and saves the encoding model alongside with the prediction model. If a categorical value was not present during training, the encoding can’t be done. Can you confirm that all categorical levels are present during training which are presented during testing?

Cheers,

Christian

iiiaaa · July 23, 2018, 10:05am

Dear Christian,
many thanks for your reply. If you open the workflow “H2O Pojo to Mojo_v4.knwf” you will see that H2O is able to do prediction on a category not seen before (at least it does not produce any error).
The error is produced if you connect Rule Engine node that inserts accents.
So it seems it is a matter of accents and not a matter of new category.
Could you confirm it is a bug?
Thanks
Regards

mlauber71 · July 23, 2018, 6:22pm

I cannot reproduce the error. With my installation it works with or without accents. Could you check which OS and version you have (mine is below). As far as I know there is no special unicode setting in KNIME it should just work like it is.

You could check your local H2O context. After upgrading to 3.6 I had some problems and hat to restart KNIME once with an option “-clean” at the very top of the knime.ini

H2O Pojo to Mojo_v4.1.knwf (132.6 KB)

org.osgi.framework.os.name=MacOSX
org.osgi.framework.os.version=10.13.6
org.osgi.framework.processor=x86-64

KNIME H2O Machine Learning Integration 3.6.0.v201807060629 org.knime.features.ext.h2o.feature.group KNIME AG, Zurich, Switzerland

KNIME H2O Machine Learning Integration - MOJO Extension 3.6.0.v201805131030 org.knime.features.ext.h2o.mojo.feature.group KNIME AG, Zurich, Switzerland

iiiaaa · July 24, 2018, 7:44am

Dear mlauber71,
thanks for your answer, much appreciated.

I tried to change in Preferences the H2O Local context version from 3.10.5.2 to 3.20.0.2. But I still have the same error:

ERROR H2O MOJO Predictor (Regression) 0:52 Execute failed: PredictException in row ‘Row0’: Unknown categorical level (species,vèrsicolor)

As you can see in the print screen below without Rule Engine that insert accents it works well

But with the Rule Engine that insert accents it does not work

I have the same error in both Knime v 3.5.3 and Knime v 3.6. I have the same error with a local PC with Windows 7 64 bit OS and with a powerful Windows Server 2016 64 bit.

Any other ideas?
Thanks
Regards

marten_kose · July 24, 2018, 4:47pm

Indeed, this looks like a bug to me as well. There seems to be an issue when domain values contain special characters. If you check the output port of the H2O Model to Mojo node, you see that the column domain values look somewhat deformed. We will continue investigating on this and let you know as soon as we know more.

Thanks for hunting this down!

iiiaaa · July 25, 2018, 7:52am

Thank you very much for keeping me updated! I hope this bug will be solved soon!
Regards

marten_kose · October 1, 2018, 10:04am

FYI: We opened a ticket for that https://0xdata.atlassian.net/browse/PUBDEV-5946

Until this will be fixed, we have to rename all columns and values with special characters.

iiiaaa · October 1, 2018, 12:19pm

Many thanks!