H2O Pojo to Mojo - Unknown categorical level Error

iiiaaa · July 26, 2018, 2:42pm

Dear all,
As the attached workflow shows, with an H2O POJO model it is possible to train on some categories, store the model, and then use the stored model to predict on new categories that were not seen during training.

Unfortunately, this is not possible with the MOJO model (ERROR Unknown categorical level). This makes the MOJO model useless on new data, and the traditional POJO model is too slow to be used in production.

Could you do something to enable the MOJO model to be used on new categories, like the POJO model?

Thanks in advance
Regards

H2O Pojo to Mojo_v5.knwf (243.5 KB)

marten_kose · July 26, 2018, 3:55pm

This can be done by unchecking "Enforce presence of all feature columns" and "Fail, if a prediction exception occurs" in the node dialog of the H2O Predictor (Regression) node. However, you’ll have missing values in your predictions for rows with unknown categorical data.

iiiaaa · July 27, 2018, 6:56am

Dear Marten,
thank you very much for your answer. That’s the problem. With POJO you don’t get missing values in the predictions, because it converts unknown categories into missing values in the INPUT, forcing the model to use the other variables to provide a prediction.

It would be very good to allow the MOJO model to do the same, i.e. to convert out-of-domain categories into missing values, forcing the model to use the other variables to provide a prediction.
Do you think it would be possible to integrate this feature?

Thanks in advance
Regards

venturaz87 · July 27, 2018, 7:00am

Hello, I completely agree. I have the same problem and it is very annoying. It would be great if MOJO could mimic the behavior of POJO in this case. I am dealing with millions of records with plenty of features, and MOJO is my only chance to make my algorithm work in a decent amount of time, but this issue is really blocking me.

Thanks!

Working · July 27, 2018, 10:04am

Hello,

I think it would be very interesting to allow MOJO to have the same behaviour as POJO.
By the way, this is possible in H2O via Python.
Do you think it would be possible to integrate it in Knime?
Thanks a lot in advance

marten_kose · July 30, 2018, 4:50pm

Thanks everyone for the input. We will come up with a solution that allows some more advanced settings in the MOJO predictor.

However, for the time being, you might want to use a workaround. Collect all the different nominal values present in your training data, and wherever your prediction data contains a categorical level outside that set, replace it with a missing value. This might not be the most elegant way, but it should do the trick for now.
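
For anyone who wants to prototype this workaround in a scripting step, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: rows are plain dictionaries, a missing value is represented by `None`, and the function names, column names, and sample rows are made up for this example and are not part of KNIME or H2O.

```python
def collect_known_levels(training_rows, categorical_columns):
    """Return {column: set of levels seen in the training rows}."""
    known = {col: set() for col in categorical_columns}
    for row in training_rows:
        for col in categorical_columns:
            value = row.get(col)
            if value is not None:  # missing values are not a level
                known[col].add(value)
    return known


def mask_unknown_levels(rows, known_levels):
    """Copy rows, replacing levels not seen in training with None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the caller's data untouched
        for col, levels in known_levels.items():
            if new_row.get(col) not in levels:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data: "green" never appears in training.
training = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.0},
]
scoring = [
    {"color": "red", "size": 1.5},
    {"color": "green", "size": 3.0},
]

known = collect_known_levels(training, ["color"])
cleaned = mask_unknown_levels(scoring, known)
```

After this step the second scoring row has a missing `color` but keeps its `size`, so the model can still predict from the remaining columns, provided the predictor accepts missing inputs.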

I’ll keep you posted as soon as something is available here.

iiiaaa · July 31, 2018, 7:42am

Many thanks for your reply. OK, very good. Looking forward to seeing the new implementation!
Regards

SimonS · July 11, 2019, 12:20pm

Good news! We added the option to treat unknown categorical values as missing values to the MOJO predictor nodes. It will be available with the next major release (4.1.0) but you can already use and test it with the latest nightly build (https://www.knime.com/form/nightly-build).

Cheers,
Simon