PMML exported using One to Many (PMML) is invalid

Harigae · May 12, 2017, 6:17am

I tried to make predictions using PMML that was exported by the One to Many (PMML) node, but it doesn't work with external predictors (e.g. JPMML, JPMML-Spark).

I think this is because the fields that One to Many is applied to are not defined as MiningField elements, so the predictor cannot get the values to transform.
This behavior occurs in all transformation nodes (e.g. Numeric Binner).

Quoting the PMML 4.2 specification:

> All data entering a model must pass through the MiningSchema.
http://dmg.org/pmml/v4-2-1/MiningSchema.html

Also, the JPMML implementation assumes that any field to which a transformation is applied is defined as a MiningField.

Thank you.

AlexanderFillbrunn · May 30, 2017, 10:08am

Hi Harigae,

I think you are right. The column names in the MiningSchema correspond to the old columns and not the ones with the appended "*". I will have a closer look at it. In the meantime you could just append a star to all columns that are transformed and end up in the mining schema with their original name. When KNIME appends a star to a column name, this means it is replaced. This workaround was necessary because the PMML standard does not support replacing columns.

Kind regards,

Alexander

AlexanderFillbrunn · May 30, 2017, 12:04pm

Hi,

I looked at it again and I am not sure anymore if we really do it wrong. According to the PMML Field Scope document (http://dmg.org/pmml/v4-3/FieldScope.html) the mining schema only contains the fields from the data dictionary and the transformations are applied afterwards. This is how we are doing it right now. What do you think?

Kind regards,

Alexander

AlexanderFillbrunn · May 30, 2017, 12:09pm

Hi,

Can you send me a workflow that demonstrates the problem? It only has to generate the faulty PMML; I can feed it into JPMML myself. With a simple workflow I just created, it worked just fine.

Kind regards,

Alexander

Harigae · June 1, 2017, 12:38pm

Hi, Alexander

Thank you for your reply.

As you can see from the attached PMML, the protocol_typ field is not included as a MiningField, even though it is the field to which "One-To-Many" is applied.
Looking at the JPMML implementation, the field to which a transformation is applied needs to be included as a MiningField. Therefore the tcp, udp, and icmp fields referenced by ClusteringField are treated as missing.
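
To make the symptom easier to check, here is a minimal Python sketch that lists DataDictionary fields which are referenced by a DerivedField but have no matching MiningField. The PMML fragment is a hypothetical, heavily trimmed stand-in for the attached file (it is not schema-complete, and placing the derived field in TransformationDictionary is an assumption). The function name missing_mining_fields is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML fragment for illustration only.
# It is NOT the attached export and omits attributes a validator would require.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <DataDictionary>
    <DataField name="protocol_typ" optype="categorical" dataType="string"/>
    <DataField name="duration" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="tcp" optype="continuous" dataType="double">
      <NormDiscrete field="protocol_typ" value="tcp"/>
    </DerivedField>
  </TransformationDictionary>
  <ClusteringModel>
    <MiningSchema>
      <MiningField name="duration"/>
    </MiningSchema>
  </ClusteringModel>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_mining_fields(pmml_text):
    """Return DataFields used by a DerivedField but absent from any MiningSchema."""
    root = ET.fromstring(pmml_text)
    data_fields, mining_fields, referenced = set(), set(), set()
    for el in root.iter():
        name = local_name(el.tag)
        if name == "DataField":
            data_fields.add(el.get("name"))
        elif name == "MiningField":
            mining_fields.add(el.get("name"))
        elif name == "DerivedField":
            # Collect every 'field' attribute inside the derived field
            # (e.g. on NormDiscrete or FieldRef elements).
            for child in el.iter():
                ref = child.get("field")
                if ref is not None:
                    referenced.add(ref)
    # Only report references that point at raw input (DataDictionary) fields.
    return sorted((referenced & data_fields) - mining_fields)


print(missing_mining_fields(PMML))  # prints ['protocol_typ']
```

An empty list would mean every input used by a transformation is also declared as a MiningField. The check is deliberately coarse: it pools all MiningSchema elements in the file rather than resolving scope per model.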

Also, I have attached workflow.

Thank you.

k-means-with-transfomations.zip