PMML Ensemble Predictor isn't calculating the right values

Morris_Kurz · March 28, 2019, 8:58am

Hello everyone,

While using the PMML Ensemble Predictor in combination with the Tree Ensemble Learner, I noticed that the predictions of the PMML model differ from those of the Tree Ensemble Learner. However, this only happens when the dataset contains String-valued columns. I’ve debugged the node a bit, and it seems that the PMMLSetPredicate “IsNotIn” always evaluates to true, even when the value is in the set. As an example: the first split in the decision tree may be Sex=“MALE” or Sex=“FEMALE”. The saved condition for the first split is then “value IsNotIn [“FEMALE”]”. If the input is Sex=“FEMALE”, the condition should evaluate to false, but it does not. I’ve attached a sample workflow to showcase the problem. By the way, the problem does not occur with the JPMML node.
PMML-Ensemble wrong prediction.knwf (3.4 MB)
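
To make the expected behaviour concrete, here is a minimal Python sketch of how an “IsNotIn” set predicate should evaluate for the split described above. It is only an illustration: the function name `is_not_in` and the variable `split_set` are hypothetical and are not taken from the KNIME source code.

```python
def is_not_in(value, values):
    """Return True only when `value` is absent from the predicate's value set."""
    return value not in set(values)


# Hypothetical stand-in for the saved condition: value IsNotIn ["FEMALE"]
split_set = ["FEMALE"]

print(is_not_in("FEMALE", split_set))  # False: the row must not follow this branch
print(is_not_in("MALE", split_set))    # True: the row follows this branch
```

The reported behaviour corresponds to the first call returning true as well, which sends every row down the same branch.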

nemad · March 28, 2019, 11:01am

Hello @Morris_Kurz,

Thank you for making us aware of this issue. I’ll have a look at your workflow and initiate the necessary follow-up steps.

Kind regards,

Adrian

Morris_Kurz · April 10, 2019, 7:07am

It seems that the problem is caused by whitespace in the string values. After I trimmed the leading whitespace from each string column, I got the same predictions as with the Tree Ensemble Predictor. The Decision Tree node has the same problem (I tested it with the same data), and so probably does every other node that uses the “PMMLDecisionTreeTranslator”. I hope this helps you investigate the issue further.
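
The effect can be sketched in a few lines of Python. This assumes, without having confirmed it in the KNIME code, that the value stored in the PMML set has lost its leading whitespace while the input table still contains it. The names `is_not_in`, `stored_set` and `raw_value` are hypothetical.

```python
def is_not_in(value, values):
    """Return True only when `value` is absent from the predicate's value set."""
    return value not in set(values)


# Assumption: the set written to the PMML holds the trimmed category,
# while the input table still carries a leading space.
stored_set = ["FEMALE"]
raw_value = " FEMALE"

# The untrimmed value is not found in the set, so IsNotIn is wrongly true.
print(is_not_in(raw_value, stored_set))  # True

# Trimming the value first (the workaround described above) restores the match.
print(is_not_in(raw_value.strip(), stored_set))  # False
```

In a workflow, the equivalent workaround is to trim the string columns before both learning and prediction.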

nemad · April 10, 2019, 8:33am

Hello @Morris_Kurz,

Thank you, this helps a lot!

Best,

Adrian