I am working on a project where we need to generate a PMML file from Java Spark and then consume it in KNIME. The latest versions of Spark and JPMML, which I tried first, were:
1.5.13 (a jar that lets us store the Spark pipeline in .pmml format)
Things worked fine on the Java side: I was able to generate a PMML file. The problem arose when I tried to import that file into KNIME.
The file generated by java was in PMML version 4.4 and KNIME supports 4.2.
So I tried a hack: manually changing the version from 4.4 → 4.2 in the header. That worked for some basic models like linear / logistic regression, but it failed on the KNIME side for tree models and other complex models such as SVM, giving an evaluation error. (Not putting the errors here; I will do so if required.)
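For reference, the manual relabeling described above can be sketched in Java roughly as follows. This is only an illustration: the class and method names (`PmmlVersionRewriter`, `relabel`) are made up for this post, and it assumes the file declares the usual DMG namespace of the form `http://www.dmg.org/PMML-4_4` on the root element. It only rewrites the namespace and the `version` attribute; it does not translate any model content, so a consumer may still reject elements it does not understand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: relabels a PMML document without converting its content.
class PmmlVersionRewriter {
    // Matches the DMG namespace URI, e.g. http://www.dmg.org/PMML-4_4
    private static final Pattern NAMESPACE =
            Pattern.compile("http://www\\.dmg\\.org/PMML-\\d+_\\d+");

    // Matches everything up to the value of the version attribute on the root <PMML> tag,
    // so the XML declaration's own version="1.0" is left untouched.
    private static final Pattern ROOT_VERSION =
            Pattern.compile("(?<head><PMML\\b[^>]*?\\bversion=\")[^\"]*");

    static String relabel(String pmml, String targetVersion) {
        String ns = "http://www.dmg.org/PMML-" + targetVersion.replace('.', '_');
        String out = NAMESPACE.matcher(pmml).replaceAll(Matcher.quoteReplacement(ns));
        return ROOT_VERSION.matcher(out)
                .replaceFirst("${head}" + Matcher.quoteReplacement(targetVersion));
    }

    public static void main(String[] args) {
        String in = "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\"><Header/></PMML>";
        System.out.println(relabel(in, "4.2"));
    }
}
```

Since this only changes the label, it is a workaround rather than a conversion.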
I looked for Spark and JPMML versions that could generate PMML 4.2 files and found some, but they gave ClassNotFoundError on the Java side because of some legacy issue.
Now, the reason I am writing this topic is: is there any way to convert my 4.4 file into 4.2 so that KNIME accepts it? Or is there a possibility of KNIME supporting PMML 4.4 files?
Looking forward to a positive response.
We have an internal ticket with number AP-8449 to upgrade the PMML version we support to 4.3, but now that 4.4 exists, it probably makes more sense to go directly for that. There is no timeline for implementation, though. I have pinged the developers about it. In the meantime, you probably need to look at the changelog for PMML 4.4 and 4.3:
Check whether anything mentioned there appears in your PMML documents and, if so, adapt the documents accordingly. I think this is unfortunately the only way to solve this right now.
Thank you very much @AlexanderFillbrunn !!
Now that it is clear there is a way, even if it is still manual, things are clearer.
And just to ask another question, if you don’t mind: I want to add specific, additional information to the PMML file, such as avg(column1), avg(column2) etc. from the training dataset, so that this information can be used on the other side where the PMML is consumed.
So my question is: can we add extra information to a PMML file in general (not specific to KNIME), which might not be related to the model itself, so that it can be used on the other side?
If that is possible somehow, it would be very helpful.
Maybe you can insert an annotation element in the header using a String Manipulation node? There is no node to add that info directly, but you can treat the PMML as a string and then turn it back into PMML with the String To XML and XML To PMML nodes.
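If it is easier to add the annotation on the producing side instead, a minimal Java sketch of the same idea could look like this. It uses only the JDK's built-in XML APIs; the class and method names (`PmmlAnnotator`, `addAnnotation`) and the annotation text are made up for illustration. It assumes the document has a `Header` element and places the new `Annotation` before any `Timestamp` child, since the PMML schema lists `Annotation` ahead of `Timestamp` inside `Header`. Whether a given consumer keeps or exposes the annotation is something you would need to check.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: adds a free-text <Annotation> to the PMML <Header>.
class PmmlAnnotator {
    static String addAnnotation(String pmml, String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pmml)));

            NodeList headers = doc.getElementsByTagNameNS("*", "Header");
            if (headers.getLength() == 0) {
                throw new IllegalArgumentException("No Header element found");
            }
            Element header = (Element) headers.item(0);

            // Create the annotation in the same namespace as the Header itself.
            Element annotation = doc.createElementNS(header.getNamespaceURI(), "Annotation");
            annotation.setTextContent(text);

            // Find a direct Timestamp child, if any, so the annotation goes before it.
            Node before = null;
            for (Node c = header.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE && "Timestamp".equals(c.getLocalName())) {
                    before = c;
                    break;
                }
            }
            header.insertBefore(annotation, before); // a null reference node means "append"

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not annotate PMML document", e);
        }
    }
}
```

For example, `PmmlAnnotator.addAnnotation(pmmlString, "avg(column1)=3.5; avg(column2)=7.25")` would store the training averages as plain text; the consuming side then has to parse that text itself.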