I used Decision Tree Learner to create a model and followed it with PMML Writer.
The Column Filter shows the state of the dataset just prior to the Decision Tree Learner.
Then I used PMML Reader to read the PMML file. The PMML Reader node executes without error.
Then I attach either the PMML Classifier or the JPMML Classifier, and I get error messages saying that a column which was not in the data flow at the time of learning, and which was also not selected by the Learner, is required to apply the model to new data!
Decision Tree Prediction failed. Could not find attribute 'Risk_Score'
ERROR JPMML Classifier 0:3 Execute failed: The column Risk_Score does not exist in the table
In the Decision Tree Learner, the PMML shows that only 4 input variables and one output variable were needed.
Yet the PMML Reader version includes many more fields.
So it looks to me like the PMML was written out to include both columns that were filtered and columns that were not used by the model, and that all of this junk is now required to deploy the model. Is this an accurate interpretation?
If so, what comes to mind is that I would need to build a model, write the winning data out to disk, read it back in, retrain the model, and then write out PMML that does not know about the other fields in the file.
Surely there is some easier way? Can you help me see what I am missing?
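One way to check the interpretation above would be to compare the fields declared in the PMML file's DataDictionary with the fields listed in the model's MiningSchema. Below is a minimal Python sketch of that comparison. The PMML fragment is hand-written for illustration and is not the file from my workflow: the field names Age, Income and Outcome are hypothetical (only Risk_Score comes from the error above), the PMML version and namespace are assumptions, and the tree itself is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down PMML fragment for illustration only.
# Age, Income and Outcome are made-up field names; the tree nodes are omitted.
SAMPLE_PMML = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="4">
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="Income" optype="continuous" dataType="double"/>
    <DataField name="Risk_Score" optype="continuous" dataType="double"/>
    <DataField name="Outcome" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="DecisionTree" functionName="classification">
    <MiningSchema>
      <MiningField name="Age"/>
      <MiningField name="Income"/>
      <MiningField name="Outcome" usageType="target"/>
    </MiningSchema>
  </TreeModel>
</PMML>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def field_names(root, element_name):
    """Collect the 'name' attribute of every element with the given local name."""
    return [el.get("name") for el in root.iter() if local_name(el.tag) == element_name]


def unused_fields(pmml_text):
    """Return DataDictionary fields that the MiningSchema does not mention."""
    root = ET.fromstring(pmml_text)
    declared = field_names(root, "DataField")
    mined = set(field_names(root, "MiningField"))
    return [name for name in declared if name not in mined]


if __name__ == "__main__":
    print(unused_fields(SAMPLE_PMML))
```

If a column such as Risk_Score shows up in this list for the real file, it is declared in the DataDictionary but not used by the model itself, which would match what I am seeing.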