Hi,
I used a dataset (setA) as training data, handled the missing values, and then applied a Column Filter (PMML) node on top of it. I then fed the PMML output of the column filter into a PMML Transformation Apply node for the test dataset (setB), so that the operations I applied to the training data would also be applied to the test data. However, I can't get the filtered columns that I selected in the Column Filter (PMML) for training. The missing values, on the other hand, are handled perfectly.
Can anybody please help me solve this problem? Attached: column_filter_(PMML).knwf (18.4 KB)
Attaching the files for reference: column_filter(PMML)_excel.knwf (18.0 KB), setB.xlsx (8.7 KB), setA.xlsx (9.8 KB)
The issue seems to be that the Column Filter (PMML) node writes the retained columns to the DataDictionary of the PMML model, and the DataDictionary is not applied by the PMML Transformation Apply node.
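Since the DataDictionary is not applied on the test side, one workaround is to reproduce the column filter yourself from the field names listed there. Below is a minimal Python sketch of that idea, done outside KNIME. It assumes that the retained columns appear as DataField entries in the PMML DataDictionary. The PMML fragment, the field names (age, income, id), and the helper functions are all hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML fragment: assume the Column Filter (PMML) node kept only
# "age" and "income" in the DataDictionary (field names are illustrative).
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def data_dictionary_fields(pmml_text):
    """Return the DataField names from the PMML DataDictionary, in order."""
    root = ET.fromstring(pmml_text)
    for child in root:
        if local_name(child.tag) == "DataDictionary":
            return [f.get("name") for f in child if local_name(f.tag) == "DataField"]
    return []  # no DataDictionary found


def apply_column_filter(rows, keep):
    """Keep only the columns named in `keep`; names absent from a row are skipped."""
    return [{name: row[name] for name in keep if name in row} for row in rows]


# Hypothetical test rows (setB-like), including a column the filter should drop.
test_rows = [
    {"age": 31.0, "income": 42000.0, "id": 1},
    {"age": 45.0, "income": 58000.0, "id": 2},
]
keep = data_dictionary_fields(PMML_TEXT)
filtered = apply_column_filter(test_rows, keep)
print(keep)      # ['age', 'income']
print(filtered)  # [{'age': 31.0, 'income': 42000.0}, {'age': 45.0, 'income': 58000.0}]
```

Matching on the local tag name avoids hard-coding a particular PMML namespace version.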
Are you trying to capture the preprocessing steps in the PMML model so that you can apply them in a prediction workflow? If so, I highly recommend trying out the Integrated Deployment nodes. An example can be found here: Integrated Deployment Example – KNIME Hub
With this it is much easier to automatically apply the preprocessing from the training workflow to the prediction workflow.
Unfortunately, PMML itself does not offer a column filter transformation. Therefore, it is not possible to apply it with the “PMML Transformation Apply” node.
@JoergWas, @nik_09 there is this discussion about PMML. Unfortunately, collecting PMML transformations and models does not work as intuitively as one might hope. But you can still use the transformations one at a time. I noticed that at certain points you might have to filter the transformed columns yourself.