Problem Saving and Loading PMML of Decision Tree

Using the decision tree example, I was able to adapt it to use my own data. I am trying to predict a true / false value based on 2 different fields.

I got it working with 5000 learning rows and 1000 test rows (72% accuracy).

I then wanted to feed in a manual value for the two fields to get the prediction. I saved the PMML output from the Decision Tree Learner node, created a new workflow, and loaded it using a PMML Reader, but was unable to get it to work. Every time I feed in my own data using a Table Creator node, I get this error: “Learning column “Office” not found in input data to be predicted”. The table does have a field for Office, and it’s in the same column location as in the test and learning data.

How can I use the PMML model I have generated to make a prediction based on a manual input?

Hi,

Try the Table Validator (Reference) node to detect possible problems with the table you created with the Table Creator node. In your case, the table to validate (upper input port) is your manually created table, and the table containing the training/test set serves as the reference table (lower input port). Experiment with the settings of the Table Validator (Reference) node; you can also connect the node’s active upper output port to the predictor.

Martin K.

Hi @leinad13,

Using your own generated test data with the Table Creator node should work. Check not only the column names but also their types. You can use the Table Validator (Reference) node as @Martin_K suggested, comparing the original training data with the output of the Table Creator node.
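The name-and-type comparison described above can also be sketched outside KNIME. The snippet below is only an illustration in plain Python, not how the Table Validator node works internally: the `compare_schemas` helper, the `Hour` column, and the type labels are all hypothetical. It flags columns whose names differ only by whitespace or letter case, which is a common reason a column that looks right is still reported as missing.

```python
def compare_schemas(reference, candidate):
    """Compare two {column name: type} mappings.

    Returns a list of human-readable problems; an empty list means every
    reference column is present in the candidate with the same type.
    """
    problems = []
    # Index candidate columns by a normalized name to catch near-misses
    # such as a trailing space or different capitalization.
    normalized = {name.strip().lower(): name for name in candidate}
    for name, col_type in reference.items():
        if name in candidate:
            if candidate[name] != col_type:
                problems.append(
                    f"type mismatch for {name!r}: "
                    f"expected {col_type}, got {candidate[name]}"
                )
        elif name.strip().lower() in normalized:
            found = normalized[name.strip().lower()]
            problems.append(f"name mismatch: expected {name!r}, found {found!r}")
        else:
            problems.append(f"missing column {name!r}")
    return problems


# Hypothetical schemas: the learning columns and a manually typed table.
reference = {"Office": "String", "Hour": "Integer"}
candidate = {"Office ": "String", "Hour": "Double"}  # trailing space, wrong type

for problem in compare_schemas(reference, candidate):
    print(problem)
```

Printing the names with `repr` (the `!r` in the format strings) makes invisible differences such as a trailing space show up in the output.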

Best,
Anna

Thank you for your help @amartin, I was able to enter data manually and get a prediction using the Table Validator node. I had to enable the ‘Sort them to the end’ option for ‘Handling of unknown columns’. This moved the ‘Office’ column to the end of the table and kept the value in it. The value in the original ‘Office’ column was replaced with a ?.

I’m still not sure what was wrong with my input data. As far as I can tell, it’s the right data type and everything should have matched, but I’m glad I found a solution.
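One way to investigate further would be to read the field names and types the model expects directly from the saved PMML file, since PMML lists them as `DataField` elements inside its `DataDictionary`. The following is a minimal sketch using Python’s standard library; the `expected_fields` helper and the trimmed sample document (including the `Result` field) are made up for illustration and are not the actual model from this thread.

```python
import xml.etree.ElementTree as ET


def expected_fields(root):
    """Return (name, optype, dataType) for every DataField under root."""
    fields = []
    for elem in root.iter():
        # Drop the XML namespace prefix, which differs between PMML versions.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "DataField":
            fields.append(
                (elem.get("name"), elem.get("optype"), elem.get("dataType"))
            )
    return fields


# Hypothetical, heavily trimmed PMML document used only for this example.
SAMPLE = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <DataDictionary numberOfFields="2">
    <DataField name="Office" optype="categorical" dataType="string"/>
    <DataField name="Result" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

for name, optype, data_type in expected_fields(ET.fromstring(SAMPLE)):
    # repr() exposes stray whitespace in a field name.
    print(repr(name), optype, data_type)
```

For a saved model, `ET.parse(path).getroot()` would supply the root element in place of `ET.fromstring(SAMPLE)`. Comparing the printed names character by character against the Table Creator column headers may reveal a difference that is not visible in the table view.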