I have a categorical feature that I want to one-hot encode, so I tried the One to Many node. It works fine for training and evaluation, but I couldn’t figure out how to “apply” it to new data.
Let’s say my feature has (on training set) 4 distinct values: A, B, C and D. I use One to Many node to generate 4 columns, then I train a model on it.
My pipeline for new data should perform the same transformation, so I have the same 4 columns to feed my trained model.
But suppose the new data is a single row with value C in my feature. If I use One to Many on it, I only get column C, and columns A, B and D are missing. How can I make sure the missing columns are created so that I can feed the model?
I am relatively new to KNIME, so maybe I am missing something…
When you do the one-hot encoding, you can write your “formula” into a PMML. You can then reuse that PMML with the PMML Transformation Apply node. I created an example workflow: 1_to_may_pmml.knwf (24.1 KB). As you can see, the value “green” is missing in the second table, but the node still creates a “green” column, filled only with “0” values.
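If it helps to see the idea outside KNIME, here is a minimal sketch in plain Python of the same principle: learn the category list once from the training data, then encode any new data against that stored list. This is only an illustration, not how the KNIME nodes are implemented, and the function names `fit_categories` and `one_hot` are made up for this example.

```python
def fit_categories(values):
    """Record the distinct categories seen in the training data, sorted."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode values against a fixed category list.

    Every category gets a column, even if it does not occur in `values`.
    A value that was never seen in training yields a row of all zeros.
    """
    return [{c: int(v == c) for c in categories} for v in values]


# "Training" step: learn the columns once and keep them (hypothetical data).
train = ["A", "B", "C", "D", "A"]
categories = fit_categories(train)

# "Apply" step: a single new row with value C still gets all four columns.
new_rows = one_hot(["C"], categories)
print(new_rows)  # [{'A': 0, 'B': 0, 'C': 1, 'D': 0}]
```

The stored `categories` list plays the role of the PMML here: it carries the column definition from training over to the new data.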