How to handle missing column when using PMML

kilian.thiel · October 26, 2015, 6:11pm

Hi Sudha,

the Keyword Extractor node might not be the best node for extracting keywords as features for classification. That example is not the best one on the text processing website, sorry for that. Please see the sentiment analysis blog article https://www.knime.org/blog/sentiment-analysis, which describes a better way to extract terms from documents for classification.

One thing to keep in mind when you process text and train a model on that data: you have to create the same feature space for any other set of documents that you want to use with the trained model. This takes a bit of work. You need to apply the same preprocessing steps, filter out all features (columns) that are not in the training data set, and append the features that are in the training set but not in the second set.
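Outside of KNIME, the same idea can be sketched in a few lines of Python with pandas. This is only an illustration of the column alignment, not what the attached workflow does internally. The function name `align_to_training_columns`, the toy term columns, and the choice to fill appended columns with 0 (meaning the term does not occur in the document) are all assumptions made for this example.

```python
import pandas as pd


def align_to_training_columns(new_features, training_columns):
    """Give new_features exactly the columns the model was trained on.

    Columns unknown to the training set are dropped; training columns
    missing from new_features are appended and filled with 0
    (assumption: 0 means "term does not occur in the document").
    """
    return new_features.reindex(columns=training_columns, fill_value=0)


# Toy document vectors: one row per document, one column per term.
train = pd.DataFrame({"good": [1, 0], "bad": [0, 1], "movie": [1, 1]})
new = pd.DataFrame({"good": [1], "movie": [1], "plot": [1]})

aligned = align_to_training_columns(new, list(train.columns))
print(aligned)
```

Here "plot" is dropped because the model never saw it, and "bad" is appended as a zero column, so the second set has the same columns in the same order as the training set.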

Attached is an example workflow in which a first set of documents is used to train a model and a second set of documents is then applied to that model. I hope this helps.

Cheers, Kilian

adjustingfeaturespace.zip