Hi! Does anyone know where this weird term comes from? It appears in my BoW term list. I tried to locate the original word in the text source but couldn't find anything that leads to it. (The preprocessing nodes I used upstream were number filter, punctuation erasure, diacritic remover and N chars filter, in that order.)

You could check which default encoding is set for your KNIME installation.

Typically it should be UTF-8. You might also check whether these terms appear anywhere in your source data and what encoding they have.
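If you want a quick check outside KNIME, a minimal Python sketch like the one below lists every letter in a text that is not Latin script, together with its Unicode name. It is not a KNIME node. The function name `find_non_latin` and the sample string are made up for illustration, and it assumes the text has already been decoded correctly (e.g. read with `encoding="utf-8"`).

```python
import unicodedata

def find_non_latin(text):
    """Return (index, char, Unicode name) for each letter that is not Latin script."""
    hits = []
    for i, ch in enumerate(text):
        # Only letters are checked; digits, spaces and punctuation are skipped.
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            # Accented Latin letters (e.g. "é") still have names starting with "LATIN".
            if not name.startswith("LATIN"):
                hits.append((i, ch, name))
    return hits

# Hypothetical sample text containing two Chinese characters.
sample = "Procter & Gamble 宝洁 annual report"
for pos, ch, name in find_non_latin(sample):
    print(pos, ch, name)
```

Characters such as Chinese ideographs show up with names starting with "CJK", which makes it easy to spot where an unexpected BoW term comes from.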


Thanks so much! After following your suggestion and looking at the new BoW terms again, that entry turns out to be Chinese characters in the original text. What a surprise; I didn't expect P&G's report to contain non-Latin characters!

Thank you again!


