Punctuation Erasure, Number Filter, and Case Converter not really working


I have a workflow like this:
Strings to Document --> Punctuation Erasure --> Case Converter --> Number Filter --> Snowball Stemmer --> Stop words Filter --> BoW creator

When I run it, the BoW still includes many punctuation signs and numbers. Also, many terms still have upper case letters. I removed Punctuation Erasure, Case Converter, and Number Filter from the workflow and it gave me the same BoW. So it sounds like those nodes are not working, even though they are not returning any error. Am I missing something?


Hi @behrooz12 -

In each of the nodes you are using, check carefully to see which Document column you are applying them to. Sometimes users will apply certain nodes to the Document column, and others to the Preprocessed Document column, which will create problems. If you have consistently applied your transformations to the correct column, it should work.
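
To illustrate why the column choice matters, here is a rough analogy in plain Python. It is not the KNIME API: the function names only borrow the node names, the helper `apply_node` and the sample sentence are made up for illustration, and I'm assuming each node writes its result to a separate Preprocessed Document column. If a later step reads the original Document column instead, all the earlier cleaning is silently ignored, which would match the symptom of getting the same BoW with or without those nodes.

```python
import re

def punctuation_erasure(text):
    # Drop every character that is not a word character or whitespace.
    return re.sub(r"[^\w\s]", "", text)

def case_converter(text):
    return text.lower()

def number_filter(text):
    # Remove tokens that consist only of digits.
    return " ".join(t for t in text.split() if not t.isdigit())

def apply_node(row, func, source_col, target_col="Preprocessed Document"):
    # Hypothetical stand-in for a node: read one column, write another.
    new_row = dict(row)
    new_row[target_col] = func(row[source_col])
    return new_row

def bag_of_words(row, source_col):
    return row[source_col].split()

row = {"Document": "Hello, World! It is 2024."}

# Each node reads what the previous node wrote.
cleaned = apply_node(row, punctuation_erasure, "Document")
cleaned = apply_node(cleaned, case_converter, "Preprocessed Document")
cleaned = apply_node(cleaned, number_filter, "Preprocessed Document")

# Consistent column: the BoW sees the cleaned text.
good_terms = bag_of_words(cleaned, "Preprocessed Document")

# Wrong column: the BoW reads the untouched original text.
bad_terms = bag_of_words(cleaned, "Document")

print(good_terms)
print(bad_terms)
```

No step here raises an error in the wrong-column case, so the mistake only shows up in the output terms.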

If you’re still having trouble, feel free to post an example workflow and we’ll see if we can identify the issue.
