Stop Word Filter treats "can't" and "Can't" differently (some entries are not filtered)

Hi, I’m using the Stop Word Filter with a custom list to remove some verbs. The node only works in some cases and I don’t understand why. I import both the .xls file (source of the comments) and the .txt file (stop word list) with UTF-8 encoding, and I also tried US-ASCII.
Here are the outputs:

Here is a snapshot of the stop word list:

FYI, I use the same apostrophe for “Can’t” and “can’t”. Interestingly, “I’ve” and “i’ve” are not filtered in either case. Even stranger: “An’t” and “an’t” work! (I just copy-pasted from “can’t”, deleted the “c”, and for one of them replaced “a” with “A”.)

I tried with “case sensitive” both checked and unchecked, with the same results.

Does someone have an idea why this could be happening?

P.S. The Dictionary Filter node behaves the same way:

postStopWords.txt (72 Bytes)
Test.txt (597 Bytes)

Use the OpenNLP WhitespaceTokenizer in the Strings To Document node, and it works as you expect.
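
A likely reason this helps (my assumption, not something confirmed in this thread): some tokenizers split contractions into two tokens, so a stop word such as "can't" never appears as a single term and cannot match. A whitespace tokenizer keeps it whole. Here is a rough Python sketch of that idea. It is not KNIME or OpenNLP code; `contraction_splitting_tokenize` is a hypothetical stand-in for a tokenizer that splits contractions, and the stop word set is made up for the example.

```python
import re

# Illustrative stop word list (hypothetical, not the poster's actual file).
STOP_WORDS = {"can't", "Can't", "I've", "i've"}

# Matches a word ending in a common English contraction suffix.
CONTRACTION = re.compile(r"(.+?)(n['’]t|['’](?:ve|s|re|ll|d|m))", re.IGNORECASE)


def whitespace_tokenize(text):
    """Split on whitespace only, so contractions stay in one token."""
    return text.split()


def contraction_splitting_tokenize(text):
    """Hypothetical tokenizer that splits contractions into two tokens."""
    tokens = []
    for word in text.split():
        match = CONTRACTION.fullmatch(word)
        if match:
            tokens.extend([match.group(1), match.group(2)])
        else:
            tokens.append(word)
    return tokens


def filter_stop_words(tokens, stop_words):
    """Drop every token that exactly matches an entry in the stop word set."""
    return [token for token in tokens if token not in stop_words]


sentence = "I can't say I've tried"

# Whitespace tokens keep "can't" and "I've" intact, so both are filtered.
print(filter_stop_words(whitespace_tokenize(sentence), STOP_WORDS))
# ['I', 'say', 'tried']

# Split tokens ("ca" + "n't", "I" + "'ve") never equal a stop word entry.
print(filter_stop_words(contraction_splitting_tokenize(sentence), STOP_WORDS))
# ['I', 'ca', "n't", 'say', 'I', "'ve", 'tried']
```

If this is what is happening, the stop word list itself is fine, and only the tokenizer choice decides whether the entries can match.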


Thanks! It works now.

