Stop words not stopping.

I’m filtering out single words: conjunctions, definite articles, etc. - or at least that is the theory. The trouble is that the results are still peppered with ‘if, or, the, and, then, that, but’ (etc.). I’m just processing simple Twitter data and I don’t want the stop words (clutter) in it - well, I wouldn’t have them if the stop words node worked.

I’m using it on a single field (column filtered). Still doesn’t work. Using it on the document, the processed document - it all returns the same cruft.

I’m on the latest build of KNIME and its plugins/extensions. It’s nothing outlandish - hell, I got this to work in Orange - it’s just not working in KNIME. What might I be doing wrong?

Hi @Byron05,
could you share a simple workflow that demonstrates the problem? For me it works without problems. Please see the attached workflow for a simple example. Be aware that by default the node creates a new column with the filtered documents. If you don’t want that, you need to change the corresponding setting in the first tab of the config dialog (when you open it, the second tab is selected).
Kind regards
Alexander
stopwords.knwf (9.0 KB)
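
For readers without KNIME at hand, the "new column by default" behaviour can be illustrated outside KNIME. This is only a minimal Python sketch, not KNIME's implementation: the function names, the `replace_column` flag and the column names are hypothetical. It shows why the original column still contains the stop words when the filtered text is appended as a separate column.

```python
# Illustration only: mimics "append a new column" versus "replace the column".
# All names here are hypothetical and are not part of KNIME.
STOP_WORDS = {"if", "or", "the", "and", "then", "that", "but"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens found in stop_words (case-insensitive)."""
    return " ".join(t for t in text.split() if t.lower() not in stop_words)


def filter_stop_words(rows, column, replace_column=False):
    """Return new rows with the filtered text appended or written in place."""
    target = column if replace_column else column + " (filtered)"
    return [{**row, target: remove_stop_words(row[column])} for row in rows]


rows = [{"Document": "the cat and the hat"}]

# Default: the original column is untouched, the result lands in a new column.
appended = filter_stop_words(rows, "Document")

# Replace mode: the original column itself is overwritten.
replaced = filter_stop_words(rows, "Document", replace_column=True)
```

If a downstream step keeps reading the original `Document` column in the appended case, it will still see every stop word, which looks exactly like a filter that does nothing.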


I’m not sure what the problem was. This test (attached) works fine: one branch with and one without stop words, also trying external versus internal stop word lists - it all works now. I’m not in the office this morning, but I’ll re-test on the workflow that was being problematic. The previous workflow wasn’t dissimilar, except that I wasn’t stripping the URLs. Firewall & VPN issues here in China, so I hope this uploads.
Twitter_stopword_regex_test_NOKEYs.knwf (35.2 KB)
stopwords.txt (634 Bytes)
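
The same test (strip URLs, then filter against either a built-in or an external list) can be sketched in plain Python. This is a rough sketch under stated assumptions, not the attached workflow: the URL regex, the tokenizer pattern, the one-word-per-line file format and the list contents are all assumed for illustration.

```python
import re

# Assumed pattern: anything starting with http:// or https:// up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")

# Stand-in for a built-in list (contents assumed, taken from the words above).
BUILT_IN = {"if", "or", "the", "and", "then", "that", "but"}


def load_stop_words(text):
    """Parse an external list: one stop word per line, blank lines ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def clean_tweet(tweet, stop_words):
    """Strip URLs, lower-case, tokenize, then drop stop words."""
    no_urls = URL_PATTERN.sub(" ", tweet)
    tokens = re.findall(r"[a-z0-9_#@']+", no_urls.lower())
    return [t for t in tokens if t not in stop_words]


# External list given inline here; in practice it would be read from a file.
external = load_stop_words("the\nand\n\nbut\n")

tweet = "The node works, but check https://example.com and retry"
with_built_in = clean_tweet(tweet, BUILT_IN)
with_external = clean_tweet(tweet, external)
```

Lower-casing before the lookup matters in this sketch: a list containing `the` would otherwise miss `The` at the start of a tweet.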
