Multi-word stopwords

Hi

I'm having problems filtering multi-word stopwords using the "Stop word filter" node.

The stop word list I provide in my file contains stop words made of two or more words, and the node does not filter them.

I simply separated the words of each multi-word stop word with a white space. Should I escape it somehow?

I tried putting a \ before the white space, but to no avail.

In a reply in another forum discussion, I saw a short comment saying that the "Stop word filter" should work with multi-word stop words too. Any idea why I am having problems?

Thank you!

Giacomo

Hi Giacomo,

The Stop Word Filter can only filter single words (tokens). To filter multi-word stop words, you can use the Dictionary Tagger and provide the stop word list as a data table (second input). Then use the Modifiable Term Filter node (or the General Tag Filter node) to filter out the tagged terms. Make sure to check "Ignore unmodifiability" in the filter node, since tagged terms are set unmodifiable by default.
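To show why a per-token filter misses entries like "new york" while tag-then-filter catches them, here is a minimal sketch in plain Python. It is not KNIME's implementation. The function names (`tokenize`, `single_token_filter`, `dictionary_tag`, `filter_tagged`) are hypothetical, the naive lowercase whitespace tokenizer is an assumption, and the "unmodifiable" flag and the "Ignore unmodifiability" option are not modeled.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase whitespace tokenizer (KNIME uses its own tokenizers)
    return text.lower().split()


def single_token_filter(tokens: list[str], stopwords: set[str]) -> list[str]:
    # Compares each single token against the list, so an entry such as
    # "new york" can never match any one token
    return [t for t in tokens if t not in stopwords]


def dictionary_tag(tokens: list[str], phrases: set[str]) -> list[tuple[list[str], bool]]:
    # Hypothetical analogue of dictionary tagging: group tokens into terms and
    # mark a term as tagged when it matches a (possibly multi-word) dictionary entry
    entries = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(e) for e in entries), default=0)
    terms = []
    i = 0
    while i < len(tokens):
        # Try the longest possible match first at this position
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entries:
                terms.append((tokens[i:i + n], True))
                i += n
                break
        else:
            # No entry starts here: keep the token as an untagged single-word term
            terms.append(([tokens[i]], False))
            i += 1
    return terms


def filter_tagged(terms: list[tuple[list[str], bool]]) -> list[str]:
    # Analogue of filtering out tagged terms: drop every tagged term
    return [" ".join(words) for words, tagged in terms if not tagged]


if __name__ == "__main__":
    tokens = tokenize("I flew to New York last week")
    stop = {"new york", "last week"}
    print(single_token_filter(tokens, stop))           # multi-word entries are not removed
    print(filter_tagged(dictionary_tag(tokens, stop)))  # ['i', 'flew', 'to']
```

The point of the two-step design is that matching happens over token sequences before filtering, which is what a single-token comparison cannot do.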

Cheers, Kilian

Hi Kilian,

thank you for your answer; it works!

Giacomo