Invert a Document Filter

Is it possible to have a Dictionary Filter return only the rows (documents) in which one or more dictionary words were found?

Maybe I'm using the tool incorrectly and I need to preprocess my data differently or use a different tool?
What I'd like to end up with is a table containing documents where the keyword(s) were found and then to process the documents further.

Is there a recommended node I should use?

You can use a combination of Dictionary Tagger, to tag your search terms as they appear in each document, and Tag Filter, to extract/list them as a separate column.

Then you can simply filter on the rows (documents) where the tags appear using a Row Filter set to exclude rows with value "" for the extracted tags. This will leave you only with the documents that contain your original dictionary terms.
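For anyone who wants to see the logic outside of a workflow, here is a minimal sketch in plain Python of the tag-then-filter idea described above. It is not KNIME code and does not use any KNIME API. The function names `tag_terms` and `filter_documents` are made up for illustration, and the matching rule (case-insensitive, whole words only) is an assumption that may differ from how the Dictionary Tagger is configured.

```python
import re


def tag_terms(text, dictionary):
    """Return the dictionary terms found in `text`, sorted alphabetically.

    Assumption: matching is case-insensitive and on whole words only.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(term for term in dictionary if term.lower() in tokens)


def filter_documents(documents, dictionary):
    """Keep only documents whose extracted-tags column is not empty."""
    # Step 1: "tag" each document and list the tags as a separate column.
    rows = [(doc, ", ".join(tag_terms(doc, dictionary))) for doc in documents]
    # Step 2: exclude rows where the extracted-tags value is "".
    return [(doc, tags) for doc, tags in rows if tags != ""]


documents = ["The cat sat", "Nothing here", "A dog and a cat"]
dictionary = {"cat", "dog"}
matches = filter_documents(documents, dictionary)
```

With the sample data, `matches` keeps the first and third documents along with the terms that were found in each.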

Cheers,
Marco.

Thanks, that did the trick.

I don’t suppose there could be an option to invert the Dictionary Filter? Dictionary tagging can take a long time if the collection and/or the tagging dictionary is large. Interestingly, I found this page (which is presumably very old) on a third-party site that suggests that the Dictionary Filter might have once had this capability.

Of course another option is to make a complete bag of words and then filter it to the dictionary, but this too can take a long time - presumably it would be much faster to filter the documents first.
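To make the comparison concrete, here is a rough Python sketch of the two routes. It says nothing about how the KNIME nodes are implemented or how fast they are; it only illustrates why filtering documents first could involve less work. All function names are hypothetical, and whole-word, case-insensitive matching is assumed.

```python
import re
from collections import Counter


def bag_of_words(documents):
    """Build one (doc_index, term, count) row per distinct term per document."""
    rows = []
    for i, doc in enumerate(documents):
        counts = Counter(re.findall(r"\w+", doc.lower()))
        rows.extend((i, term, n) for term, n in counts.items())
    return rows


def docs_via_bag(documents, dictionary):
    """Route 1: build the complete bag of words, then filter it to the dictionary."""
    wanted = {t.lower() for t in dictionary}
    return sorted({i for i, term, _ in bag_of_words(documents) if term in wanted})


def docs_via_early_exit(documents, dictionary):
    """Route 2: scan each document and stop at its first dictionary hit."""
    wanted = {t.lower() for t in dictionary}
    return [
        i
        for i, doc in enumerate(documents)
        # any() short-circuits, so the rest of a matching document is not scanned.
        if any(m.group().lower() in wanted for m in re.finditer(r"\w+", doc))
    ]


documents = ["The cat sat", "Nothing here", "A dog and a cat"]
dictionary = {"Cat", "dog"}
bag_rows = bag_of_words(documents)
via_bag = docs_via_bag(documents, dictionary)
via_early_exit = docs_via_early_exit(documents, dictionary)
```

Both routes select the same documents, but the first one materializes a row for every distinct term in every document before anything is discarded.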

Hi @AngusVeitch1 -

I recently created a ticket for exactly this issue (AP-15433). I will add a +1 from you on it.
