I have a parsed and preprocessed/cleaned PDF document corpus, and I want the Wildcard Tagger to tag n-grams (e.g. bigrams) throughout all documents in the corpus. I use a customized sustainability dictionary whose entries range from 1 to 4 words (e.g. “impact on local communities” → tag “Social_and_community”). How can I do that kind of n-gram tagging in general?
Thanks a lot in advance!
It should work as you have described. You can simply pass the n-grams as a dictionary to the Wildcard Tagger or Dictionary Tagger node; it will merge the matched words into a single multi-word term and tag it with the given tag value. If you want to check the result afterwards, you can use either the Bag of Words node or the Unique Term Extractor node.
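To make the idea of "merging words into a multi-word term" more concrete, here is a minimal sketch in plain Python. It is not how the KNIME nodes are implemented; it only illustrates one common approach (greedy longest-match lookup against a dictionary of 1 to 4 word entries). The function name `tag_ngrams`, the `max_n` parameter, and the "Environment" entry are assumptions made up for this illustration.

```python
from typing import Dict, List, Optional, Tuple


def tag_ngrams(
    tokens: List[str],
    dictionary: Dict[str, str],
    max_n: int = 4,
) -> List[Tuple[str, Optional[str]]]:
    """Greedy longest-match tagging (illustrative, hypothetical helper).

    Returns (term, tag) pairs; the tag is None for unmatched single words.
    """
    # Key the dictionary by lowercase word tuples for case-insensitive lookup.
    lookup = {tuple(k.lower().split()): v for k, v in dictionary.items()}

    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so 4-word entries win over shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lookup:
                # Merge the matched words into one multi-word term and tag it.
                result.append((" ".join(tokens[i:i + n]), lookup[candidate]))
                i += n
                matched = True
                break
        if not matched:
            result.append((tokens[i], None))
            i += 1
    return result


example_dictionary = {
    "impact on local communities": "Social_and_community",
    "emissions": "Environment",  # hypothetical extra entry
}
example_tokens = "the impact on local communities matters".split()
tagged = tag_ngrams(example_tokens, example_dictionary)
for term, tag in tagged:
    print(term, "->", tag)
```

In this sketch, the four words of the dictionary entry come out as one tagged term, while the surrounding words stay untagged single tokens.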