How to export all found named entities to a list with duplicates?
This is the situation:
I read the data from a txt file, convert it with Strings to Document, and then use the Wildcard Tagger to find named entities. In the interactive mode of the Document Viewer I highlighted the result with tags on and selected NE in the search. The result looks like word1[NE(…)], word2[NE(…)], word3[NE(…)]. So far so good.
Now I want to export these found named entities to a list. How do I solve this?
After using the Wildcard Tagger, you can use the Modifiable Term Filter, which removes all modifiable terms, i.e. the untagged ones (tagging makes terms unmodifiable if that option is selected in the tagger node dialog). Then you can use the Bag Of Words Creator to get a list of all terms that occur in the documents. If you want the entities as Strings, you could use the Term to String node afterwards and remove duplicates with the GroupBy node.
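To make the logic of that node chain concrete, here is a minimal Python sketch of the same three steps (keep only tagged terms, turn them into plain strings, drop duplicates). It is only an illustration and not KNIME code: the `term[NE(…)]` text form is assumed from the Document Viewer display quoted in the question, and the names `TAGGED`, `tagged_terms` and `unique` are hypothetical helpers made up for this example.

```python
import re

# Assumed text form of a tagged term, modelled on the display quoted in the
# question: term[TAGTYPE(tagvalue)]. This is not a KNIME API or file format.
TAGGED = re.compile(
    r"^(?P<term>.+?)\[(?P<tag_type>\w+)\((?P<tag_value>[^)]*)\)\]$"
)


def tagged_terms(terms, tag_type="NE"):
    """Keep only terms tagged with tag_type and return them as plain strings.

    Roughly mirrors Modifiable Term Filter followed by Term to String.
    """
    result = []
    for t in terms:
        m = TAGGED.match(t)
        if m and m.group("tag_type") == tag_type:
            result.append(m.group("term"))
    return result


def unique(items):
    """Drop duplicates while keeping first-occurrence order (the GroupBy step)."""
    return list(dict.fromkeys(items))


# Hypothetical input: three tagged terms (one repeated) and one untagged term.
terms = ["word1[NE(…)]", "the", "word2[NE(…)]", "word1[NE(…)]", "word3[NE(…)]"]

entities = tagged_terms(terms)   # list that still contains the repeated entity
print(entities)
print(unique(entities))          # same list with duplicates removed
```

Skipping the `unique` call corresponds to leaving out the GroupBy node, so the list keeps repeated entities.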