When I use the Dictionary Tagger node, some rows end up with multiple tags. In my case, only the last two tags are usually needed to continue with my workflow, and too many tags confuse the nodes downstream. Has anyone found a way to retain only the last two tags after using the Dictionary Tagger?
Thanks for the help. I'm really stumped on this one. I'm also new to text processing, so I'm sure I've missed something.
There is the Tag Stripper node, which removes all tags, but I think this is not what you are looking for. Removing only the last two tags is not possible. Why do you use the Dictionary Tagger node if you want to remove the tags anyway? What about skipping the tagger nodes?
It's just that the last two tags tend to be the more "correct" ones, or at least the most useful for my case. The rest of the tags are just noise.
The closest example I can think of is processing residential address data. When you want the top-level administrative levels, such as county or state, only the last two tags matter. With a dictionary of counties and states, an address like "731 Lexington, New York City, New York" might be tagged as "Lexington New York New York". In this case I only need "New York New York", because the "Lexington" tag will point me to a city in Kentucky.
I know the data could be cleaner, but that will not always be the case, since neither the address field nor the users are under my control. I'm just tasked with using the data and getting the most I can from the dirty input.
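To make the address example concrete, here is a minimal Python sketch of the behavior I'm after. It is not KNIME code and does not use any KNIME API. It assumes the tags for one row are already available as a plain list in the order they were found, and `keep_last_tags` is just a hypothetical helper name.

```python
def keep_last_tags(tags, n=2):
    """Return the last n tags of a row, preserving their order.

    Assumes `tags` is a plain list of tag strings in document order.
    Rows with fewer than n tags are returned unchanged.
    """
    if n <= 0:
        return []
    return list(tags[-n:])


# Tags as they might come out of the tagger for the address above
row_tags = ["Lexington", "New York", "New York"]
print(keep_last_tags(row_tags))  # ['New York', 'New York']
```

Whatever the solution inside the workflow looks like, this is the result I would want per row: the earlier "Lexington" tag dropped and only the last two tags kept.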