Position of POS Tagger in Text Mining Workflow

Hi there!

In text mining workflows, the POS Tagger is always placed before Preprocessing, as part of Enrichment. Why is that, and why is it not placed after the Preprocessing nodes (Case Converter, Number Filter, Stop Word Filter, Punctuation Erasure)?

Thank you for your help!

Hi @mgroos

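I think from a technical perspective you can do it both ways. It is more a question of what kind of text mining analysis you are planning to do. Using the POS tagger before or after the preprocessing nodes can have advantages and disadvantages, depending on what you want to achieve. One thing to keep in mind is that a POS tagger usually relies on sentence context such as capitalization, punctuation, and stop words, so tagging text from which these have already been removed tends to give less reliable tags.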
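
To make the effect of the order concrete, here is a minimal Python sketch. It is not KNIME code and does not reproduce the POS Tagger node: `toy_pos_tag`, its rules, the `DETERMINERS` list, and the `UNK` tag are all made up for illustration, and `preprocess` is only a rough stand-in for Case Converter, Punctuation Erasure, and Stop Word Filter. It just shows how a tagger that looks at casing and neighbouring words can give different tags depending on when it runs.

```python
import string

# Hypothetical stop-word list; the toy tagger also uses it as a context cue.
DETERMINERS = {"the", "a", "an"}


def toy_pos_tag(tokens):
    """Tag tokens with toy rules that depend on casing and the previous token."""
    tagged = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else None
        if tok.lower() in DETERMINERS:
            tag = "DT"
        elif tok[0].isupper() and i > 0:
            tag = "NNP"  # capitalized word inside the sentence: proper noun
        elif prev in DETERMINERS:
            tag = "NN"   # word directly after a determiner: noun
        else:
            tag = "UNK"  # the toy rules cannot decide
        tagged.append((tok, tag))
    return tagged


def preprocess(tokens):
    """Lower-case tokens, strip punctuation, and drop stop words."""
    cleaned = []
    for tok in tokens:
        tok = tok.lower().strip(string.punctuation)
        if tok and tok not in DETERMINERS:
            cleaned.append(tok)
    return cleaned


def tag_then_preprocess(tokens):
    """Tag the raw tokens first, then clean them while keeping their tags."""
    result = []
    for tok, tag in toy_pos_tag(tokens):
        tok = tok.lower().strip(string.punctuation)
        if tok and tok not in DETERMINERS:
            result.append((tok, tag))
    return result


def preprocess_then_tag(tokens):
    """Clean the tokens first, then tag whatever is left."""
    return toy_pos_tag(preprocess(tokens))


if __name__ == "__main__":
    sentence = "Yesterday Apple bought the orchard.".split()
    print("tag first:       ", tag_then_preprocess(sentence))
    print("preprocess first:", preprocess_then_tag(sentence))
```

In this toy setup, tagging first still recognises "apple" as a proper noun and "orchard" as a noun, because the capital letter and the preceding "the" were present when the tagger ran. Preprocessing first removes both cues, so every remaining token ends up as `UNK`.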

Best regards,
Martyna

Does the Dictionary Tagger only work when the tag is in the title?
I tried the node, and it only picks up the tag if it's in the title, not in the text of the document itself.
br

Actually no, it should work on the whole text too. Do you have a simple example workflow that you could share?