Position of POS Tagger in Text Mining Workflow

mgroos · December 4, 2022, 3:18pm

Hi there!

In the usual layout of a text mining workflow, the POS Tagger is always placed in the Enrichment step, before Preprocessing. Why is that, and why is it not placed after the Preprocessing nodes (Case Converter, Number Filter, Stop Word Filter, Punctuation Erasure)?

Thank you for your help!

Martyna · December 5, 2022, 11:40am

Hi @mgroos

I think from a technical perspective you can do it either way. It is more a question of what kind of text mining analysis you are planning to do. Using the POS Tagger before or after the preprocessing nodes can have advantages and disadvantages, depending on what you want to achieve.
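
To make the trade-off a bit more concrete, here is a minimal sketch in plain Python. It does not use KNIME or the real POS Tagger node: `toy_pos_tag`, `STOP_WORDS`, `DETERMINERS` and the two helper functions are all hypothetical names made up for illustration, and the tagger is a deliberately naive rule-based one that only looks at the previous word and at capitalization. The assumption it illustrates is that taggers generally rely on such context, which case conversion and stop word filtering remove.

```python
# Toy illustration only: a hypothetical rule-based tagger, NOT the KNIME POS Tagger.
STOP_WORDS = {"we", "to", "the", "in"}   # assumed stop word list for this example
DETERMINERS = {"the", "a", "an"}


def toy_pos_tag(tokens):
    """Tag tokens using two context cues: the previous word and capitalization."""
    tags = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else None
        if tok.lower() in DETERMINERS:
            tags.append("DT")
        elif tok.lower() == "to":
            tags.append("TO")
        elif prev in DETERMINERS:
            tags.append("NN")    # a word right after a determiner: noun
        elif prev == "to":
            tags.append("VB")    # a word right after "to": verb
        elif i > 0 and tok[0].isupper():
            tags.append("NNP")   # capitalized mid-sentence: proper noun
        else:
            tags.append("UNK")   # no rule applies
    return tags


def tag_then_preprocess(tokens):
    """Enrichment first: tag the raw tokens, then lowercase and drop stop words."""
    tagged = zip(tokens, toy_pos_tag(tokens))
    return [(t.lower(), tag) for t, tag in tagged if t.lower() not in STOP_WORDS]


def preprocess_then_tag(tokens):
    """Preprocessing first: lowercase and drop stop words, then tag what is left."""
    cleaned = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    return list(zip(cleaned, toy_pos_tag(cleaned)))


sentence = ["We", "want", "to", "book", "the", "book", "in", "Berlin"]
print(tag_then_preprocess(sentence))
print(preprocess_then_tag(sentence))
```

In this toy setup, tagging first keeps the verb/noun distinction between the two occurrences of "book" and still recognizes "Berlin" as a proper noun, while preprocessing first leaves the tagger with no cues at all. A real tagger is far more robust than these rules, so treat this as an intuition aid rather than a statement about how the node behaves.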

Best regards,
Martyna

Daniel_Weikert · December 5, 2022, 6:10pm

Does the Dictionary Tagger only work when the term is in the title?
I tried the node, and it only picks up the term if it's in the title, not in the text of the document itself.
br

Martyna · December 9, 2022, 9:46pm

Actually no, it should work on the whole text too. Do you have a simple example workflow that you could share?
