Should the POS Tagger node come before the Bag of Words Creator, or vice versa?

text-processing
#1

Hello,

I was using two nodes (POS Tagger and Bag of Words Creator),
and I noticed that if the flow is POS Tagger and then Bag of Words Creator, the process does not take much time,
but if the nodes are swapped, the time needed increases hugely.

Is there any explanation?
Which one should come first?

Thank you in advance.


#2

Hey @Tiziano,

using the POS Tagger first and then the Bag of Words Creator is the correct order.
Some documentation pages still show the other way around, but they are outdated.

The Bag of Words Creator outputs one row per term, each holding the term and the document it came from, so the number of terms in a document determines the number of rows. If you use the POS Tagger afterwards, it tags the document in every row of that table, even though most of those documents are copies of each other.
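
To make the row blow-up concrete, here is a rough sketch in plain Python. It is a toy model, not KNIME code: the function names `bag_of_words` and `count_tagger_calls` are made up for illustration, and it assumes one row per distinct term per document and that the tagger processes the document in every row it receives.

```python
# Toy model only: plain Python, not KNIME code. All names are hypothetical.

def bag_of_words(documents):
    """Return one (term, document) row per distinct term in each document."""
    rows = []
    for doc in documents:
        for term in sorted(set(doc.split())):
            rows.append((term, doc))
    return rows


def count_tagger_calls(docs):
    """Stand-in for a tagger: it would process every document it is handed."""
    return len(docs)


documents = [
    "the cat sat on the mat",
    "a dog chased the cat",
]

# Order 1: tag first, then build the bag of words.
# The tagger sees each original document exactly once.
calls_tag_first = count_tagger_calls(documents)

# Order 2: build the bag of words first, then tag the document in every row.
# The tagger now sees one copy of the document per term row.
rows = bag_of_words(documents)
calls_bow_first = count_tagger_calls([doc for _term, doc in rows])

print(calls_tag_first, calls_bow_first)
```

With real documents containing hundreds of distinct terms, the second order would hand the tagger hundreds of copies of each document, which would be consistent with the slowdown described above.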

Cheers,

Julian
