StanfordNLP NE Tagger - How to tag multiword terms?

Hey there,

I've been testing the StanfordNLP NE node, and I have a quick question, perhaps about a workaround. As shown below, the tagger correctly identifies New York as a location, and even Thursday morning as date/time expressions:

----------------------------------------------------------------------------------------------------------------------------------------------------
"event in New[NE(LOCATION)] York[NE(LOCATION)] Thursday[NE(DATE)] morning[NE(TIME)]"
----------------------------------------------------------------------------------------------------------------------------------------------------

But how can I aggregate terms together so that they represent a single term? It would of course make more sense to have "new york" as one term rather than "new" and "york". I also understand that it's possible to create a dictionary of multiword terms and tag them, but this assumes I already know them, which won't always be the case.

So, is there any workaround for this case? For example, something like a rule that, when a NE(LOCATION) is directly followed by another NE(LOCATION), tags them as a single term? That would be super helpful.

 

Thanks!

Gustavo Velho

Hi Gustavo,

Thank you for your question. You are right to point out that "New York" should be treated as one term after tagging. This is what usually happens and what tagger nodes are supposed to do: they also change the granularity of terms. The OpenNLP NE tagger does this correctly, so "New York" or "United States" will be tagged as one term (in the dialog, set NEs to find to Location).

As a workaround, use the OpenNLP tagger. If you want to keep using the StanfordNLP tagger, extract the tagged terms and combine consecutive terms that carry the same tag into one term; the combined terms are then used by the Dictionary tagger. Attached is an example workflow.
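
For readers without the attached workflow, the combining step can be sketched outside KNIME. This is only a minimal Python illustration of the idea, not the workflow itself: the function name merge_adjacent_entities and the representation of tagged terms as (token, tag) pairs are assumptions made for this example, with None standing for an untagged token.

```python
from itertools import groupby


def merge_adjacent_entities(tagged_tokens):
    """Merge runs of consecutive tokens that share the same entity tag.

    tagged_tokens is a list of (token, tag) pairs; tag is None for
    tokens that are not part of any named entity (an assumed format).
    """
    merged = []
    # groupby only groups *consecutive* items with an equal key,
    # which is exactly the "same tag, directly following" rule.
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        words = [word for word, _ in run]
        if tag is None:
            # Untagged tokens stay separate.
            merged.extend((word, None) for word in words)
        else:
            # Tagged run becomes one multiword term.
            merged.append((" ".join(words), tag))
    return merged


# The sentence from the question, as tagged by the StanfordNLP tagger.
tokens = [
    ("event", None),
    ("in", None),
    ("New", "LOCATION"),
    ("York", "LOCATION"),
    ("Thursday", "DATE"),
    ("morning", "TIME"),
]
merged = merge_adjacent_entities(tokens)
multiword_terms = [term for term, tag in merged if tag is not None]
```

Here "New" and "York" are merged into "New York", while "Thursday" and "morning" stay separate because their tags differ (DATE vs. TIME). Note that this simple rule would also merge two distinct entities of the same type that happen to be adjacent, so the resulting terms are worth a quick check before they go into the dictionary.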

We need to check why the StanfordNLP NE tagger is not aggregating terms during tagging.

Cheers, Kilian

 

Thanks Kilian, your workflow is really helpful!

Regarding the nodes: the thing is that the StanfordNLP tagger tags all NE types at once (Location, Date, Time, Person...), whereas the OpenNLP tagger handles only one type at a time.

Anyway, thanks for your help!

Gustavo
