Stanford Lemmatizer - Relating original terms with their lemmas

mpenalver · October 4, 2021, 1:11pm

I need to keep a mapping from each term in a document to the lemma produced by the Stanford Lemmatizer node, and so far I haven’t found a straightforward way to do this. The lemmatizer node creates a new document with the lemmas in place of the original terms, and the Bag Of Words Creator node groups equal terms into a single entry. As a result, I can’t match original terms to lemmas by their order in that node’s output table: several original terms may share the same lemma, which then appears only once.

I intend to create a custom node that combines the two nodes above: it would lemmatize a document’s POS-tagged terms and output two columns, the original terms (with their tags) and their corresponding lemmas (with the same tags). Before doing so, though, I would like to know whether anyone has an idea for achieving this with existing nodes.

ScottF · October 13, 2021, 9:19pm

Hi @mpenalver -

This is an interesting question. I can’t think of a good way to do this with existing nodes (although maybe I am missing something @julian.bunzel?) so creating a custom node seems like a good solution.

It certainly would be nice to have an easy way to see which terms have been lemmatized!

mpenalver · December 9, 2021, 2:51am

In the end, it was not necessary to create a custom node. The simple workflow attached calls the Stanford library directly to obtain the lemma associated with each input term.

relating_terms_with_their_lemmas.knwf (19.2 KB)
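
For readers who don’t open the workflow, the idea can be sketched roughly as follows. This is not the attached snippet, just a minimal illustration under stated assumptions: the class `TermLemmaPairs`, the method `pair`, and the `toyLemma` stand-in are hypothetical names invented for this example. The point is that the lemmatizer is applied term by term, so every input term keeps its own output row and nothing is merged the way the Bag Of Words Creator merges equal terms. In a real workflow the `lemmatizer` argument would be the call into the Stanford library; Stanford CoreNLP’s `Morphology.lemma(word, tag)` looks like a plausible candidate, but whether the attached workflow uses it is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical helper: pairs each POS-tagged term with its lemma, one row per term. */
final class TermLemmaPairs {

    /**
     * taggedTerms: each element is {term, posTag}.
     * lemmatizer:  maps (term, posTag) to a lemma; in practice this would wrap
     *              the Stanford library call (assumption, not shown in the thread).
     * Returns rows of {term, posTag, lemma} in the original input order.
     */
    static List<String[]> pair(List<String[]> taggedTerms,
                               BiFunction<String, String, String> lemmatizer) {
        List<String[]> rows = new ArrayList<>();
        for (String[] t : taggedTerms) {
            String term = t[0];
            String tag = t[1];
            // One output row per input term, so duplicate lemmas are never collapsed.
            rows.add(new String[] {term, tag, lemmatizer.apply(term, tag)});
        }
        return rows;
    }

    /** Toy stand-in lemmatizer for demonstration only; NOT the Stanford algorithm. */
    static String toyLemma(String word, String tag) {
        String w = word.toLowerCase();
        // Naive rule: strip a trailing "s" from plural nouns.
        if (tag.equals("NNS") && w.length() > 1 && w.endsWith("s")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }
}
```

Calling `TermLemmaPairs.pair(terms, TermLemmaPairs::toyLemma)` on the tagged terms `Cats/NNS`, `cat/NN`, `dogs/NNS` yields three rows, two of which share the lemma `cat`, which is exactly the correspondence that is lost after grouping.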

ScottF · December 9, 2021, 3:26pm

@mpenalver Thanks for posting your solution. A short snippet saves the day!

Maybe you would consider packaging this functionality as a component and posting it on the KNIME Hub?

mpenalver · December 10, 2021, 4:51am

Thanks for the suggestion, @ScottF, but the solution seems too simple for a full-fledged component… The core of it is a single call to the Stanford NLP library.
