Unable to lemmatize some words, e.g. "clearer" and "faster" to "clear" and "fast"

Hello everyone,

I’m having some trouble lemmatizing certain words with the Stanford Lemmatizer.
Just in case, I’ve also tested the Snowball Stemmer, but I get the same result.

As you can see in the image below, I can’t lemmatize the words “faster” and “clearer”. I expected “fast” and “clear” respectively, but this didn’t happen.
On the other hand, the process is able to convert “cleared” to “clear” correctly.

The left column is the original term, while the next two columns are the stemmed and lemmatized terms, respectively.

Do you know whether this is a bug in the extension, or am I doing something wrong?

Thanks in advance!

Hi,
I can confirm the behaviour. I also tested it with the command line tool mentioned here and that yields the correct results. So either we are using an older version of the lemmatizer or there really is a bug. However, the latter seems unlikely, as other words are lemmatized correctly. I will create a bug ticket for this.
Kind regards,
Alexander

Hi Alexander,

I’m using version 4.7.7. I will update KNIME and try again.

Thank you very much for your quick response!

Regards,
Mauricio.

Hi,
I’ve updated to version 4.7.8 and it still doesn’t work.
Which version did you test the Lemmatizer with, Alexander?

Thanks

As a workaround, it might also be worth checking out the newer spaCy Lemmatizer. Perhaps it would provide better results?
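
If switching lemmatizers is not an option, a small post-processing step might cover the comparative forms in the meantime. Below is a minimal sketch in plain Python. It is not what the Stanford or spaCy lemmatizers do internally. The function name `normalize_comparative` and the word list `KNOWN_ADJECTIVES` are made up for illustration, and the list would have to be replaced with adjectives from your own data.

```python
# Hypothetical fallback: map "-er"/"-est" forms to a base adjective, but only
# when the candidate base appears in a user-supplied word list. This is an
# illustrative workaround, not part of KNIME, Stanford CoreNLP or spaCy.

KNOWN_ADJECTIVES = {"clear", "fast", "big", "happy", "large"}  # example data only


def normalize_comparative(token, vocabulary=KNOWN_ADJECTIVES):
    """Return the base adjective for an -er/-est form, else the token unchanged."""
    word = token.lower()
    for suffix in ("est", "er"):
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Plain stem, and stem with a restored final "e" (larger -> large).
        candidates = [stem, stem + "e"]
        # Undo consonant doubling (bigger -> bigg -> big).
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            candidates.append(stem[:-1])
        # Undo the y -> i change (happier -> happi -> happy).
        if stem.endswith("i"):
            candidates.append(stem[:-1] + "y")
        for candidate in candidates:
            if candidate in vocabulary:
                return candidate
    # No known base found: leave the token as it was.
    return token


for term in ["clearer", "faster", "fastest", "cleared", "water"]:
    print(term, "->", normalize_comparative(term))
```

Requiring the base form to be in the word list keeps ordinary words that merely end in "er", such as "water", untouched. The price is that the list has to be maintained by hand.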

Thanks for your advice, Scott!
I’ll try that extension.

Regards,
Mauricio

Hi Mauricio,
Unfortunately, even the newer versions of that extension do not integrate a newer version of the Stanford Lemmatizer library. It is on us to make that update and I created a ticket for our developers. But maybe Scott’s advice helps!
Kind regards,
Alex
