Error in Stanford NLP Scorer

Hello,

I am testing the Stanford NLP NE Learner with a document column and a dictionary of dates (listed below).
The Learner node executes without problems, but when I run the Stanford NLP NE Scorer I get this error:

ERROR StanfordNLP NE Scorer 0:746:798 Execute failed: Argument array lengths differ: [class edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation, class edu.stanford.nlp.ling.CoreAnnotations$AnswerAnnotation] vs. [September]

Below are some of the dates listed in my dictionary column. The only entries containing "September" are “September 2017” and “28. September 2017”, but why would they be a problem for the Scorer?
Btw, the Scorer does not get the same document data as the Learner node as input, but that shouldn't be a problem, should it?

September 2017
29. 01.2018
28.12.2017
30.12.2017
1.12.17
31.12.2017
26.01.18
Jan. 2018
31.01.2018
1.2.2018
22.01.2018
29.12.2017
15.12.2017
30.11.2017
31.01.2017
28. September 2017

Thanks in advance!

Jasmin

Hello Jasmin,

First of all, I am new to the KNIME platform, so please excuse me if some of my answers are not right. I still want to report what I did with the StanfordNLP NE Learner, because I ran into the same problem:

ERROR StanfordNLP NE Learner 0:57 Execute failed: Argument array lengths differ: [class edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation, class edu.stanford.nlp.ling.CoreAnnotations$AnswerAnnotation] vs. [MyTerm]

What startled me was that the term shown in the log was not complete, just like in your case:

Your term is: “September 2017”, but the error message shows “September”.

I replaced all occurrences of spaces (" ") with underscores ("_").
Now the NE Learner no longer fails, but the term I want to search for has spaces in it, so this is not quite right. I don’t know yet whether it is working. I am using several hundred documents to train the learner, so this could take quite a long time now :slight_smile:

Did you find out something in the meantime?

In your case it could be a good idea to convert all dates from MMM YYYY or DD.MM.YYYY to, e.g., YYYY-MM-DD, so that they no longer contain spaces. You would also have to convert all occurrences of dates in the documents to the same format so that they still match. This would have to be a preprocessing step for your documents before learning or scoring them with your model.
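A minimal sketch of such a preprocessing step, written in Python rather than as KNIME nodes. The function name `normalize_dates` and the patterns are hypothetical and only cover the formats in the list above. It assumes English month names, treats two-digit years as 20xx, and maps month-only dates to YYYY-MM because they carry no day.

```python
import re

# English month names and their three-letter abbreviations (an assumption;
# extend this table for other languages).
_NAMES = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTHS = {}
for number, name in enumerate(_NAMES, start=1):
    MONTHS[name.lower()] = number
    MONTHS[name[:3].lower()] = number

# Longest alternatives first, so "september" is tried before "sep".
_ALTERNATION = "|".join(sorted(MONTHS, key=len, reverse=True))

# DD.MM.YYYY or D.M.YY, with optional spaces after the dots ("29. 01.2018").
NUMERIC = re.compile(r"\b(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4}|\d{2})\b")
# "28. September 2017", "September 2017", "Jan. 2018".
NAMED = re.compile(
    r"\b(?:(\d{1,2})\.\s*)?(" + _ALTERNATION + r")\b\.?\s+(\d{4})\b",
    re.IGNORECASE,
)


def normalize_dates(text):
    """Rewrite the date formats above into space-free ISO-like strings."""

    def numeric(match):
        day, month, year = (int(g) for g in match.groups())
        if year < 100:          # assumption: two-digit years mean 20xx
            year += 2000
        return f"{year:04d}-{month:02d}-{day:02d}"

    def named(match):
        month = MONTHS[match.group(2).lower()]
        year = int(match.group(3))
        if match.group(1):      # a day was given
            return f"{year:04d}-{month:02d}-{int(match.group(1)):02d}"
        return f"{year:04d}-{month:02d}"   # month-only date

    return NAMED.sub(named, NUMERIC.sub(numeric, text))
```

The same function would have to be applied to both the dictionary entries and the document text, otherwise the converted terms would no longer match.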

I think in my case I would have to preprocess the documents and search and replace all “MyTerm ABC” with “MyTerm_ABC” for the same reason.

Kind regards

Oliver

P. S. Interesting that you are only experiencing that with the scorer and not with the learner…

Hey,

Sorry for the late response. I will have a deeper look into this issue tomorrow.
I think it is indeed a problem with whitespace-separated entities.
In any case, thank you very much for reporting this, and again, sorry for the late response.

Regards,

Julian

Hello again @jngo

I could not reproduce your problem. Which version of KNIME are you using? Can you provide a small subset of sentences from the training and test data that still causes this error?

Additionally, as @oli-ver said, try to avoid entities in your dictionary that contain whitespace, because they will be broken into two or more entities anyway, and this might not give the desired result.

Cheers,

Julian

Hello @oli-ver and @julian.bunzel,

thanks for your help!
I have a solution now! I found out that the error is not caused by whitespace but by line breaks. If a term such as "1. September 2018" is split by a line break in the document, the node fails with this error. To avoid this, I used a regex replacer to replace all line breaks with a whitespace and then to collapse two or more consecutive whitespaces into a single one.
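For anyone who wants to do the same cleanup outside of KNIME, here is a rough Python sketch of the two replacements. The function name `normalize_whitespace` is made up for illustration, and the patterns are my reading of the steps above, not the exact settings of the regex replacer.

```python
import re


def normalize_whitespace(text):
    """Replace line breaks with spaces, then collapse repeated whitespace."""
    # Step 1: every run of line-break characters (\r\n, \r, \n) becomes one space.
    text = re.sub(r"[\r\n]+", " ", text)
    # Step 2: two or more consecutive whitespace characters become one space.
    return re.sub(r"\s{2,}", " ", text)


print(normalize_whitespace("due on 1.\nSeptember  2018"))
```

This has to run on the document text before it reaches the Learner or Scorer, so that a term like "1. September 2018" is never split across lines.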

Thanks anyway :slight_smile:

Jasmin

Hey Jasmin,

good to know that you were able to solve your problem.
However, I will have another look at it, because the node is not supposed to fail in this case.
Thanks for reporting. :slight_smile:

Cheers,

Julian
