I am testing out the Stanford NLP NE Learner with dates as listed below and a document column.
I have no problem executing the Learner node, but when I run the StanfordNLP NE Scorer I get this error:
ERROR StanfordNLP NE Scorer 0:746:798 Execute failed: Argument array lengths differ: [class edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation, class edu.stanford.nlp.ling.CoreAnnotations$AnswerAnnotation] vs. [September]
Below are some of the dates listed in my dictionary column … the only thing I found with September is “September 2017” and “28. September 2017”, but why is that a problem for the Scorer?
By the way, the Scorer does not get the same document data as input as the Learner node, but that shouldn't be a problem, should it?
First I want to say that I am new to the KNIME platform, so I apologize if some of my answers are not right. I still want to report what I did with the StanfordNLP NE Learner because I experienced the same problem:
ERROR StanfordNLP NE Learner 0:57 Execute failed: Argument array lengths differ: [class edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation, class edu.stanford.nlp.ling.CoreAnnotations$AnswerAnnotation] vs. [MyTerm]
What startled me was that the term shown in the log was not complete, just like in your case:
Your term is: “September 2017”, but the error message shows “September”.
I replaced all occurrences of spaces (" ") with underscores ("_").
Now the NE Learner no longer fails, but the term I want to search for has spaces in it, so this is not quite right. I don't know yet whether it is working. I am using several hundred documents to train the learner, so this could take quite a long time.
Did you find out something in the meantime?
In your case it could be a good idea to convert all dates from MMM YYYY or DD.MM.YYYY to, e.g., YYYY-MM-DD, so that they no longer contain spaces. You would also have to convert all occurrences of dates in the documents to the same format so that they match. This would be a preprocessing step for your documents before learning or scoring them with your model.
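As a rough illustration of that preprocessing step, here is a minimal Python sketch. It is not a KNIME node configuration; the function name `normalize_dates` is made up for this example, and it assumes English month names and only the two formats mentioned in this thread ("September 2017" and "28. September 2017"). Dates without a day are written as YYYY-MM, which is my own choice.

```python
import re

# English month names mapped to their month number (assumption: English only).
MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Optional "DD." prefix, then a month name, then a four-digit year.
DATE_RE = re.compile(
    r"\b(?:(\d{1,2})\.\s+)?(" + "|".join(MONTHS) + r")\s+(\d{4})\b")

def normalize_dates(text):
    """Rewrite 'DD. Month YYYY' as YYYY-MM-DD and 'Month YYYY' as YYYY-MM."""
    def repl(match):
        day = match.group(1)
        month = MONTHS[match.group(2)]
        year = match.group(3)
        if day is None:
            return f"{year}-{month:02d}"
        return f"{year}-{month:02d}-{int(day):02d}"
    return DATE_RE.sub(repl, text)
```

The same function would have to be applied to the dictionary column and to the document text, so that both sides end up in the same format.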
I think in my case I would have to preprocess the documents and search and replace all “MyTerm ABC” with “MyTerm_ABC” for the same reason.
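What I have in mind for the documents could look roughly like the following Python sketch. The helper `join_terms` and the example term are hypothetical and only illustrate the search-and-replace idea; in a workflow this would be done with whatever string replacement node is available.

```python
import re

def join_terms(text, terms):
    """Replace each multi-word term in text with its underscore-joined form."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    for term in sorted(terms, key=len, reverse=True):
        words = term.split()
        if len(words) < 2:
            continue  # single-word terms need no change
        # Allow any run of whitespace between the words of the term.
        pattern = r"\s+".join(re.escape(w) for w in words)
        joined = "_".join(words)
        # A lambda avoids backslash handling in the replacement string.
        text = re.sub(pattern, lambda m, j=joined: j, text)
    return text
```

The dictionary entries would need the same treatment ("MyTerm ABC" becomes "MyTerm_ABC"), otherwise they would no longer match the documents.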
Kind regards
Oliver
P. S. Interesting that you are only experiencing that with the scorer and not with the learner…
Sorry for the late response. I will have a deeper look into this issue tomorrow.
I think it is indeed a problem with whitespace-separated entities.
Anyway, thank you very much for reporting, and again, sorry for the late response.
I could not reproduce your problem. Which version of KNIME are you using? Can you provide a small subset of sentences from the training and test data that would still cause this error?
Additionally, as @oli-ver said, try to avoid entities in your dictionary that contain whitespace, because they will be broken into two or more entities anyway, and this might not give the desired result.
thanks for your help!
I have a solution now! I found out that the error is not caused by the whitespace but by line breaks. So if "1. September 2018" is somehow split by a line break, the node will show an error. To avoid this, I used a regex replacer to replace all line breaks with a space and then replace two or more consecutive whitespace characters with a single space.
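For anyone who wants to try the same thing outside of KNIME, the two replacements can be sketched in Python like this. The function name `flatten_whitespace` is just for illustration, and the patterns are my reading of the steps described above, not the exact node settings.

```python
import re

def flatten_whitespace(text):
    """Replace line breaks with a space, then collapse repeated whitespace."""
    # Step 1: every run of line-break characters becomes one space.
    text = re.sub(r"[\r\n]+", " ", text)
    # Step 2: two or more whitespace characters become a single space.
    text = re.sub(r"\s{2,}", " ", text)
    return text
```

Applied to a date that was split across lines, this gives back a single-line term such as "1. September 2018".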
good to know that you could solve your problem.
However, I will have another look at the problem, because the node is not supposed to fail in this case.
Thanks for reporting.