I’m getting the following error when using the StanfordNLP NE Learner:
Execute failed: Got NaN for prob in CRFLogConditionalObjectiveFunction.calculate() - this may well indicate numeric underflow due to overly long documents.
I’m using a slightly modified version of the ‘NER Tagger Model Training’ example, changing only the input and the dictionary.
Any idea what represents an ‘overly long document’ in this context?
I will try removing the longer documents and running it again, but it would help to know whether there is a hard limit on document length and, if so, what it is.
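As a quick way to test the document-length theory, here is a minimal sketch of the kind of filter I mean. The 10,000-character cutoff is purely a guess on my part, not a documented limit:

```python
def filter_long_documents(docs, max_chars=10000):
    # Keep only documents at or below a guessed length threshold.
    # The 10,000-character cutoff is an assumption, not a known Stanford limit.
    return [d for d in docs if len(d) <= max_chars]

short_docs = filter_long_documents(["short doc", "x" * 20000])
```

If the NaN error persists after filtering like this, that would point away from raw length as the cause.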
Here is another one:
ERROR StanfordNLP NE Learner 2:8 Execute failed: Argument array lengths differ: [class edu.stanford.nlp.ling.CoreAnnotations$TextAnnotation, class edu.stanford.nlp.ling.CoreAnnotations$AnswerAnnotation] vs. [Red]
What I’m trying to do is create a model to extract names from technical documents, so the language is not particularly natural. I’ve extracted a bunch of names using the out-of-the-box Stanford taggers, and now I’m feeding these back in to train a model on the source documents, to see if I can improve the yield.
The second error looks like it could be triggered by non-ASCII characters in the input documents. Running a regex replace to convert them to a single space seems to resolve the issue.
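For reference, the cleanup I applied is roughly the following (shown here as a Python sketch; the exact character range and whether to collapse runs into one space are my choices, not anything mandated by the library):

```python
import re

def strip_non_ascii(text: str) -> str:
    # Replace each character outside the ASCII range (0x00-0x7F)
    # with a single space so token offsets stay roughly aligned.
    return re.sub(r"[^\x00-\x7F]", " ", text)

cleaned = strip_non_ascii("na\u00efve text")  # the "ï" becomes a space
```

After running the training input through this, the "Argument array lengths differ" error no longer appeared for me.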
The first error seems to be driven by document content rather than document length. Restricting length does not reliably make the issue go away.