Extract tagged output from Stanford NLP NE Scorer

EricSchaap · May 18, 2020, 8:46am

Hello, I have only recently started using KNIME for a text mining project. I have implemented the Stanford NLP NE Learner, Tagger and Scorer in my workflow and would like to improve the Learner's performance by inspecting its output. How can I extract the tagged output from the Stanford NLP NE Scorer so that I can compare the dictionary-tagged and RegEx-tagged datasets? Thanks in advance!

julian.bunzel · May 18, 2020, 9:18am

Hey @EricSchaap,

unfortunately, the StanfordNLP NE Scorer only provides basic counts and scores. If you want to compare performance based on which terms have or haven’t been tagged, I would recommend building a small workflow for that. Use the Dictionary Tagger and the StanfordNLP NE Tagger, each in its own branch of the workflow, to tag your test documents. Afterwards, count the occurrences of each tagged term for each tagging method and compare the counts.
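
To make the counting step concrete, here is a minimal Python sketch of the comparison, not a KNIME workflow. It assumes the tagged terms from each branch have been exported as simple (term, tag) pairs; the sample data and the helper names `count_terms` and `compare_counts` are hypothetical and only serve as illustration.

```python
from collections import Counter

# Hypothetical sample data: (term, tag) pairs as they might be exported
# from each tagging branch. The values are made up for illustration.
dictionary_tagged = [
    ("Berlin", "LOCATION"),
    ("Paris", "LOCATION"),
    ("Berlin", "LOCATION"),
]
model_tagged = [
    ("Berlin", "LOCATION"),
    ("Munich", "LOCATION"),
    ("Berlin", "LOCATION"),
    ("Paris", "LOCATION"),
]


def count_terms(tagged_terms):
    """Count how often each term was tagged by one method."""
    return Counter(term for term, _tag in tagged_terms)


def compare_counts(dict_counts, model_counts):
    """Map every term seen by either method to (dictionary_count, model_count)."""
    terms = sorted(set(dict_counts) | set(model_counts))
    # A Counter returns 0 for terms it has never seen, so missing terms are safe.
    return {term: (dict_counts[term], model_counts[term]) for term in terms}


comparison = compare_counts(count_terms(dictionary_tagged), count_terms(model_tagged))
for term, (dict_count, model_count) in comparison.items():
    print(f"{term}: dictionary={dict_count}, model={model_count}")
```

Terms with a dictionary count of zero and a non-zero model count are the ones only the model found, which is usually the most interesting part of the comparison.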

I hope this helps, but I will also create and upload a small example shortly.

Best,
Julian

EricSchaap · May 18, 2020, 9:56am

Hi Julian,

Thank you for your quick reply. I think I grasp the idea of your approach but I will happily have a look at your example to make sure I get it right!

Thank you,

Eric

julian.bunzel · May 18, 2020, 1:53pm

Hey Eric,

here is the workflow I mentioned.
It contains a component that creates two tables. The first table contains a bag of words with term frequencies for the tagged documents (one column with counts based on dictionary tagging and one with counts based on model-based tagging), as well as columns for true positives, false negatives and false positives. This lets you check whether the model tagged the words correctly. There might also be words that are not in the dictionary and are tagged only by the model. By definition these count as false positives, but they are exactly the entities we want, since finding them is the reason for training a model in the first place.
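
As a rough illustration of how such columns can be derived from the two count columns, here is a small Python sketch. It is an assumption about one reasonable count-based definition, not necessarily how the component in the attached workflow computes its values; the function names and the sample counts are hypothetical.

```python
def score_term(dict_count, model_count):
    """Count-based scores for one term (an assumed definition, for illustration).

    The dictionary count is treated as the reference: occurrences tagged by
    both methods are true positives, dictionary-only occurrences are false
    negatives, and model-only occurrences are false positives.
    """
    true_pos = min(dict_count, model_count)
    false_neg = max(dict_count - model_count, 0)
    false_pos = max(model_count - dict_count, 0)
    return true_pos, false_neg, false_pos


def score_table(counts):
    """Score every term and compute overall precision and recall."""
    rows = {term: score_term(d, m) for term, (d, m) in counts.items()}
    tp = sum(row[0] for row in rows.values())
    fn = sum(row[1] for row in rows.values())
    fp = sum(row[2] for row in rows.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return rows, precision, recall


# Hypothetical counts: term -> (dictionary_count, model_count).
counts = {"Berlin": (3, 2), "Paris": (1, 1), "Munich": (0, 2)}

rows, precision, recall = score_table(counts)

# Terms found only by the model are formally false positives,
# but they are worth reviewing separately as potential new entities.
model_only = [term for term, (d, m) in counts.items() if d == 0 and m > 0]

for term, (tp, fn, fp) in rows.items():
    print(f"{term}: TP={tp}, FN={fn}, FP={fp}")
print(f"precision={precision:.2f}, recall={recall:.2f}, model-only terms={model_only}")
```

Listing the model-only terms separately keeps them from being dismissed as plain errors when reading the false positive column.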

I can’t upload the workflow with my NE model right now (so you won’t be able to execute the workflow). I’d recommend copying the component in the workflow to your workflow. Connect your table containing the documents to the first input port, the dictionary you used to train the model to the second input port and the model coming from the Learner node to the third input port.

forum_NER_Model_Scoring.knwf (3.4 MB)

Cheers,

Julian
