Hello, I have only recently started using KNIME for a text mining project. I have implemented the Stanford NLP NE Learner, Tagger and Scorer in my workflow and would like to improve the Learner's performance by looking at its output. How can I extract the tagged output from the Stanford NLP NE Scorer so that I can compare the dictionary-tagged and RegEx-tagged datasets? Thanks in advance!
Unfortunately, the StanfordNLP NE Scorer only provides basic counts and scores. If you want to compare performance based on which terms have or haven’t been tagged, I would recommend building a workflow for that. You can use the Dictionary Tagger and the StanfordNLP NE Tagger (one in each branch of your workflow) to tag your test documents. Afterwards, you can count the occurrences of each tagged term for each tagging method and compare them.
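To make the counting idea concrete outside of KNIME, here is a minimal Python sketch. It assumes you have already exported the tagged terms from each branch as plain lists of strings; the function name `compare_tag_counts` and the sample terms are made up for illustration and are not part of any KNIME node.

```python
from collections import Counter


def compare_tag_counts(dictionary_terms, model_terms):
    """Return {term: (dictionary_count, model_count)} for every tagged term.

    Both arguments are assumed to be lists of term strings, one entry
    per tagged occurrence (hypothetical export format).
    """
    dict_counts = Counter(dictionary_terms)
    model_counts = Counter(model_terms)
    # Union of terms so that terms tagged by only one method still show up.
    all_terms = sorted(set(dict_counts) | set(model_counts))
    # Counter returns 0 for missing keys, so no KeyError here.
    return {term: (dict_counts[term], model_counts[term]) for term in all_terms}


# Hypothetical example data, not taken from a real workflow.
dictionary_terms = ["Berlin", "Paris", "Berlin", "Rome"]
model_terms = ["Berlin", "Paris", "Madrid"]

comparison = compare_tag_counts(dictionary_terms, model_terms)
for term, (dict_count, model_count) in comparison.items():
    print(f"{term}: dictionary={dict_count}, model={model_count}")
```

Terms with a zero in one of the two positions are the ones that only one tagging method found, which is usually where it pays to look first.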
I hope this helps, but I will also create and upload a small example shortly.
Thank you for your quick reply. I think I grasp the idea of your approach but I will happily have a look at your example to make sure I get it right!
Here is the workflow I mentioned.
It contains a component that creates two tables. The first table contains a bag of words and term frequencies for the tagged documents (one column contains counts based on dictionary tagging and one contains counts based on model-based tagging), as well as columns for true positives, false negatives and false positives. This helps to check whether the model tagged the words correctly. There might also be words that are not in the dictionary and are tagged only by the model. These would be false positives by definition, but we want to have these entities since they are the reason we created a model in the first place.
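For readers who want to see the arithmetic behind those columns, here is a rough Python sketch. It is only one possible way to derive the values and is an assumption on my side, not a description of what the component does internally: the dictionary counts are treated as the reference, and the names `score_term`, `term_counts` and `model_only` are hypothetical.

```python
def score_term(dictionary_count, model_count):
    """Derive per-term TP/FN/FP from two occurrence counts.

    Assumption: the dictionary count is the reference ("truth"),
    and the model count is what is being evaluated.
    """
    return {
        # Occurrences found by both methods.
        "tp": min(dictionary_count, model_count),
        # Occurrences in the dictionary tagging that the model missed.
        "fn": max(dictionary_count - model_count, 0),
        # Occurrences tagged by the model beyond the dictionary tagging.
        "fp": max(model_count - dictionary_count, 0),
    }


# Hypothetical counts per term: (dictionary_count, model_count).
term_counts = {"Berlin": (2, 1), "Paris": (1, 1), "Madrid": (0, 1)}

scores = {term: score_term(d, m) for term, (d, m) in term_counts.items()}

# Terms the dictionary never tagged but the model did: formally false
# positives, yet possibly the new entities the model was trained to find.
model_only = [term for term, (d, m) in term_counts.items() if d == 0 and m > 0]

for term, score in scores.items():
    print(term, score)
print("tagged only by the model:", model_only)
```

Listing the model-only terms separately makes it easier to review them by hand instead of treating every false positive as an error.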
I can’t upload the workflow with my NE model right now (so you won’t be able to execute the workflow). I’d recommend copying the component in the workflow to your workflow. Connect your table containing the documents to the first input port, the dictionary you used to train the model to the second input port and the model coming from the Learner node to the third input port.
forum_NER_Model_Scoring.knwf (3.4 MB)