StanfordNLP NE Learner examples

Hello. Do you have any examples of using the StanfordNLP NE Learner?

I've been trying to use the dictionary for the trainer unsuccessfully.

Thanks!


Hi peleitor,

There will be an example workflow on the example server soon.

What kind of problems did you encounter?

Cheers, 

Julian

Probably I am not understanding the idea correctly. I expected to train based on annotated documents, but when I look at the inputs of the Stanford NE Learner node, I find documents as the first input and a list of entities as the second. The trained model did not perform NER as expected. But again, maybe I am not proceeding correctly.

Thanks


Hi Peleitor,

we have now uploaded an example workflow from Julian to the example server.

knime://EXAMPLES/08_Other_Analytics_Types/01_Text_Processing/14_NER_Tagger_Model_Training

The node takes documents and a dictionary as input. The documents are then annotated by the node: every name from the dictionary is annotated in the documents. These annotated documents are then used to train the NER model; under the hood, the Stanford NLP library is used. The StanfordNLP NE tagger node can then use the trained model.
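To illustrate the annotation step described above, here is a rough conceptual sketch in plain Python. The tokenization, the example dictionary, and the `ORG` tag are all made up for illustration; the real node uses the Stanford library's own tokenization and a CRF sequence model rather than this naive exact matching.

```python
# Conceptual sketch of the dictionary-based annotation the node
# performs internally before training (illustrative only).

def annotate(tokens, dictionary, tag):
    """Label every token found in the dictionary with `tag`,
    everything else with 'O' (outside any entity)."""
    return [(tok, tag if tok in dictionary else "O") for tok in tokens]

# Hypothetical dictionary and document, purely for demonstration.
dictionary = {"KNIME", "Stanford"}
doc = "KNIME wraps the Stanford NER learner".split()

labeled = annotate(doc, dictionary, "ORG")
# `labeled` is a list of (token, label) pairs -- the kind of
# per-token training data a sequence model (e.g. a CRF) learns from.
```

The point is that the dictionary is only used to generate the training labels; the model trained on the resulting pairs can then generalize from context, rather than just matching strings.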

Does that help?

Cheers, Kilian

Thanks Kilian, that's a nice example.

I still don't get one point: I expected the training set to have entities annotated inside the documents, but the example assumes that all dictionary entries are the entities (I don't see annotations at a per-document level, as you would create with an annotation tool like brat). But I guess this is the way it works.

On the other hand, I need to classify entities into specific types (genes, proteins, amino acids, etc.) instead of labeling all of them as "UNKNOWN". Is there any way to model this?


Thanks again, 

Fernando


Hey Fernando,

the learner node annotates the documents internally, based on the entities defined in your dictionary.

If you already have a table with annotated documents, you have to extract the terms annotated with a specific tag and create a dictionary from them, since the learner cannot be used with annotated documents alone.
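As a sketch of that extraction step: if your annotated documents can be represented as sequences of (token, tag) pairs, a few lines collect the distinct terms carrying a given tag. The tuple representation and the sample data here are assumptions for illustration, not the KNIME table layout.

```python
# Build a dictionary (term list) from already-annotated documents,
# so it can feed the learner node. Illustrative sketch only.

def build_dictionary(annotated_docs, target_tag):
    """Collect the distinct terms annotated with `target_tag`
    across a list of (token, tag) sequences."""
    terms = set()
    for doc in annotated_docs:
        for token, tag in doc:
            if tag == target_tag:
                terms.add(token)
    return sorted(terms)

# Hypothetical annotated documents, made up for this example.
docs = [
    [("BRCA1", "GENE"), ("is", "O"), ("a", "O"), ("gene", "O")],
    [("TP53", "GENE"), ("and", "O"), ("BRCA1", "GENE")],
]

gene_dictionary = build_dictionary(docs, "GENE")
# gene_dictionary -> ['BRCA1', 'TP53']
```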

In the "Learner Options" tab, you can define the tag (e.g. CELL(PHARMA)). The tag will be saved in the model content, so the Stanford NE tagger node knows which tag to use. 

If you want to create your own tag sets, please have a look at this:

https://tech.knime.org/for-developers-integration-of-custom-tag-sets

At the moment, the learner generates single-class models. If you want to classify different types (genes, proteins, etc.), you have to generate a model for each class.

Cheers,

Julian

Thanks Julian, that clarifies it.

I’d like to re-engage this thread, as I’m having trouble using this family of nodes effectively. My understanding of an NER model is that, using a training set of known words within sentence constructs, a predictor can uncover similar or alternate words based on context in an unseen data set.
However, it appears the StanfordNLP NE Learner takes a corpus of untagged documents and performs simple word matching against a user-defined list (while also expecting the user to select an arbitrary tag type and tag value from the drop-down list). If this is the case, I fail to see how this differs from the Wildcard Tagger, the Dictionary Tagger, or any other string matcher. Am I missing something?