I need some guidance regarding text document analysis in KNIME. I’ve explored various techniques for analyzing text, particularly using the spaCy library to recognize names, entities, and more.
I’m keen to pursue this, but with a focus on specific data such as drug names, laboratory names, and disease names. Is it viable to train a custom spaCy model to identify these specific entities, using a dataset containing text and associated labels?
Moreover, I’m interested in generalizing the model to identify new matches beyond the training dataset. While I’m aware of achieving this through Python code, I’m unsure if the Redfield NLP Node extension offers similar capabilities.
I appreciate any assistance you can provide on this matter.
I came across this workflow that I believe could be helpful for you (NER Tagger Model Training – KNIME Community Hub). It’s a bit outdated, but it still presents all the important steps for performing a Named Entity Recognition (NER) task.
This workflow trains a model to recognize proper nouns in a book from Julius Caesar. While your application is obviously different, the underlying concept remains the same. Essentially, you provide a dictionary of words that are part of the training set, and the model learns how to identify new ones based on a rule-based approach.
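To make the dictionary idea concrete, here is a minimal Python sketch of how a term dictionary can be turned into labeled training spans. It is not what the KNIME workflow does internally; the function name tag_entities, the sample terms, and the labels are all made up for illustration. The output uses (start, end, label) character offsets, a form that NER training data commonly takes.

```python
import re

# Hypothetical dictionary: term -> entity label (illustrative values only).
TERMS = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "migraine": "DISEASE",
    "Acme Labs": "LAB",
}

def tag_entities(text, dictionary):
    """Return sorted, non-overlapping (start, end, label) spans for
    every dictionary term found in the text (case-insensitive)."""
    candidates = []
    for term, label in dictionary.items():
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates.append((match.start(), match.end(), label))

    # Prefer longer matches first so multi-word terms win over substrings.
    candidates.sort(key=lambda span: (-(span[1] - span[0]), span[0]))

    chosen = []
    for start, end, label in candidates:
        # Keep a span only if it does not overlap one already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, label))
    return sorted(chosen)

text = "Acme Labs tested aspirin against migraine."
spans = tag_entities(text, TERMS)
for start, end, label in spans:
    print(text[start:end], label)
```

A model trained on spans like these can then pick up terms that are not in the dictionary, because it learns from the surrounding context rather than from the word list alone.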
Thank you for getting back to me. I’ve carefully reviewed the workflow, and it fits exactly what I have in mind.
However, I’d like to use the Redfield NLP Nodes extension, which I find intuitive for visualizing the extracted information. When attempting to integrate the model into the Spacy NER node, I encountered an incompatibility issue.
I’ve also explored the Spacy Model Selector node, which allows for the implementation of an external model. However, it doesn’t seem to recognize the model, which I suspect must be in a folder with a specific file structure.
In the end, I’m not sure if it’s possible to easily integrate a model into the Spacy NER node. I’ve looked at several examples that use this node, but I haven’t found one that integrates a model in KNIME’s own format.
I’m not sure if you’re able to provide an answer regarding this specific extension, but thank you for your help.
There is no way to train or fine-tune a spaCy model in KNIME. The nodes are designed to use only existing models. Moreover, the nodes are not compatible with the Model Reader node, since there is a dedicated node called Spacy Model Selector.
For your use case of NLP on biomedical data, you can try this repository and get models from there:
Then in the settings of Spacy Model Selector you can provide a path to the model you downloaded.
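Before pointing the node at a folder, it may help to check that the folder really is a spaCy pipeline directory. The sketch below assumes the spaCy v3 layout, where a saved pipeline has a config.cfg and a meta.json at its top level; downloaded archives often nest that folder a few levels down. The function name find_pipeline_dirs and the example path are hypothetical, and whether the Spacy Model Selector node needs exactly this layout is an assumption, not something confirmed here.

```python
import json
from pathlib import Path

def find_pipeline_dirs(root):
    """Return folders under root (including root itself) that contain
    both config.cfg and meta.json, as a spaCy v3 pipeline folder does."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        cfg.parent
        for cfg in root.rglob("config.cfg")
        if (cfg.parent / "meta.json").is_file()
    )

def read_meta(pipeline_dir):
    """Read language, name, and component list from meta.json."""
    meta = json.loads(
        (Path(pipeline_dir) / "meta.json").read_text(encoding="utf-8")
    )
    return {
        "lang": meta.get("lang"),
        "name": meta.get("name"),
        "pipeline": meta.get("pipeline", []),
    }

# Hypothetical location of the unpacked download; replace with your own.
for folder in find_pipeline_dirs("path/to/downloaded_model"):
    print(folder, read_meta(folder))
```

If the search returns a nested folder, that nested folder is probably the path to try in the node settings; if "ner" is missing from the printed pipeline list, the model will not produce entities.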