Hello, dear community!
Junior KNIME user here, but with big questions (I think).
I’m in analytics and my main goal is to extract entities and relations from unstructured Romanian text and then link them together.
For example, I have this text in English:
Maria is driving a red Mercedes and Josh is driving a black Saab.
My goal is to build a set of nodes that can:
- Extract the entities: Maria, Josh, Mercedes and Saab.
- Extract the relations between the entities: is driving.
- Create an Excel file with this layout: Column1 holds the first entity (Maria), Column2 the relation (is driving), Column3 the brand of the first car (Mercedes), then Column4 = Josh, Column5 = is driving, Column6 = Saab.
I want this format because, after exporting the resulting Excel file, I will load it into Analyst’s Notebook, which will help me link the entities (with a line, like in Neo4j).
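To make the target layout concrete, here is a minimal Python sketch of the last step only (something I imagine could live in a Python Script node). It assumes the entities and relations have already been extracted as (subject, relation, object) triples. The triples are hard-coded from the example sentence, the function names are made up for illustration, and it writes CSV text rather than a real Excel file to avoid extra libraries:

```python
import csv
import io

# Hypothetical triples (subject, relation, object) for the example sentence.
# In a real workflow these would come from the NER / relation extraction steps.
triples = [
    ("Maria", "is driving", "Mercedes"),
    ("Josh", "is driving", "Saab"),
]


def flatten_triples(triples):
    """Return (header, row) with three columns per triple, left to right."""
    row = [value for triple in triples for value in triple]
    header = [f"Column{i}" for i in range(1, len(row) + 1)]
    return header, row


def to_csv_text(header, row):
    """Serialize one header line and one data line as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerow(row)
    return buffer.getvalue()


header, row = flatten_triples(triples)
csv_text = to_csv_text(header, row)
print(csv_text)
```

With two triples this gives one row with six columns, which is the Column1 to Column6 layout described above. The open part of my question is how to produce those triples from Romanian text in the first place.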
By using spaCy nodes such as Model Selector, Tokenizer and NER, together with Bag of Words and some Regex nodes, I am able to extract the entities from Romanian text.
But I am still far from the desired result.
So, I have some questions for this wonderful community:
Regarding the spaCy Model Selector node, I saw that it’s possible to load a local model. Do you know any websites where I can download a large-scale model, preferably for Romanian?
Regarding the Stanford NLP relation extractor, I saw that it accepts only English. Can you recommend another relation extractor that also accepts Romanian? Or could I train another node to do it?
Regarding offline LLMs, do you have any recommendations for the model best suited to NLP, NER and relation extraction that I can use inside KNIME?
Also, can you suggest another approach to the main problem of extracting entities and the relations between them?
Thanks!