I need text mining as a tool for my thesis, but I don't have much experience with KNIME. I want to solve the following information retrieval problem:
Basically, I need to extract certain (production) methods from textual data in the field of engineering.
My first approach was to build a custom tagger node that is able to tag production methods. Although I can't program in Java, I mostly understood the associated tutorial and how to integrate a custom tag set. What I don't know is how to build a recognizer model for tagging the terms I want to extract.
Can anybody suggest tutorials on how to solve this task? Maybe I could modify an example to my needs? Or are there other possibilities on how to solve this?
I would still very much appreciate any help with the extraction mentioned above.
As I'm still not sure how to start with the tagging model, I thought about another approach to solve the task:
After standard preprocessing, I used the Sentence Extractor node followed by a Row Filter node to filter the sentences with a regular expression such as ^.*method.*. This returns the sentences containing the substring "method". This greatly reduces the amount of text I need to read, but it is a very rough approach and excludes a great deal of interesting material. To refine the filter, I simply applied several Row Filter nodes consecutively to narrow down the topic. This yields some useful material, but it is still not a state-of-the-art approach and still misses many interesting passages.
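Outside KNIME, the filtering logic described above can be sketched in Python roughly as follows. This is only an illustration: the sample sentences, the extra keywords and the keep_matching helper are made up, and the sketch searches for substrings instead of matching the whole string. It shows why chaining filters narrows the result (logical AND), while a single pattern with alternatives widens it (logical OR).

```python
import re

# Hypothetical sample sentences standing in for the Sentence Extractor output.
sentences = [
    "The housing is produced using the injection molding method.",
    "Tensile strength was measured at room temperature.",
    "Laser cutting is a common method for sheet metal.",
    "The part was manufactured by die casting.",
]


def keep_matching(rows, pattern):
    """Keep only the rows in which the regular expression is found."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if regex.search(row)]


# One filter, similar in spirit to ^.*method.*
with_method = keep_matching(sentences, r"method")

# A second filter applied to the first result acts as a logical AND.
narrowed = keep_matching(with_method, r"sheet metal")

# A single pattern with alternatives acts as a logical OR.
widened = keep_matching(sentences, r"method|manufactured|produced")

print(len(with_method), len(narrowed), len(widened))
```

If recall matters more than precision, one broad alternation of trigger words is probably a better starting point than a chain of narrow filters.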
Can somebody think of improvements to my workflow or a different idea to solve the problem?
Additional custom models cannot, so far, be trained within KNIME; there are no dedicated nodes for this. You need to train the model with the OpenNLP or Stanford NLP libraries and then integrate it as a custom tagger node in KNIME.
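Training such a model needs annotated example sentences. As a rough sketch (not an official workflow), the following Python script marks known terms from a dictionary in the format the OpenNLP name finder expects for training data, as far as I know: one sentence per line, tokens separated by spaces, and each entity wrapped in <START:type> ... <END>. The term list, the "method" label and the annotate helper are assumptions made for illustration; bootstrapping from a dictionary is just one possible way to get a first training set, and the result should be reviewed by hand.

```python
import re

# Hypothetical dictionary of production methods; replace with your own terms.
METHODS = ["injection molding", "laser cutting", "die casting"]


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def annotate(sentence, terms, label="method"):
    """Return the sentence with every known term wrapped in START/END markers."""
    # Separate all tokens by single spaces.
    tokenized = " ".join(tokenize(sentence))
    # Tokenize the terms the same way and try longer terms first.
    variants = sorted((" ".join(tokenize(t)) for t in terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"<START:{label}> {m.group(0)} <END>", tokenized)


if __name__ == "__main__":
    print(annotate("The part was made by injection molding.", METHODS))
```

The annotated lines can then be written to a text file and used as input for the training tools of the NLP library; sentences without any known term should be kept in the file as well, so the model also sees negative examples.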