Extracting words preceding a search term and words in the middle of a search term

Hello all,

I am working on a text mining project and have written my own dictionary of approximately 3000 words. Suppose the search term is, for example, "it is nice today". Is there a way to recognize the term when it has one more word in the middle, for example "it is nice weather today"? I understand it is possible to extract words preceding a search term, but what about words in the middle of the term?

Thank you in advance,

Ana

 

Hi Ana

 

There is already a post about extracting words preceding a search term, e.g. the two words preceding a term. Kilian posted an example workflow there. That workflow is mainly based on the Sentence Extractor node and the Regex Split node.

 

In that example a regular expression like .*(\s+[a-z]+\s+[a-z]+\s+mouse).* was used. You can adapt this expression to your needs, e.g. .*([Ii]t is nice\s+[a-z]+\s+today).*
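To see what the adapted expression does, here is a minimal sketch in Python. It only illustrates the regular expression with Python's re module and is not part of the KNIME workflow; the sample sentences and the extract helper are made up for the illustration.

```python
import re

# The adapted expression from the post: "it is nice", exactly one
# lowercase word, then "today". The parentheses capture the whole phrase.
pattern = re.compile(r".*([Ii]t is nice\s+[a-z]+\s+today).*")

def extract(sentence):
    """Return the captured phrase, or None if the sentence does not match."""
    match = pattern.fullmatch(sentence)
    return match.group(1) if match else None

# Hypothetical sample sentences.
print(extract("I think it is nice weather today, really."))
print(extract("It is nice today."))
```

Note that this expression requires exactly one word in the middle, so the original phrase "it is nice today" without an extra word does not match it.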

 

But I am not sure at the moment whether I understood your problem correctly and whether this helps in your particular case. Is this solution an option for you? Or would it mean that you have to rewrite all 3000 of your dictionary entries?

 

Frank

 

Hello Frank,

Yes, I did see your post and Kilian's answer, and it is very useful for a single term or phrase. However, I would like to apply the regular expression to a larger number of terms/phrases, maybe not all 3000, but more than one. Do you think there is a more 'universal' way to make a regular expression work on multiple terms/phrases? Or do I have to write a regex for every term/phrase separately? Thank you for your help,

Ana

Hi Ana,

 

I am afraid you have to write regular expressions for all entries in your dictionary. But if the way the regexes are built is repetitive, you can automate it, e.g. by using the Java Snippet node or the String Manipulation node.
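As a rough idea of what such automation could look like, here is a minimal sketch in Python. It is an illustration only, not KNIME code: the function name phrase_to_regex, the max_extra parameter and the sample phrases are all hypothetical. It assumes each dictionary entry is a phrase of whitespace-separated words and builds a pattern that tolerates up to one extra word between any two neighbouring words.

```python
import re

def phrase_to_regex(phrase, max_extra=1):
    """Build a pattern allowing up to max_extra extra words in each gap."""
    # Escape every word so regex metacharacters in an entry stay literal.
    words = [re.escape(w) for w in phrase.split()]
    # A gap is zero to max_extra additional words, then whitespace.
    gap = r"(?:\s+\w+){0,%d}\s+" % max_extra
    return re.compile(r"\b" + gap.join(words) + r"\b", re.IGNORECASE)

# Hypothetical dictionary entries; a real dictionary would be read from a table.
phrases = ["it is nice today", "text mining"]
patterns = {p: phrase_to_regex(p) for p in phrases}

text = "Well, it is nice weather today for text mining."
for phrase, pattern in patterns.items():
    match = pattern.search(text)
    print(phrase, "->", match.group(0) if match else None)
```

This is more permissive than "one word in the middle", since an extra word is accepted in every gap of the phrase. The same string concatenation could presumably be done per row in a Java Snippet node, with the resulting patterns written to a new column.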

 

With KNIME 2.8 there will be a Wildcard Dict. Tagger, supporting wildcards and regexes in dictionary entries. Nevertheless, the entries, and with them the regexes, still have to be specified.

 

Cheers, Kilian

Hello Kilian,

 

Thanks for the answer. The Wildcard Dict. Tagger sounds great and like a simpler way to do this task. Is 2.8 going to be released soon? When can we expect it? Kind regards,

 

Ana

Hi Ana,

 

KNIME and Textprocessing 2.8 are going to be released in July.

 

Cheers, Kilian