I want to extract a search term together with its two preceding words. For example, with mouse as the search term, "mini mickey mouse" should be extracted if it occurs in a PubMed abstract. A typical corpus consists of 2,000 to 20,000 PubMed abstracts.
In Textprocessing 2.8, which comes with KNIME 2.8, there will be a Wildcard Tagger node that provides wildcard and regex tagging. This node makes it easy to find such word combinations.
Until then, you need a workaround to find such terms. Start from your data table containing the documents (with the word mouse). First, extract the sentences of the documents as strings using the Sentence Extractor node. Next, use the Regex Split node, which searches for and extracts substrings matching a specified regular expression. Enter ".*(\s+[a-z]+\s+[a-z]+\s+mouse).*" as the pattern in the node dialog. The output table contains an additional column with the substrings captured by this regex. An example workflow is attached.
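To see what this pattern captures before building the workflow, the two steps (sentence extraction, then regex capture) can be approximated outside KNIME. The following is a minimal Python sketch, not the KNIME implementation. It assumes a naive punctuation-based sentence splitter in place of the Sentence Extractor node, and the function names `split_sentences` and `extract_phrases` are hypothetical. The regex is the one given above, with the search term made a parameter.

```python
import re


def split_sentences(text):
    # Hypothetical, naive stand-in for KNIME's Sentence Extractor node:
    # collapse all whitespace, then split after ".", "!" or "?".
    text = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def extract_phrases(abstract, term="mouse"):
    # Same pattern as in the Regex Split dialog, with the term escaped.
    # The greedy leading ".*" means that if a sentence contains the term
    # several times, only the last match (two words + term) is captured.
    pattern = re.compile(r".*(\s+[a-z]+\s+[a-z]+\s+" + re.escape(term) + r").*")
    phrases = []
    for sentence in split_sentences(abstract):
        m = pattern.match(sentence)
        if m:
            # Group 1 starts with whitespace; strip it for readability.
            phrases.append(m.group(1).strip())
    return phrases


abstract = ("We bred a mini mickey mouse strain. The mouse was healthy. "
            "Tumours grew in the nude mouse model.")
print(extract_phrases(abstract))  # ['mini mickey mouse', 'the nude mouse']
```

The second sentence yields nothing. The pattern requires two lowercase words, each preceded by whitespace, before the term, so a mention such as "The mouse" at the start of a sentence is not captured.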