Extract surrounding words close to a tag

Hi!

I have tagged my dataset based on a pre-defined dictionary. Now I would like to extract the surrounding words (+/- 5) around each tag. I tried the Term Neighborhood Extractor, but that node extracts the neighbors of every word, not only the tagged ones. Technically I could apply a Row Filter after the Term Neighborhood Extractor, but that would require a lot of computational power because my dataset is very large.

So here is an example:

“I’ve been a customer[Tag] at this shop[Tag] for about three years now. Even though I currently live in Los Angeles[Tag], I find myself visiting the shop[Tag] at least once a month.”

For each word that has a [Tag], I’d like to extract the surrounding words (+/- 5), plus the tagged word itself.

Hope someone can help!

Hello @annikawagner,

I do not think it is possible to get the neighbours of only selected words using the Term Neighborhood Extractor. Since you’re dealing with a large dataset, here are a couple of approaches that might help:

  1. Parallel Processing: To handle large datasets more efficiently, consider splitting your text into chunks and running the extraction process in parallel. This can help speed up processing by leveraging multiple CPU cores. You can find more information on how to set up parallel processing in KNIME here.

  2. Regex Extraction: For extracting specific contexts around your tagged words, you can use regular expressions. You can get details on how to use regex in KNIME with an example here.
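To make the regex idea a bit more concrete, here is a minimal sketch in plain Python (it could, for example, be adapted to a Python Script node). It assumes the tag appears literally as `[Tag]` appended to the word, as in your example sentence, which may not match how your tagged documents are actually stored. The names `TAG_RE` and `tagged_contexts` are made up for illustration, and a multi-word term such as "Los Angeles[Tag]" is treated as the single token "Angeles".

```python
import re

# Assumption: a tagged token looks like "word[Tag]", optionally followed
# by punctuation (e.g. "Angeles[Tag],").
TAG_RE = re.compile(r"^(?P<word>.+?)\[Tag\]\W*$")


def tagged_contexts(text, window=5):
    """Return (tagged_word, context) pairs for every tagged token.

    The context holds up to `window` tokens on each side of the tagged
    token plus the token itself, with the "[Tag]" markers removed.
    """
    tokens = text.split()  # simple whitespace tokenisation
    results = []
    for i, token in enumerate(tokens):
        match = TAG_RE.match(token)
        if match is None:
            continue  # untagged words are skipped, so no row filter is needed
        start = max(0, i - window)
        neighbours = tokens[start:i + window + 1]
        cleaned = [t.replace("[Tag]", "") for t in neighbours]
        results.append((match.group("word"), " ".join(cleaned)))
    return results


if __name__ == "__main__":
    sample = ("I have been a customer[Tag] at this shop[Tag] "
              "for about three years now.")
    for word, context in tagged_contexts(sample):
        print(word, "->", context)
```

Because only tagged tokens produce output, the result stays much smaller than a full neighbourhood table, and the function can be applied chunk by chunk if you combine it with the parallel approach.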

I hope this helps.

Best,
Keerthan

Hi Keerthan,

Thank you very much for your response!

I will have a closer look at both your suggestions and try them out.

Best,
Annika