Get combinations of words (2+ words) in the text

Dear friends,
Maybe it’s not the most exciting question and certainly not the most complex task, but I’d like to ask for your advice.
Extracting co-occurring terms is an easy task with the Term Co-Occurrence Counter, but what if I’d like to extract combinations of 2+ words (not just pairs, but also triplets or even combinations 4-5 words long that occur in the document more than once)? I see that there’s a solution for ordinary text (in fact, the solution I have is far from perfect, so I need help even with that task), but with terms it seems hardly possible.
Could you please give me a hint and a roadmap?

Wish you all the best,

Hello @DmitryIvanov76
I’ve just published a solution in a previous post that might be suitable, as an approach, for your use case. Starting from a logical indexing matrix of words, you can build combined True/False extractions with the Rule Engine or with regex.

Beyond this, you could give us a more detailed description of the case, or even provide some sample data showing your inputs and expected output.
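
For reference, here is a minimal Python sketch of the task as described in the question: counting word combinations of 2 to 5 words that occur more than once. It is not the approach from the previous post and not a KNIME node. The function name `repeated_ngrams`, its parameters, and the sample sentence are hypothetical, and the tokenization is a plain regex split, so tagged terms would need their own handling.

```python
import re
from collections import Counter


def repeated_ngrams(text, min_n=2, max_n=5, min_count=2):
    """Return {n-gram tuple: count} for word n-grams seen at least min_count times."""
    # Assumption: lowercase the text and treat runs of word characters as tokens.
    # Punctuation is dropped, so n-grams may span sentence boundaries.
    tokens = re.findall(r"\w+", text.lower())

    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the text and count each window.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1

    # Keep only the combinations that repeat.
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical sample input, for illustration only.
sample = "The quick brown fox jumps. The quick brown fox sleeps."
result = repeated_ngrams(sample)

for gram, count in sorted(result.items(), key=lambda item: (len(item[0]), item[0])):
    print(" ".join(gram), count)
```

Something along these lines could probably be adapted to a Python Script node, with the document text passed in as a string column, but that wiring is left out here.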


