Maybe it’s not the most exciting question, and certainly not the most complex task, but I’d like to ask for your advice.
Extracting co-occurring terms is easy with the Term Co-Occurrence Counter, but what if I’d like to extract combinations of two or more words (not just pairs, but also triplets or even 4-5 word combinations) that occur in the document more than once? I see that there’s a solution for plain text (in fact the solution I have is far from perfect, so I need help even with that task), but with terms it hardly seems possible.
Can you please give me a hint and a roadmap?
I’ve just published a solution in a previous post that might be suitable, as an approach, for your use case. From a logical indexing matrix of words you can create combined True/False extractions with the Rule Engine or with regex.
Beyond that, you could give us a more detailed description of the case, or even provide some sample data showing your inputs and expected output.
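To make the expected output concrete, here is a minimal sketch of the task as described in the question: count every run of 2 to 5 consecutive words and keep the ones that occur more than once. It is plain Python rather than a workflow, it assumes simple whitespace tokenization, and the function name `repeated_ngrams`, its parameters, and the sample sentence are hypothetical and only for illustration. It is not the approach from the post mentioned above.

```python
from collections import Counter


def repeated_ngrams(tokens, min_n=2, max_n=5, min_count=2):
    """Return word runs of length min_n..max_n that occur at least min_count times.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the document and count each run.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Made-up sample text; whitespace splitting stands in for real tokenization.
text = "deep neural network models and deep neural network layers"
result = repeated_ngrams(text.lower().split())

# Print the longest combinations first.
for gram, count in sorted(result.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" ".join(gram), count)
```

For tagged terms instead of raw words, the same counting could probably be applied to the sequence of term strings per document, but that depends on how your terms are stored, so sample data would help here.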