Sentence Relevance Analysis Based on Term DF, IDF, and TF

Hello KNIME community. I have a collection of short expressions, each a combination of 3-character codes, such as “AQP AQY A77 H0U SCS 4AA 2ST” and “LHD 3BI 3BJ”. There are 383 distinct 3-character codes in total, and the collection has 4000 combinations. I need to identify the codes that are most relevant and suggest “average” expressions built from the codes that matter most to the collection. I extracted the DF of each code and calculated its relevance for the document. I chose to tag as “low relevance” the codes that fall below a threshold. Now I would like to find where the low-relevance codes occur in the document and suggest new expressions that eliminate their usage. Are there any suggestions?
Here is some dummy data:
My goal is to process the text and produce an “average” combination that prioritizes the high-DF codes.
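For anyone who wants to prototype the steps described above outside KNIME, here is a minimal Python sketch. It assumes DF means "number of expressions that contain the code" and that relevance is simply DF divided by the number of expressions. The 0.5 threshold, the function names, and the last two sample expressions are made up for illustration; only the first two expressions come from the post.

```python
from collections import Counter

# Sample data: the first two expressions are from the post,
# the last two are hypothetical and only there to make DF non-trivial.
expressions = [
    "AQP AQY A77 H0U SCS 4AA 2ST",
    "LHD 3BI 3BJ",
    "AQP AQY LHD",
    "AQP 3BI SCS",
]

def document_frequency(expressions):
    """Count in how many expressions each code appears (DF)."""
    df = Counter()
    for expr in expressions:
        # set() so a code repeated inside one expression counts once
        df.update(set(expr.split()))
    return df

def low_relevance_codes(df, n_docs, threshold):
    """Return the codes whose DF / n_docs ratio is below the threshold."""
    return {code for code, count in df.items() if count / n_docs < threshold}

def strip_low_relevance(expressions, low):
    """For each expression, list its low-relevance codes and a cleaned suggestion."""
    report = []
    for expr in expressions:
        codes = expr.split()
        found = [c for c in codes if c in low]
        cleaned = " ".join(c for c in codes if c not in low)
        report.append((expr, found, cleaned))
    return report

df = document_frequency(expressions)
low = low_relevance_codes(df, len(expressions), threshold=0.5)  # assumed threshold
report = strip_low_relevance(expressions, low)

for original, found, suggestion in report:
    print(f"{original!r}: low={found} -> {suggestion!r}")
```

With this sample, the first expression is reduced to "AQP AQY SCS" and the second to "LHD 3BI". How to turn the cleaned expressions into a single "average" expression depends on what "average" should mean, so that part is left open here.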

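Just thinking: if you get your data into a DB, you could have two tables, one with the documents and one with the terms, and then join them on a wildcard match, e.g. "ON b.Document LIKE '%' + a.Term + '%'" (where a is the terms table and b is the documents table). Note that a wildcard match needs LIKE rather than =, and the term goes between the wildcards.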
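The two-table idea can be tried quickly with Python's built-in sqlite3 module. This is only a sketch: the table and column names (docs, terms, relevance) are hypothetical, the relevance tags are made-up sample values, and SQLite concatenates strings with || instead of +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table for the expressions, one for the tagged codes.
conn.executescript("""
    CREATE TABLE docs  (doc_id INTEGER PRIMARY KEY, document TEXT);
    CREATE TABLE terms (term TEXT PRIMARY KEY, relevance TEXT);
""")

conn.executemany(
    "INSERT INTO docs (document) VALUES (?)",
    [("AQP AQY A77 H0U SCS 4AA 2ST",), ("LHD 3BI 3BJ",)],
)
# Relevance tags below are illustrative, not computed from real data.
conn.executemany(
    "INSERT INTO terms VALUES (?, ?)",
    [("AQP", "high"), ("A77", "low"), ("3BJ", "low")],
)

# Wildcard join: a document matches a term when the term occurs inside it.
rows = conn.execute("""
    SELECT d.doc_id, t.term
    FROM docs AS d
    JOIN terms AS t
      ON d.document LIKE '%' || t.term || '%'
    WHERE t.relevance = 'low'
    ORDER BY d.doc_id, t.term
""").fetchall()

print(rows)  # [(1, 'A77'), (2, '3BJ')]
conn.close()
```

Because every code is exactly 3 characters and the codes are separated by spaces, a substring match cannot straddle two codes. With variable-length codes the join would need word boundaries, for example by padding both sides with spaces.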
