Analyse a column of text rows in Excel for frequency of phrases (1-5 words) repeating between rows


I'm new to text processing and KNIME. My goal is to get the frequency of every phrase of 1 to 5 words in a row that also appears in other rows of my Excel input file.

My workflow so far is: Excel Reader -> Strings To Document -> BoW Creator -> TF/IDF.
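For readers outside KNIME, here is a rough Python analogue of what that workflow produces. It is only a sketch, not KNIME's implementation. The regex tokenizer is an assumption (KNIME's tokenizers split text differently), and `tokenize` and `bag_of_words_tf` are hypothetical helper names. The point is that each term in the bag of words is a single token, so multi-word phrases never appear.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a simple lowercase word tokenizer stands in for KNIME's tokenizer
    return re.findall(r"\w+", text.lower())


def bag_of_words_tf(rows: list[str]) -> list[Counter]:
    # One Counter per row (document): single-word term -> absolute term frequency
    return [Counter(tokenize(row)) for row in rows]


rows = ["the quick brown fox", "the quick red fox"]
for i, tf in enumerate(bag_of_words_tf(rows)):
    print(i, dict(tf))
```

Every key in these counters is one word. A phrase like "quick brown" is never counted.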

However, this workflow only gives me frequencies of single-word terms. Which nodes or workflows can build a set of terms containing every 1- to 5-word phrase in a row?
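To make the target concrete, the "set of all 1- to 5-word phrases in a row" is the set of that row's word n-grams for n = 1 to 5. The sketch below is a minimal Python illustration, assuming a simple regex tokenizer. The `phrases` function and its `min_n`/`max_n` parameters are hypothetical names, not KNIME settings.

```python
import re


def tokenize(text):
    # Assumption: lowercase word tokens; KNIME's tokenizers may split differently
    return re.findall(r"\w+", text.lower())


def phrases(text, min_n=1, max_n=5):
    """Return the set of all contiguous word sequences of length min_n..max_n."""
    tokens = tokenize(text)
    result = set()
    for n in range(min_n, max_n + 1):
        # A row with k tokens has k - n + 1 n-grams (none if n > k)
        for i in range(len(tokens) - n + 1):
            result.add(" ".join(tokens[i:i + n]))
    return result


print(sorted(phrases("the quick brown fox"), key=lambda p: (len(p.split()), p)))
```

A 4-word row yields 4 + 3 + 2 + 1 = 10 phrases, and no 5-grams because the row is too short.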

Any help would be highly appreciated. Thanks.



You can use the NGram Creator node after the Strings To Document node and output the n-grams either as a bag of words or as a frequency table. In your case the frequency table makes sense. To extract 2- to 5-grams, run the NGram Creator in a loop and control its n setting with a flow variable.
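As a rough Python illustration of the loop idea (not KNIME code), the sketch below runs an outer loop over n from 1 to 5, like the flow variable driving the NGram Creator. It builds a frequency table of total occurrences and the number of rows containing each phrase, then keeps only phrases that appear in at least two rows. The tokenizer, the `shared_phrase_table` name, and the "2+ rows" cutoff are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens stand in for KNIME's tokenization
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # All contiguous n-word sequences in one row
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_phrase_table(rows, max_n=5):
    """Return {phrase: (total_occurrences, rows_containing)} for phrases in 2+ rows."""
    total = Counter()      # how often the phrase occurs across all rows
    row_count = Counter()  # in how many distinct rows the phrase occurs
    tokenized = [tokenize(r) for r in rows]
    # Outer loop plays the role of the loop whose flow variable sets n
    for n in range(1, max_n + 1):
        for tokens in tokenized:
            grams = ngrams(tokens, n)
            total.update(grams)
            row_count.update(set(grams))
    return {p: (total[p], row_count[p]) for p in total if row_count[p] >= 2}


rows = [
    "late delivery of the order",
    "the order arrived with late delivery",
    "customer service was helpful",
]
table = shared_phrase_table(rows)
for phrase, (freq, n_rows) in sorted(table.items(), key=lambda kv: (-len(kv[0].split()), kv[0])):
    print(f"{phrase!r}: {freq} occurrences in {n_rows} rows")
```

Counting distinct rows separately from total occurrences is what separates "repeated between rows" from "repeated within one row".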

Cheers, Kilian