I'm new to KNIME and I can't find a solution for one of my data manipulations.
I have strings of words (example below):
billet maniel dorin
The objective is to find all possible combinations of words, with a minimum of 2 words and a maximum of the entire string, keeping the order of the words in the string.
Below are all possible combinations for the 3-word string "billet maniel dorin":
Thank you for your great idea. I tried using an NGram Creator to create my word combinations.
Nevertheless, it only creates combinations of adjacent words, without skipping a word. Below are all the results from the NGram Creator for the 3-word string "billet maniel dorin":
So this means that n-grams are not the appropriate solution, because they take into account only adjacent words.
Here is a more complex solution, which will work:
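To make the limitation concrete, here is a minimal plain-Python sketch (not the KNIME node itself; the function name `contiguous_ngrams` is made up for illustration) that builds word n-grams from adjacent words only. The pair that skips a word never appears:

```python
def contiguous_ngrams(text, n_min=2):
    """Return all word n-grams made of adjacent words, from n_min words up to the full string."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(n_min, len(words) + 1)   # n-gram length
        for i in range(len(words) - n + 1)      # start position of the window
    ]

ngrams = contiguous_ngrams("billet maniel dorin")
print(ngrams)
# ['billet maniel', 'maniel dorin', 'billet maniel dorin']
print("billet dorin" in ngrams)
# False: this pair skips "maniel", so a sliding window cannot produce it
```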
create two additional empty string columns with Constant Value;
Strings to Document, using the two empty columns for the authors and the full text, and your actual string column for the title;
Bag of Words Creator;
Group Loop Start with Document as group;
Cross-joiner with top and bottom port having exactly the same source;
Rule-based Row Filter, excluding false rows:
$Term$ = $Term (#1)$ => FALSE
TRUE => TRUE
Loop End
Now you have all combinations. Use Term To String on both term columns. Now the only challenge left to you is getting rid of the redundant combinations: e.g. billet maniel vs maniel billet. Probably something can be done with an unpivot-pivot strategy or with a GroupBy ...
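As a rough illustration of what the cross join plus filter is doing, and of one possible way to drop the redundant pairs, here is a small plain-Python sketch. It is not a KNIME workflow; the function name `ordered_pairs` and the idea of filtering on word position are assumptions made for the example. Keeping only the pairs whose first word comes earlier in the string removes both the self-pairs and the reversed duplicates in one step:

```python
from itertools import product

def ordered_pairs(text):
    """Cross join the words with themselves, keeping only pairs in original word order."""
    indexed = list(enumerate(text.split()))  # (position, word)
    return [
        f"{left} {right}"
        for (i, left), (j, right) in product(indexed, indexed)  # the cross join
        if i < j  # drops self-pairs (i == j) and reversed pairs (i > j)
    ]

pairs = ordered_pairs("billet maniel dorin")
print(pairs)
# ['billet maniel', 'billet dorin', 'maniel dorin']
```

Comparing positions rather than the words themselves also behaves sensibly if the same word occurs twice in a string.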
Thank you so much, Geo, it works: I now have all combinations by pair :)
But what I need is to find all possible combinations of words with a minimum of 2 words and a maximum of the entire string, keeping the order of the words.
For example, take the following string: "billet maniel morin black"
The expected result is the following list :
billet maniel
billet morin
billet black
maniel morin
maniel black
morin black
billet maniel morin
billet maniel black
billet morin black
maniel morin black
billet maniel morin black
The order is retained and the words are grouped by 2, 3 and 4 (4 being the number of words in the initial string). The same pattern must be reproduced for strings of 3, 6 or 8 words, and so on.
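For reference, the expected list above is exactly the set of order-preserving word subsets of size 2 up to the full length. A minimal plain-Python sketch using the standard library's `itertools.combinations` is shown below; the function name `ordered_combinations` and its `min_words` parameter are hypothetical, and wiring this into a workflow (for example through a Python scripting node) is left as an assumption:

```python
from itertools import combinations

def ordered_combinations(text, min_words=2):
    """Return all word combinations of min_words..n words, keeping the original word order."""
    words = text.split()
    return [
        " ".join(combo)
        for size in range(min_words, len(words) + 1)  # group sizes 2, 3, ..., n
        for combo in combinations(words, size)        # combinations() preserves input order
    ]

result = ordered_combinations("billet maniel morin black")
for line in result:
    print(line)
print(len(result))
# 11
```

For a string of n words this yields 2^n - n - 1 combinations (11 for 4 words, 57 for 6, 247 for 8), so the output grows quickly with the length of the string.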