Hi everybody!
I’m new to KNIME and I can’t find a solution for one of my data manipulations.
I have strings of words (example below):
billet maniel dorin
The objective is to find all possible combinations of words, with a minimum of 2 words and a maximum of the entire string, keeping the order of the words in the string.
Below are all possible combinations for the 3-word string "billet maniel dorin":
- billet maniel
- billet dorin
- maniel dorin
- billet maniel dorin
How can I do this in KNIME?
Thanks in advance for your reply.
I guess you will have to solve this in Java, within a Java Snippet node.
Thank you for your reply, Spider.
Does anyone have a suggestion for the code inside the Java node?
Thanks for your reply
How about the Ngram Creator? You’ll have to use several of them and concatenate their results, though.
Thank you for your great idea. I tried using an Ngram Creator to create my word combinations.
Nevertheless, it only creates combinations of adjacent words, without skipping a word. Below are all the results from the Ngram Creator for the 3-word string "billet maniel dorin":
The combination "billet dorin" is missing.
How can I get it as well?
Thanks in advance for your reply!
So this means that n-grams are not the appropriate solution, since they only take adjacent words into account.
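To make that concrete, here is a minimal Java sketch of what word n-grams are. It is plain Java for illustration only, not the actual implementation of the Ngram Creator node, and the class and method names (ContiguousNGrams, ngrams) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of KNIME: models word n-grams as runs of adjacent words.
final class ContiguousNGrams {

    /** Returns every run of n adjacent words, in order of appearance. */
    static List<String> ngrams(String text, int n) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        // Each n-gram starts at some position and takes the next n words, never skipping one.
        for (int start = 0; start + n <= words.length; start++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "billet maniel dorin";
        System.out.println(ngrams(text, 2)); // [billet maniel, maniel dorin]
        System.out.println(ngrams(text, 3)); // [billet maniel dorin]
    }
}
```

Since every n-gram is a window of neighbouring words, "billet dorin" can never appear, whatever value of n is chosen.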
Here is a more complex solution, which will work:
- create two additional empty string columns with Constant Value;
- Strings to Document, using the two empty columns for the authors and the full text, and your actual string column for the title;
- Bag of Words Creator;
- Group Loop Start with Document as group;
- Cross Joiner with the top and bottom ports fed from exactly the same source;
- Rule-based Row Filter, excluding FALSE rows:
$Term$ = $Term (#1)$ => FALSE
TRUE => TRUE
Now you have all combinations. Use Term To String on both term columns. Now the only challenge left to you is getting rid of the redundant combinations: e.g. billet maniel vs maniel billet. Probably something can be done with an unpivot-pivot strategy or with a GroupBy ...
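For comparison, the redundant pairs never appear if each word is only paired with the words that come after it. The sketch below shows that idea in plain Java. It is not a translation of the nodes above, and OrderedPairs and pairs are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds each pair of words once, keeping the original word order.
final class OrderedPairs {

    /** Returns every pair (word i, word j) with i before j in the input string. */
    static List<String> pairs(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Starting j at i + 1 skips both the self-pair and the reversed duplicate.
            for (int j = i + 1; j < words.length; j++) {
                result.add(words[i] + " " + words[j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("billet maniel dorin"));
        // [billet maniel, billet dorin, maniel dorin]
    }
}
```

Comparing positions rather than the terms themselves also keeps the original word order, which a comparison on the terms alone cannot guarantee.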
Thank you so much Geo, it works: I now have all combinations by pair :)
But what I need is to find all possible combinations of words with a minimum of 2 words and a maximum of the entire string, keeping the order of the words.
For example, take the following string: "billet maniel morin black"
The expected result is the following list:
- billet maniel
- billet morin
- billet black
- maniel morin
- maniel black
- morin black
- billet maniel morin
- billet maniel black
- billet morin black
- maniel morin black
- billet maniel morin black
The order is retained and the words are grouped by 2, 3 and 4 (4 being the number of words in the initial string). The same pattern must be reproduced for strings of 3, 6 or 8 words, and so on.
Thanks in advance for your help
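One possible way to get exactly this list is a small recursive routine, as suggested earlier with the Java Snippet. The code below is only a sketch in plain Java. WordCombinations, combinations and collect are names invented for this example, and wiring it to the input and output columns of a Java Snippet node is not shown and would still have to be done.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all ordered word combinations of size 2 up to the full string.
final class WordCombinations {

    /** Returns the combinations grouped by size (2 words first, then 3, ...). */
    static List<String> combinations(String text) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int size = 2; size <= words.length; size++) {
            collect(words, size, 0, new ArrayList<>(), result);
        }
        return result;
    }

    /** Extends the partial combination "chosen" with words taken from index "start" onwards. */
    private static void collect(String[] words, int size, int start,
                                List<String> chosen, List<String> result) {
        if (chosen.size() == size) {
            result.add(String.join(" ", chosen));
            return;
        }
        // Stop early enough that the remaining words can still fill the combination.
        int needed = size - chosen.size();
        for (int i = start; i <= words.length - needed; i++) {
            chosen.add(words[i]);
            collect(words, size, i + 1, chosen, result); // only later words, so order is kept
            chosen.remove(chosen.size() - 1);            // backtrack
        }
    }

    public static void main(String[] args) {
        for (String combination : combinations("billet maniel morin black")) {
            System.out.println("- " + combination);
        }
    }
}
```

For the 4-word example this prints the 11 lines of the expected list, in the same order. In general a string of n words gives 2^n - n - 1 combinations (4 for 3 words, 57 for 6, 247 for 8), so the output grows quickly with longer strings.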