Hello.
I would like to process a table that combines numeric and text attributes. For the text attributes, I would like to a) split them into 1-, 2-, and 3-word ngrams, b) create a new numeric column for each ngram (header = ngram), and c) populate each column with a 1 in every row whose text contains that ngram, and a 0 otherwise.
If you are familiar with Weka, this is what their “StringToWordVector” filter does. Unfortunately, although KNIME includes many Weka data mining algorithms, it does not provide access to Weka filters directly.
To illustrate:
Input table:
Row  X1   X2   X3             Y
1    2.1  1.2  “apple sauce”  5.1
2    2.2  4.5  “apple juice”  3.6
Output table (only the 1-word ngram columns are shown):
Row  X1   X2   Y    apple  sauce  juice
1    2.1  1.2  5.1  1      1      0
2    2.2  4.5  3.6  1      0      1
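In case it helps pin down what I am after, here is a rough sketch of the transformation in plain Python, outside KNIME. The function names `ngrams` and `string_to_word_vector` are made up for illustration and are not KNIME or Weka APIs. Lowercasing and splitting on whitespace are assumptions about how the text should be tokenized.

```python
def ngrams(text, max_n=3):
    """Return all 1- to max_n-word ngrams of text, in order of appearance per n."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    words = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


def string_to_word_vector(rows, text_col, max_n=3):
    """Replace text_col with one 0/1 column per ngram found anywhere in that column."""
    vocab = []       # ngrams in first-seen order, used as new column headers
    seen = set()
    row_grams = []   # set of ngrams present in each row
    for row in rows:
        grams = ngrams(row[text_col], max_n)
        row_grams.append(set(grams))
        for g in grams:
            if g not in seen:
                seen.add(g)
                vocab.append(g)

    result = []
    for row, grams in zip(rows, row_grams):
        # Keep every original column except the text column.
        new_row = {k: v for k, v in row.items() if k != text_col}
        for g in vocab:
            new_row[g] = 1 if g in grams else 0
        result.append(new_row)
    return result


# The two example rows from the input table above.
rows = [
    {"X1": 2.1, "X2": 1.2, "X3": "apple sauce", "Y": 5.1},
    {"X1": 2.2, "X2": 4.5, "X3": "apple juice", "Y": 3.6},
]
table = string_to_word_vector(rows, "X3")
for r in table:
    print(r)
```

With 2-word ngrams included, this produces the columns apple, sauce, apple sauce, juice, and apple juice. An ngram that happens to equal an existing column name would overwrite that column in this sketch, so a real version would probably prefix the new headers.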
I would appreciate any clues you could provide me.
Bill