Say column X holds comma-delimited values, e.g. A, B, C or C, D, E (for instance, a list of cities).
I would like to one-hot encode these, treating each entry as a "document" and each value as a "word".
How can this be done in KNIME?
e.g. with output columns A | B | C | D | E:
example 1: A, B, C -> 1, 1, 1, 0, 0
example 2: C, D, E -> 0, 0, 1, 1, 1
The list of possible values ("words") should be learned from the data.
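To make the target output concrete, here is a minimal plain-Python sketch of the transformation described above. It is not a KNIME workflow, and the function name `one_hot_multi` is made up for illustration. It assumes values are separated by commas and that surrounding whitespace should be ignored.

```python
def one_hot_multi(rows, sep=","):
    """One-hot encode delimited multi-value strings.

    Hypothetical helper: the vocabulary is learned from the data itself.
    """
    # Split each entry into its stripped, non-empty values.
    tokens = [[v.strip() for v in row.split(sep) if v.strip()] for row in rows]

    # Learn the vocabulary from the data; sort it for a stable column order.
    vocab = sorted({v for row in tokens for v in row})

    # Build one 0/1 indicator per vocabulary entry for each row.
    encoded = []
    for row in tokens:
        present = set(row)
        encoded.append([1 if v in present else 0 for v in vocab])
    return vocab, encoded


vocab, encoded = one_hot_multi(["A, B, C", "C, D, E"])
print(vocab)    # ['A', 'B', 'C', 'D', 'E']
print(encoded)  # [[1, 1, 1, 0, 0], [0, 0, 1, 1, 1]]
```

The two printed rows match example 1 and example 2. Sorting the vocabulary is only one possible choice for the column order.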
I have tried the Text package but still have problems. I get the error "Index of specified original document column is not valid" when I pass my column through "String To Document", then "Bag of Words Creator", and then "Document Vector".
Might be a little late, but you can use the “Category to Number” node.
You might look for the One to Many node.
(PS: you can get those hub links with the search button on top, see here)