DUMMIES/BOOLEAN VARIABLES FROM COLLECTION/LIST COLUMN

I am trying to create dummy variables from a column that contains a list of values (either a String with a comma separator or a Collection), for example:

A

B

A,B

null

A,B,C

A,D

A,B,D

I have 200 possible labels.

I would like to get boolean columns X_A, X_B, X_C, X_D, etc., each containing 1 if the cell in that row contains the corresponding term (i.e. A, B, C, D, etc.).

I cannot get what I want with the One To Many node. Do you have any ideas?

 

 

Hi Fabrice JOURDAN,

 

To create dummy variables with a One To Many node, you need exactly one value per data cell, since the node doesn't work with lists or collections.

If you have a KNIME collection (List (Collection of: String) type), you can use either a Split Collection Column node or an Ungroup node to split the collections. If you have Strings with commas, you can use a Cell Splitter node with a comma (",") delimiter to split the values.
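To make the target concrete, here is a minimal sketch in plain Python of the result the split-then-dummies approach should produce for the sample values in the question. It is only an illustration, not how the KNIME nodes are implemented; the function name `one_hot_rows` and its `prefix` and `sep` parameters are made up for this example, and a null cell is assumed to give all zeros.

```python
def one_hot_rows(cells, prefix="X_", sep=","):
    """Return (column_names, rows) with one 0/1 column per distinct label."""
    # Split each cell into a set of labels; None or empty cells give an empty set.
    label_sets = [
        {tok.strip() for tok in cell.split(sep) if tok.strip()} if cell else set()
        for cell in cells
    ]
    # Collect every label seen anywhere, in a stable (sorted) column order.
    labels = sorted(set().union(*label_sets))
    columns = [prefix + label for label in labels]
    # One row of 0/1 flags per input cell.
    rows = [[1 if label in s else 0 for label in labels] for s in label_sets]
    return columns, rows


# Sample values from the question (None stands in for the null cell).
cells = ["A", "B", "A,B", None, "A,B,C", "A,D", "A,B,D"]
columns, rows = one_hot_rows(cells)
```

Because each cell is turned into a set before the flags are computed, the position of a label inside the cell plays no role in the output.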

 

Best,

Anna

Cell Splitter does not do the "One Hot Encoding" that is needed here. Instead, it preserves the order of the tokens, so "A,B" is encoded differently from "B,A". What is the solution?

A Cell Splitter node with a comma (",") delimiter followed by an Ungroup node seems to work.
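A rough way to see why splitting and then ungrouping sidesteps the ordering problem: once every token sits in its own row, its position in the original string no longer matters. The sketch below mimics that idea in plain Python. The helper names `ungroup` and `pivot_to_dummies` are hypothetical, and the final step of aggregating back to one row per original cell is an assumption that is not spelled out in this thread.

```python
def ungroup(cells, sep=","):
    """Emit one (row_id, label) pair per token, like splitting then ungrouping."""
    pairs = []
    for row_id, cell in enumerate(cells):
        # A null cell is treated as an empty string and yields no pairs.
        for tok in (cell or "").split(sep):
            if tok.strip():
                pairs.append((row_id, tok.strip()))
    return pairs


def pivot_to_dummies(pairs, n_rows, prefix="X_"):
    """Aggregate the long pairs back into one 0/1 dict per original row."""
    labels = sorted({label for _, label in pairs})
    table = [{prefix + label: 0 for label in labels} for _ in range(n_rows)]
    for row_id, label in pairs:
        table[row_id][prefix + label] = 1
    return table


cells = ["A,B", "B,A", None]
dummies = pivot_to_dummies(ungroup(cells), len(cells))
```

Here `dummies[0]` and `dummies[1]` come out identical, so "A,B" and "B,A" get the same encoding, while the null row is all zeros.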