Help with converting values to columns

I have a dataset as follows:

cookie ID WebpageType  other vairable
12345 news, news, sports, fashion  
12345 news, blog, sports  
55555 fashion, food, women,  
88888 food, food, food, livestyle  

What i want is something alog the line of this:


cookie ID news sports fashion blog food women livestyle    
12345 3 2 1 1 0 0 0    
55555 0 0 1 0 1 1 0    
88888 0 0 0 0 3 0 1    

I have tried the GroupBy node, goruping the cookie Id´s and aggregating the column of WebpageType with unique concatenate with count, and its the closest i have come, but i cant use that for the clustering i want to do afterwards. Can anyone help?


I think you should be able to do this using the following sequence:

  • Cell Splitter to split your WebpageType column on each comma
  • Unpivoting (Value columns: enforce exclusion of cookie ID and WebpageType;  Retained columns: enforce inclusion of cookie ID)
  • GroupBy (Group columns ColumnValues and cookie ID; manual aggregation Count of RowIDs)
  • Pivoting (Groups: cookie ID; Pivots: columnValues; Manual Aggregation: Maximum of Count(RowIDs); pass Pivot Table outport to
  • Missing Value (replace missing Number (Integer) with fixed value 0)
  • and finally Column Rename (Regex) with search string ([a-z]+)\+.* and replace string $1 to fix the column names.