GroupBy node Feature Request


The GroupBy node is such a powerful node, but I would like some new additions if possible.

When looking at the aggregation methods for Time/Date cells, is it possible to have an option to report back either the earliest date or latest date.

Also when collecting data together with a count such as "Unique Concatanate with Count", is there any way to make the resultant cell contents from this to be easy to split into new columns, so if the output cell was "Apples (2) Oranges (8), Bananas (1)",  is it possible for an easy way to convert this into new columns labelled Apples, Oranges, Bananas as Integer columns with the corresponding numbers underneath (2,8,1).



You can get the earliest/latest date using Minimum/Maximum of the cell.

Hi Simon,

as Iris already mentioned returns the groupby node the earliest and latest date when you use the minimum/maximum operator that uses the data type specific comparator to retrieve the minimum/maximum value per group. For the Time/Date cells this is the earliest resp. latest point in time per group.

The second problem can be solved by using the Cell Splitter, Unpivoting, Regex Split, String to Number and Pivot node.

  1. First use the Cell Splitter to create for each unique value a column. Use the ',' as delimiter on the result column of the groupby node e.g. Unique concatenate with count...
  2. Use the Unpivoting node to move the columns to a row by selecting the resulting columns from the Cell Splitter node as Value columns
  3. Use the Regex Split node to create a column for the unique values and their count by apllying the following regex (.*)\(([0-9]*)\) on the ColumnValues column
  4. Use the String To Number node to convert the split1 column to int
  5. To pivot the table again use the Pivoting table with the split_0 column as Pivot column and the split_1 column as Group column

I will think about a better solution and might add a new aggregation method which makes this procedure much easier in one of the next KNIME releases.

Thanks a lot for the feature request. I'm always happy for new and interesting problems/ideas that improve the existing nodes.



First Unpivot your data, then apply the transformations, then Pivot back, no loop needed

Quick comment before I ask a question: Step #5 listed by Tobias seems wrong. I had to use all the columns not being split into new columns for the Group columns and NOT the split_1 column. The split_1 column is used as the aggregation column.

Now my question: Has any new capability been added to KNIME to simplify converting a column created by the "unique concatenate with count" aggregation function into multiple columns each named for one of the unique strings with the value being the count for that string? I am processing a data file where I have 15 columns aggregated using the "unique concatenate with count" in a GroupBy, and while the steps listed by Tobias work I must repeat them for every single column. This is a tremendous amount of repetitive work (even if a loop can help) that I would love to see built into a selectable option within the Pivot node, or maybe a new "Column Splitter" node. Perhaps some other solution has been created since 2011?

First step is to use a Cell Splitter node, which only splits one column at a time. So I could just line up 15 of these splitters, but a loop is cleaner and more dynamic to change. Actual pipeline is Cell Splitter - Unpivoting - Regex Split - Pivoting, so Unpivot is being used. Cell Splitter is only step that actually needs to be in a loop (as long as wildcards/regex patterns are used to specify columns in other nodes), but on large data sets it makes sense to only Unpivot one column at a time so the other steps can also be in the loop.

Actually, my suggestion consists in using Unpivot as a first step, prior to any following Unpivot nodes used in the subsequent workflow; this first Unpivot will transform the 15 columns to be split into 15 different varname-value pairs. Then proceed with your workflow: a single Cell Splitter - Unpivot - Regex Split - Pivot. After all that, use Pivot, again (!). The final workflow may be slightly more complex due to the fact that e.g. pivoting in KNIME automatically adds suffixes to columns, so you'd have to use Rename (regex) to remove the suffix.

Loop Start-Apply-Loop End is a fine strategy, too. I guess loops are more natural to people than pivoting/unpivoting. However, in KNIME, the action of parametrising loops often requires some preparation (flow variables configuration, column splitter, etc.), which is why I tend to prefer the Unpivot-Apply-Pivot approach.