Feature request: groupby fail switch

Dear knimers,

It is sometimes annoying that the GroupBy node requires you to set a maximum group size, though I am sure there are good reasons why it has to be like this. As a consequence, one of the groups can exceed that size, in which case the aggregated value is replaced by a missing value. However, the rest of the workflow still executes and may produce nonsense results because of the failed aggregation, and it can be hard to find the spot where things went wrong.

My suggestion is to add a “Fail if group too large” option to the GroupBy node. Crash early, crash hard is often the best option :blush:
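Outside of KNIME, the requested fail-fast behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, not how the GroupBy node works internally; the function name `grouped_lists` and the row/column names are made up for the example.

```python
from collections import defaultdict

def grouped_lists(rows, key, col, max_size):
    """Group `rows` (a list of dicts) by `key`, collecting `col` values
    into lists. If any group exceeds `max_size`, raise immediately
    instead of silently emitting a missing value downstream."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    too_large = [g for g, vals in groups.items() if len(vals) > max_size]
    if too_large:
        # Crash early, crash hard: stop the "workflow" here.
        raise ValueError(f"groups exceed max size {max_size}: {too_large}")
    return dict(groups)

rows = [{"g": "a", "v": 1}, {"g": "a", "v": 2}, {"g": "b", "v": 3}]
print(grouped_lists(rows, "g", "v", max_size=2))
```

With `max_size=1` the same call raises instead of propagating missing values, which is exactly the point of the proposed option.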



Much agreed! Would love to have this :slight_smile:


Hi there @Aswin ,

ticket created!



Awesome, thanks @Iris and @ipazin ! :smiley: It would be even more awesome if this option could also be included in the big brother of the GroupBy node, the Pivot node :sweat_smile:


On a related note, would you recommend calculating the maximum group size prior to using the GroupBy node, for example with a Value Counter node on the groups? This value could then be stored in a flow variable and entered in the max group size field. Would this result in a smaller memory footprint when the GroupBy node creates a bunch of Collections? :thinking: It would also avoid the “group too large” problem… wouldn’t it? Example below.
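The workaround described above can be sketched in plain Python to make the three steps concrete. This is just an analogy to the KNIME nodes, not their actual implementation; the row and column names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for a KNIME table.
rows = [{"g": "a", "v": 1}, {"g": "a", "v": 2}, {"g": "b", "v": 3}]

# Step 1 (the Value Counter node): count rows per group up front.
sizes = Counter(row["g"] for row in rows)

# Step 2 (the flow variable): the largest count is the value that
# would be fed into the GroupBy node's maximum group size field.
max_group_size = max(sizes.values())

# Step 3 (the GroupBy node): aggregate with exactly that limit, so
# "group too large" cannot occur by construction.
groups = defaultdict(list)
for row in rows:
    groups[row["g"]].append(row["v"])
assert all(len(vals) <= max_group_size for vals in groups.values())

print(max_group_size, dict(groups))
```

Because the limit is derived from the data itself, no group can ever exceed it, which is the trade-off discussed below: extra precomputation in exchange for never hitting the missing-value case.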


KNIME_project8.knwf (11.7 KB)


Hi @Aswin,

hmm… So the question is: does the maximum group size affect execution time and memory? I have no clue :smiley:

Sure, it would avoid the “group too large” problem, but you would pay for it with three additional nodes for each GroupBy node if you want to be 100% sure (or one metanode). And since one of those nodes is a Sorter, I prefer your idea of crashing… Still, this can be a useful solution for someone who doesn’t want to get missing values or (hopefully in the future) a failing node :slight_smile:



This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.