Group By controlled by Flow Variable

Hello all,

The default aggregation method for numbers in the GroupBy node is "Mean".

I am controlling the aggregation method with a flow variable, which works quite well (see image1).

The problem is that I don't know in advance what size the array will be. It follows the number of columns in the manual aggregation settings: if there are 3 columns, the array has 3 entries (as in image1). If another column is added to the manual aggregation settings, it has 4, and so on.

Is it possible to set the whole array dynamically to the right value (summe_bilden), so that I don't have to do it manually (see image2)?



Maybe wrapping the groupby into a loop is easier in this case.

Hello Geo

Can you explain in a little more detail what you mean by wrapping the GroupBy into a loop?


In my opinion, the easiest way would be to change the default aggregation method from mean to sum.

That would solve the problem.

I guess this is not possible?

I already figured out a solution by myself, a very easy one :-)

I prepared a GroupBy node with an aggregationMethod array of size almost 50.

(I never have more than 50 numeric columns in my table.)

When I add a GroupBy node, I do not pick one from the node repository; instead I copy and paste my prepared "Group By" node. That works perfectly for me.



Another solution could be to use the type based or pattern based aggregation. This way you can assign one aggregation method to more than one column, e.g. to all columns that start with col_ or simply to all numeric columns. In the flow variable tab you then only have to select your aggregation method once, for the name/type based selector.
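
To illustrate why a pattern or type based rule sidesteps the array-size problem, here is a rough sketch in plain Python (not KNIME code). The table, the column names and the helper functions select_columns and group_sum are made up for this example; the point is only that one rule covers however many columns happen to match.

```python
from collections import defaultdict
from numbers import Number

# Hypothetical example table; column names and values are made up.
rows = [
    {"class": "a", "col_x": 1, "col_y": 10.0, "weight": 5, "label": "p"},
    {"class": "a", "col_x": 2, "col_y": 20.0, "weight": 5, "label": "q"},
    {"class": "b", "col_x": 3, "col_y": 30.0, "weight": 7, "label": "r"},
]

def select_columns(rows, group_col, prefix=None):
    """Pick columns to aggregate: by name prefix if given, else all numeric ones."""
    first = rows[0]
    if prefix is not None:
        return [c for c in first if c != group_col and c.startswith(prefix)]
    return [c for c in first if c != group_col and isinstance(first[c], Number)]

def group_sum(rows, group_col, columns):
    """Sum every selected column per group; one rule for any number of columns."""
    totals = defaultdict(lambda: dict.fromkeys(columns, 0))
    for row in rows:
        for c in columns:
            totals[row[group_col]][c] += row[c]
    return dict(totals)

# Pattern based: every column whose name starts with "col_".
by_pattern = group_sum(rows, "class", select_columns(rows, "class", prefix="col_"))

# Type based: every numeric column, whatever its name.
by_type = group_sum(rows, "class", select_columns(rows, "class"))
```

Adding another col_ column to the table would need no change to either call, which is the property the pattern/type based settings give you in the node.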



"To wrap the groupby into a loop" meant: when the array size cannot be determined in advance, use the "split, apply & combine" strategy:

- (split) Column List Loop Start (choose your columns by enforcing exclusion, but don't forget not to exclude your grouping variable, e.g. "class");

- (apply) rename your variable of interest (using the currentColumnName flow variable) to a fixed name, e.g. myvar; then apply the GroupBy (define the aggregation manually for myvar); then rename myvar back to the original name;

- (combine) Loop End (Column Append), followed by a Column Filter (wildcard) to remove the excess copies of the grouping variable (e.g. "class (iter #*").
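
The three steps above can be sketched roughly in plain Python (not KNIME code). The table, its column names and the functions group_sum_one and split_apply_combine are made up for illustration; the temporary name myvar plays the role of the renamed column, so the aggregation is defined once no matter how many columns the loop visits.

```python
from collections import defaultdict

# Hypothetical table as a dict of columns; names and values are made up.
table = {
    "class": ["a", "a", "b"],
    "height": [1, 2, 3],
    "width": [10, 20, 30],
}

def group_sum_one(groups, values):
    """Apply step: a group-by with a fixed sum aggregation on a single column."""
    totals = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
    return totals

def split_apply_combine(table, group_col):
    # Split: iterate over every column except the grouping variable.
    value_cols = [c for c in table if c != group_col]
    keys = sorted(set(table[group_col]))
    result = {group_col: keys}  # keep a single copy of the grouping column
    for col in value_cols:
        myvar = table[col]  # "rename" the current column to a fixed name
        totals = group_sum_one(table[group_col], myvar)  # apply
        result[col] = [totals[k] for k in keys]  # combine under the original name
    return result

result = split_apply_combine(table, "class")
```

Because the result is assembled with one shared grouping column, the final filtering of duplicate "class" columns is not needed in this sketch; in the workflow it is, since each iteration appends its own copy.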

There is of course still potential to make this workflow more elegant ;-)

Hadley Wickham wrote an interesting paper on the split, apply & combine approach: