How to apply different column combinations (>1 column) in GROUP BY node by flow variables

Dear all,

Via this forum I found out how I can make use of a Flow Variable in combination with the GROUP BY node. The example workflow that I found allows me to ‘group by’ my dataset on only one ‘variable’ column.

I would like to be able to ‘group by’ on a combination of more than one column, with the columns set by Flow Variables.

Example of column combinations that I would like to group on:

Using a ‘Loop’, I would like the GROUP BY node to run 6 times, once for each of the column combinations above.

Can anyone please help me out?

Many thanks in advance!

Kr,
Jurjen

Groupby with flow var.knwf (13.2 KB)

Hi there @jurjengroendijk,

welcome to KNIME Community!

If you have a fixed number of grouping columns, you can simply extend your Table Creator node prior to the loop start with new columns containing the grouping column names. See modified workflow:
Groupby with flow var.knwf 1.knwf (14.0 KB)
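
For readers without the workflow at hand, the idea can be sketched outside KNIME roughly as follows. This is only an illustration in plain Python, not the contents of the attached workflow: the data, the column names (`region`, `product`, `channel`, `amount`) and the helper `group_sum` are all made up. The `combinations` list stands in for the extended Table Creator, with one entry per loop iteration and one cell per grouping column name.

```python
from collections import defaultdict

# Hypothetical rows standing in for the input table (names are illustrative).
rows = [
    {"region": "N", "product": "a", "channel": "web", "amount": 10},
    {"region": "N", "product": "b", "channel": "web", "amount": 20},
    {"region": "S", "product": "a", "channel": "shop", "amount": 30},
    {"region": "S", "product": "a", "channel": "web", "amount": 40},
]

# Stand-in for the extended Table Creator: one entry per loop iteration,
# one cell per grouping column name (fixed at two columns here).
combinations = [
    ("region", "product"),
    ("region", "channel"),
    ("product", "channel"),
]


def group_sum(rows, group_cols, value_col):
    """Sum value_col for every distinct combination of group_cols."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        totals[key] += row[value_col]
    return dict(totals)


# One grouping per combination, as the loop would do per iteration.
results = {cols: group_sum(rows, cols, "amount") for cols in combinations}
```

Because the number of grouping columns is fixed, every iteration reads the same number of cells; only the column names inside them change.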

Br,
Ivan

Hello @ipazin,

Thanks a lot, that looks promising :slight_smile:

One additional and hopefully last question.
In the settings of ‘your’ flow I see the following:
image

In your flow variables “0” and “1” appear in the folder grouByColumns->IncList.
Somehow I don’t manage to get that available in my workflow. What I get is this:

image

I included my workflow. Please find attached.
Many thanks for getting me going!

Kr,
Jurjen

Groupby with flow var EDIT.knwf (25.3 KB)

Hi @jurjengroendijk,

it depends on the preselection in the node settings. If you’ve selected 1 group column you’ll get 1 field in the Flow Variables tab, if you’ve selected 2 group columns you’ll get 2 fields, etc.

I hope this setting will become more dynamic in the future.

BR
Hermann

Dear @morpheus / @ipazin ,

Thank you for your responses. Even after doing what you said, I still don’t get the multiple selection possibilities in the Flow Variables. Please see below.

Could you please advise? There must be something that I am not doing correctly…

Many thanks!

Kr,
Jurjen

Hi Jurjen,

the picture shows how it works with the KNIME version I’m using. I’m not sure whether it depends on the version. I’m using version 4.0.1.

Br,
Hermann

Thanks @morpheus, my version is 4.1.0.

The strange thing is that when I load the example workflow from @ipazin this feature is working. But when I use it for my own purposes/on my own dataset, the functionality gets lost. Also when I build it from scratch in exactly the same way. Very strange.

Maybe someone else knows?

Many thanks to all for your assistance so far.

Kr,
Jurjen

Good evening @Iris ,

I received an example workflow from @ipazin in which he uses a GROUP BY node in combination with Flow Variables. In the ‘Flow Variables’ tab, the number of columns available for selection (the variables) follows the number of columns you drag into the “Groups” tab within the GROUP BY node. This would be normal.
However, when I do the same in my KNIME workflow, the number of columns to select as a variable under “IncList” does not change. So I can’t use multiple columns as variables.

I am using version 4.1.0. Do you perhaps have any idea what is going wrong here?

Many thanks for reading and for your reaction in advance!

Kr,
Jurjen

Hi @jurjengroendijk,

regarding the Flow Variables tab: the settings you see correspond to KNIME version 4.1 and higher. The workflow I sent you is the one I downloaded from your original post and modified. You said you found that example, so my guess is that the workflow was built prior to KNIME 4.1. Why 4.1? Because that version introduced new flow variable types, including a collection type that is now available in the GroupBy node for both included and excluded columns. This actually makes it easier to control the GroupBy node dynamically. Unfortunately, this does not help when using a loop, as Table Row to Variable Loop Start doesn’t yet support the new flow variable types.

Now to the workaround, which isn’t too bad: using Chunk Loop Start, you take the table row by row and create a flow variable of collection type inside the loop using the Table Row to Variable node.
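
In plain Python terms, the logic of this workaround looks roughly like the sketch below. It is an analogy under assumptions, not an export of the workflow: the data, the `group_columns` cell holding comma-separated column names, and the counting aggregation are all hypothetical. The list built in each iteration plays the role of the collection-type flow variable that feeds the include list of the GroupBy node.

```python
from collections import Counter

# Hypothetical rows standing in for the data table (names are illustrative).
rows = [
    {"region": "N", "product": "a", "channel": "web"},
    {"region": "N", "product": "b", "channel": "web"},
    {"region": "S", "product": "a", "channel": "shop"},
    {"region": "S", "product": "a", "channel": "web"},
]

# Hypothetical combination table: each row is handed out as one chunk,
# and its cell lists the grouping columns for that iteration.
combination_rows = [
    {"group_columns": "region"},
    {"group_columns": "region,product"},
    {"group_columns": "region,product,channel"},
]

per_iteration = []
for chunk in combination_rows:  # one row per iteration
    # The list plays the role of the collection-type flow variable.
    include_list = [name.strip() for name in chunk["group_columns"].split(",")]
    # Count the rows in each group defined by the current include list.
    counts = Counter(tuple(row[col] for col in include_list) for row in rows)
    per_iteration.append((include_list, dict(counts)))
```

Since the collection can hold any number of names, the number of grouping columns no longer has to be the same in every iteration.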

GroupByLoop

Check workflow here:

If any questions/comments feel free to ask.

Br,
Ivan

Thank you very much, this looks promising again :slight_smile:
I am on version 4.1.0 (I didn’t understand what you meant with the different versions)

I will go ahead and try this. I’ll keep you informed :slight_smile:

Hi @ipazin
Thank you so much for this workflow - I think it is exactly what I need. I’m having trouble executing it in my example below though.

The error message is:

ERROR Loop End 0:54 Execute failed: Input table's structure differs from reference (first iteration) table:
Column 1 [JA.COMPONENT (String)] vs. [MA.COMPONENT (String)]
Column 2 [GA.ART.DESCRIPTION (String)] vs. [JA.ART.DESCRIPTION (String)]
Column 3 [GA.GAM.ALT (String)] vs. [JA.GAM.ALT (String)]
Column 4 [GA.WORK.CENTER (String)] vs. [JA.WORK.CENTER (String)]

I notice that in your example the columns all have different names so I can’t work out why it’s throwing this error for me.

image

Version:
KNIME Analytics Platform v4.1.2.v202003050920

Any pointers you might have - much appreciated.

Zx

Stepping through I see that my table created thus

image

Has a missing value when I look at the columns spec:

I guess this could create a problem in the loop…

I have tried deleting the cell contents and deleting the entire row, but get the same situation.

Stepping through the loop, I see that the GroupBy works fine - it has the correct data in the ZF.SITE column in both iterations.

Zx

Hi there @zedleb,

in each iteration the GroupBy node outputs a table with a different structure, because the grouping columns differ, so you need to check the “Allow changing table specifications” option in the Loop End node.

LoopEndTableSpec
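
To illustrate why the option matters, here is a rough Python analogy of collecting per-iteration tables. It is a sketch under assumptions, not KNIME's implementation: the function `collect`, its `allow_changing_specs` flag and the sample tables are hypothetical. In strict mode a table whose columns differ from the first iteration is rejected; in permissive mode the columns are merged and absent cells are filled with `None`, which stands in for a missing value.

```python
def collect(tables, allow_changing_specs=False):
    """Stack per-iteration tables, optionally tolerating differing columns."""
    reference = tables[0]["columns"]
    columns = list(reference)
    for table in tables[1:]:
        if table["columns"] != reference:
            if not allow_changing_specs:
                raise ValueError(
                    "structure differs from first iteration: "
                    f"{table['columns']} vs. {reference}"
                )
            # Append columns not seen so far, keeping first-seen order.
            columns += [c for c in table["columns"] if c not in columns]
    # Cells absent from a table become None (standing in for missing values).
    rows = [
        {c: row.get(c) for c in columns}
        for table in tables
        for row in table["rows"]
    ]
    return columns, rows


# Hypothetical outputs of two iterations with different grouping columns.
first = {
    "columns": ["region", "product", "total"],
    "rows": [{"region": "N", "product": "a", "total": 10}],
}
second = {
    "columns": ["region", "channel", "total"],
    "rows": [{"region": "N", "channel": "web", "total": 30}],
}

columns, merged = collect([first, second], allow_changing_specs=True)
```

Calling `collect([first, second])` without the flag raises a `ValueError`, which mirrors the kind of failure reported in the error message above.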

Br,
Ivan

Hi @ipazin
Thanks for your response! That worked - I have both iterations in the loop end now.

I learned something there :smiley:
