The best way to group groups #urgent

Hello everybody,

I’m trying to group or aggregate (in collection) a big list of product platforms based their products.
e. g.
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

I did a group by but the result ist not 100% because some Grouping have sometimes the same Platforms just with other combinations.
But I can solve this problem if I can solve this example:
List input:
X;Y
Y;Z
W;P
P;O

Output
X;Y;Z
W;P;O

HI @Enigma2018,

Could you maybe give an example showing that problem? E.g. how the output from the GroupBy node looks like and how you would like it to look like?

Cheers
David

2 Likes

Hello Davek,

thanks for replying but i have written the result of my group by. It’s a group by for a many to many relationship (between Platform and product tables ) and it will have as result a collection of all platforms which have the same products:
e. g.
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

tell me if it isnt so clear for you

Hi @Enigma2018,

I think I got it now. So the output from the GroupBy node in your example would be:

P1–G1,G2
P2–G1
P3–G2
P4–G4
P5–G3,G4
P6–G3

but you don’t want to have subsets that are contained in a larger superset (so just groups P1 and P5)?

2 Likes

Actually I have this result that you have written. But I need a list of concatenated or list of values
Result:

G1;G2
G3;G4

Just to double check. So you want the groups P2, P3, P4, P6 removed, because they are already contained in the groups P1, P5?

3 Likes

P2, P3, P4, P6… are products not platform! I need to show just a list of aggregated collections. Better would be :slight_smile:
Platform--------- Product
G1;G2------------P1;P2;P3
G3;G4------------P5;P6;P4

Hi @Enigma2018,

sorry for the late answer. I attached a workflow with a possible solution. Could you tell me if that solves your issue?

Cheers
David

grouping.knwf (28.3 KB)

4 Likes

Thanks DaveK for replying and trying to help me. You are in the right way!
I tried to use your workflow but I found in your node Itemset findet a min 1 of support which for this example works. But for my productive data it doesnt work because support is too low

Could you elaborate on that? Do you mean the configured minimum support is too low?

In this case, the Item Set Finder (Borgelt) node is just used to find the biggest super sets of the platform groupings. Hence, the minimum support of 1 should find all of them. Could you maybe attach a workflow showing your issue?

1 Like

I just saw that in my attached workflow, the Item Set Finder (Borgelt) node is configured to use a percentage as minimum support. However, as explained above, this should have been an absolute number. Sorry for that. Does this maybe solve your issue?

1 Like

Hi,

I don’t know how exactly does it solve your main issue, but here is my solution:

grouping.knwf (31.7 KB)

Best,
Armin

1 Like

thanks armingrudd, but I still have some errors it’s still not right.
I Uploaded my excel data to show you my problem

My problem is so simple but may be I explained it bad.

I need to group all the platforms which have the same products and list them in a forme of collection or concatenate them, I tried excel power query but still don’t have what I need.

Exactly like that:
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

Davek was in the right way but the problem was the support which is in a simple example higher than in my real data even when I use absolute numbers.

Book1.xlsx (739.9 KB)

Hi @Enigma2018,

I think the problem were duplicate platforms after the first Group By node. Hence, I added another one to also group by the platforms to also aggregate all products. The rest stayed the same (see attached workflow). Is that what you are looking for?

grouping.knwf (727.3 KB)

2 Likes

@DaveK this time it looks really great thanks!
But how can I delete the doubled values in Platform column?
Like:
[X64, C]
[MS, C]
[X65, C]

into

X64, C, MS, M65

I’m not sure about this, but does the solution provided by @DaveK truly solve your issue?
Because still I can find sets with duplicate platforms (e.g. row 102 and 63 or those you mentioned).
I guess you want to merge them, right?

1 Like

I past an example of it from the result.
Next example is that:
[PQ23, PQ33]
[PQ25, PQ24]
[PQ33, PL/PQ46-47, PQ34, PQ35]

To be honest this problem went around in my mind for quite some time because I could not figure out if it was super simple or super complex and I did not get my head around it.

I now found a mixed solution relying heavily on a small (well) Python script merging lists and it becomes a mixture of Python and some KNIME nodes. I just wonder if the whole thing would be easier in just Python but I do not have the stamina right now to figure that out…

So without further ado here is the solution. It is possible there is a step which is too complicated. Hope this can be adapted. The results are lists. I kept all original lines on purpose so one can check if it makes sense. In a final step the duplicates of lists could be removed.

The original case with G1 … P1…
kn_example_group_of_groups.knwf (99.5 KB)

The extended case with the more complicated combinations:
kn_example_group_of_groups2.knwf (962.5 KB)

5 Likes

Nice, finally the problem is solved.
Do you think it’s possible to solve this issue without coding @mlauber71?