The best way to group groups #urgent

Enigma2018 · January 31, 2019, 10:35am

Hello everybody,

I’m trying to group or aggregate (in collection) a big list of product platforms based their products.
e. g.
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

I did a group by but the result ist not 100% because some Grouping have sometimes the same Platforms just with other combinations.
But I can solve this problem if I can solve this example:
List input:
X;Y
Y;Z
W;P
P;O

Output
X;Y;Z
W;P;O

DaveK · January 31, 2019, 1:00pm

HI @Enigma2018,

Could you maybe give an example showing that problem? E.g. how the output from the GroupBy node looks like and how you would like it to look like?

Cheers
David

Enigma2018 · January 31, 2019, 1:06pm

Hello Davek,

thanks for replying but i have written the result of my group by. It’s a group by for a many to many relationship (between Platform and product tables ) and it will have as result a collection of all platforms which have the same products:
e. g.
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

Enigma2018 · January 31, 2019, 1:17pm

tell me if it isnt so clear for you

DaveK · January 31, 2019, 1:38pm

Hi @Enigma2018,

I think I got it now. So the output from the GroupBy node in your example would be:

P1–G1,G2
P2–G1
P3–G2
P4–G4
P5–G3,G4
P6–G3

but you don’t want to have subsets that are contained in a larger superset (so just groups P1 and P5)?

Enigma2018 · January 31, 2019, 1:39pm

Actually I have this result that you have written. But I need a list of concatenated or list of values
Result:

G1;G2
G3;G4

DaveK · January 31, 2019, 1:54pm

Just to double check. So you want the groups P2, P3, P4, P6 removed, because they are already contained in the groups P1, P5?

Enigma2018 · January 31, 2019, 1:59pm

P2, P3, P4, P6… are products not platform! I need to show just a list of aggregated collections. Better would be
Platform--------- Product
G1;G2------------P1;P2;P3
G3;G4------------P5;P6;P4

DaveK · February 5, 2019, 10:42am

Hi @Enigma2018,

sorry for the late answer. I attached a workflow with a possible solution. Could you tell me if that solves your issue?

Cheers
David

grouping.knwf (28.3 KB)

Enigma2018 · February 5, 2019, 3:18pm

Thanks DaveK for replying and trying to help me. You are in the right way!
I tried to use your workflow but I found in your node Itemset findet a min 1 of support which for this example works. But for my productive data it doesnt work because support is too low

DaveK · February 5, 2019, 3:46pm

Could you elaborate on that? Do you mean the configured minimum support is too low?

In this case, the Item Set Finder (Borgelt) node is just used to find the biggest super sets of the platform groupings. Hence, the minimum support of 1 should find all of them. Could you maybe attach a workflow showing your issue?

DaveK · February 5, 2019, 3:52pm

I just saw that in my attached workflow, the Item Set Finder (Borgelt) node is configured to use a percentage as minimum support. However, as explained above, this should have been an absolute number. Sorry for that. Does this maybe solve your issue?

armingrudd · February 5, 2019, 5:04pm

Hi,

I don’t know how exactly does it solve your main issue, but here is my solution:

grouping.knwf (31.7 KB)

Best,
Armin

Enigma2018 · February 6, 2019, 7:40am

thanks armingrudd, but I still have some errors it’s still not right.
I Uploaded my excel data to show you my problem

My problem is so simple but may be I explained it bad.

I need to group all the platforms which have the same products and list them in a forme of collection or concatenate them, I tried excel power query but still don’t have what I need.

Exactly like that:
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

Davek was in the right way but the problem was the support which is in a simple example higher than in my real data even when I use absolute numbers.

Book1.xlsx (739.9 KB)

DaveK · February 6, 2019, 9:20am

Hi @Enigma2018,

I think the problem were duplicate platforms after the first Group By node. Hence, I added another one to also group by the platforms to also aggregate all products. The rest stayed the same (see attached workflow). Is that what you are looking for?

grouping.knwf (727.3 KB)

Enigma2018 · February 6, 2019, 9:26am

@DaveK this time it looks really great thanks!
But how can I delete the doubled values in Platform column?
Like:
[X64, C]
[MS, C]
[X65, C]

into

X64, C, MS, M65

armingrudd · February 6, 2019, 11:23am

I’m not sure about this, but does the solution provided by @DaveK truly solve your issue?
Because still I can find sets with duplicate platforms (e.g. row 102 and 63 or those you mentioned).
I guess you want to merge them, right?

Enigma2018 · February 6, 2019, 12:11pm

I past an example of it from the result.
Next example is that:
[PQ23, PQ33]
[PQ25, PQ24]
[PQ33, PL/PQ46-47, PQ34, PQ35]

mlauber71 · February 6, 2019, 10:31pm

To be honest this problem went around in my mind for quite some time because I could not figure out if it was super simple or super complex and I did not get my head around it.

I now found a mixed solution relying heavily on a small (well) Python script merging lists and it becomes a mixture of Python and some KNIME nodes. I just wonder if the whole thing would be easier in just Python but I do not have the stamina right now to figure that out…

So without further ado here is the solution. It is possible there is a step which is too complicated. Hope this can be adapted. The results are lists. I kept all original lines on purpose so one can check if it makes sense. In a final step the duplicates of lists could be removed.

The original case with G1 … P1…
kn_example_group_of_groups.knwf (99.5 KB)

The extended case with the more complicated combinations:
kn_example_group_of_groups2.knwf (962.5 KB)

armingrudd · February 7, 2019, 5:21pm

Nice, finally the problem is solved.
Do you think it’s possible to solve this issue without coding @mlauber71?