The best way to group groups #urgent

Just to double check. So you want the groups P2, P3, P4, P6 removed, because they are already contained in the groups P1, P5?

3 Likes

P2, P3, P4, P6… are products not platform! I need to show just a list of aggregated collections. Better would be :slight_smile:
Platform--------- Product
G1;G2------------P1;P2;P3
G3;G4------------P5;P6;P4

Hi @Enigma2018,

sorry for the late answer. I attached a workflow with a possible solution. Could you tell me if that solves your issue?

Cheers
David

grouping.knwf (28.3 KB)

4 Likes

Thanks DaveK for replying and trying to help me. You are in the right way!
I tried to use your workflow but I found in your node Itemset findet a min 1 of support which for this example works. But for my productive data it doesnt work because support is too low

Could you elaborate on that? Do you mean the configured minimum support is too low?

In this case, the Item Set Finder (Borgelt) node is just used to find the biggest super sets of the platform groupings. Hence, the minimum support of 1 should find all of them. Could you maybe attach a workflow showing your issue?

1 Like

I just saw that in my attached workflow, the Item Set Finder (Borgelt) node is configured to use a percentage as minimum support. However, as explained above, this should have been an absolute number. Sorry for that. Does this maybe solve your issue?

1 Like

Hi,

I don’t know how exactly does it solve your main issue, but here is my solution:

grouping.knwf (31.7 KB)

Best,
Armin

1 Like

thanks armingrudd, but I still have some errors it’s still not right.
I Uploaded my excel data to show you my problem

My problem is so simple but may be I explained it bad.

I need to group all the platforms which have the same products and list them in a forme of collection or concatenate them, I tried excel power query but still don’t have what I need.

Exactly like that:
Platform--------- Product

G1---------------- P1
G1---------------- P2
G2----------------P3
G2---------------- P1
G3---------------- P5
G3---------------- P6
G4---------------- P5
G4---------------- P4

Result:
Grouping 1: G1;G2
Grouping 2: G3;G4

Davek was in the right way but the problem was the support which is in a simple example higher than in my real data even when I use absolute numbers.

Book1.xlsx (739.9 KB)

Hi @Enigma2018,

I think the problem were duplicate platforms after the first Group By node. Hence, I added another one to also group by the platforms to also aggregate all products. The rest stayed the same (see attached workflow). Is that what you are looking for?

grouping.knwf (727.3 KB)

2 Likes

@DaveK this time it looks really great thanks!
But how can I delete the doubled values in Platform column?
Like:
[X64, C]
[MS, C]
[X65, C]

into

X64, C, MS, M65

I’m not sure about this, but does the solution provided by @DaveK truly solve your issue?
Because still I can find sets with duplicate platforms (e.g. row 102 and 63 or those you mentioned).
I guess you want to merge them, right?

1 Like

I past an example of it from the result.
Next example is that:
[PQ23, PQ33]
[PQ25, PQ24]
[PQ33, PL/PQ46-47, PQ34, PQ35]

To be honest this problem went around in my mind for quite some time because I could not figure out if it was super simple or super complex and I did not get my head around it.

I now found a mixed solution relying heavily on a small (well) Python script merging lists and it becomes a mixture of Python and some KNIME nodes. I just wonder if the whole thing would be easier in just Python but I do not have the stamina right now to figure that out…

So without further ado here is the solution. It is possible there is a step which is too complicated. Hope this can be adapted. The results are lists. I kept all original lines on purpose so one can check if it makes sense. In a final step the duplicates of lists could be removed.

The original case with G1 … P1…
kn_example_group_of_groups.knwf (99.5 KB)

The extended case with the more complicated combinations:
kn_example_group_of_groups2.knwf (962.5 KB)

5 Likes

Nice, finally the problem is solved.
Do you think it’s possible to solve this issue without coding @mlauber71?

I though about a loop to match the items, tried a few things but was not able to finish it, would be interesting to see that in ‘pure’ KNIME. Would be about to add the items, split them, unite them again unique and do this for every line. Maybe I will give it another try to substitute the Python part.

2 Likes

Actually I did something like that (split and merge in sorted lists) in several layers but had the problem when one item which is at the beginning of a sorted list, is in the middle of another sorted list, so couldn’t complete the method.
I will try to do that in KNIME without coding as well. Let’s see if we can complete the task. (Before the topic is closed) :wink::blush:

4 Likes

Thanks for your help @mlauber71 and @armingrudd :+1:

4 Likes

Hi,
Here I have a new idea in which you don’t need to code and I think there cannot be any duplicate values in the output:

There is a recursive loop in which the platforms with a shared product are grouped together.
At first I thought I could do it without the recursive loop but there are some situations that make us use the recursive loop:
For example if the platforms were like this:
A,B
B,C
(2 layers) then everything were straightforward and we didn’t need a recursive loop. But if we have this:
A,B
B,C
C,D
(3 layers) then we need to do the process one more time (and one more loop for each additional layer).
I used a method so that you don’t need to set the number of loops manually but it does one additional loop (and finds out that the input and output row count are the same and then ends the loop). In your case it needs 4 recursive loops (the workflow does 5 loops) and the final row count is 156.
After the “Recursive Loop End” I added a few more nodes to add the products as well. (If you want the final lists to be sorted you can use a few more nodes to split and merge the items sorted)

Please check the workflow and let me know if it’s working as expected:
grouping.knwf (421.2 KB)

Best,
Armin

PS:

I think the solution by @mlauber71 is working perfectly fine (if you use a groupby node to group the two lists in his final output, you will get the same output as mine). I just wanted to provide a solution without coding.

3 Likes

Thanks a lot @mlauber71 and @armingrudd for your brilliant Ideas !!!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.