Find occurrences grouped by a specific value

Hi collegues,

I'm trying to understand how to perform this kind of operation using a simple example related to product and brand:

rowID user product brand
1 John apple brand1
2 John banana brand1
3 John apple brand1
4 Paul cocco brand2
5 Paul cocco brand2
6 Paul cocco brand2

How can i get the occurrences of the product like this?

Here an example output:

user product product_occurrency brand brand_occurrency
John apple 2 brand1 3
John banana 1 brand1 3
Paul cocco 3 brand2 3

Mine it's just a simple example, probably there are alternative output smarter and more efficient than this one.

Suggestions are pretty welcome.

Thanks in advice.

-Giulio

Hi Giulio,

I wasn't shure whether you mean the overall product occurrence or the product occurence per customer. I just added one line to your example source-table and want to point out the difference between the two approaches:

John apple brand1
John banana brand1
John apple brand1
Paul cocco brand2
Paul cocco brand2
Paul cocco brand2
Paul apple brand1

If you want to know the occurrences per customers the final table would look like this:

John brand1 3 apple 2
John brand1 3 banana 1
John brand1 3 apple 2
Paul brand1 1 apple 1
Paul brand2 3 cocco 3
Paul brand2 3 cocco 3
Paul brand2 3 cocco 3

If you want to have the overall occurrence for product and respectively brand in every row, your final table should look like this:

John brand1 4 apple 3
John brand1 4 banana 1
John brand1 4 apple 3
Paul brand1 4 apple 3
Paul brand2 3 cocco 3
Paul brand2 3 cocco 3
Paul brand2 3 cocco 3

 

I attached the KNIME-workflow so you can reproduce these results. I did it basically with the groupBy and the joiner node. I hope that helps.

Cheers,

Marten

Hi Marten, thank you for your example workflow,

they were quite helpful. But with my example explanation I was trying to figure out how to get as brand_occurrences the result of the sum of each product occurrences per user.

For example:

user product product_occurrency brand brand_occurrency
John apple 2 brand1 3
John banana 1 brand1 3

So in this particular case we have that the occurrences of brand1 are the result of the sum of the user's product occurrences.

And in your first top example it was partially correct, but as product occurrences you have set the occurrences of the product not per user. For example for the user John we have: Apple: 2, Banana: 1, Apple: 2. But it should be someting like: Apple: 2, Banana: 1

-Giulio

Hi Giulio,

just add another groupBy-node and group over all the columns without aggregation. That should give you the results you want to have.

Cheers

Marten