Nested Loop? How to do calculation on other records and put results in interested cells

dperria · December 10, 2021, 5:08pm

Hello,

I am struggling trying to do this probably really simple task with knime.

I’ve got a datasource done in this way:

MAIN_CATEGORY,SUB_CATEGORY,NAME,SCORE
HIGH,HA,A1,15
HIGH,HA,A2,10
HIGH,HB,B1,20
HIGH,HB,B2,40
HIGH,HB,B3,22
LOW,LE,E1,14
LOW,LE,E2,12
LOW,LE,E3,16
LOW,LF,F1,50

and I need to obtain the result here below

MAIN_CATEGORY,SUB_CATEGORY,NAME,SCORE,OTHER_MEAN
HIGH,HA,A1,15,27.33
HIGH,HA,A2,10,27.33
HIGH,HB,B1,20,15
HIGH,HB,B2,40,15
HIGH,HB,B3,22,15
LOW,LE,E1,14,50
LOW,LE,E2,12,50
LOW,LE,E3,16,50
LOW,LF,F1,50,14

where in the last column there is the mean of scores of the records belonging the same MAIN_CATEGORY but different SUB_CATEGORY than the one considered.

Eg. the value 27.33 in these two records (same main category HIGH and same sub category HA)

HIGH,HA,A1,15,27.33
HIGH,HA,A2,10,27.33

is the value of the mean of the scores (20, 40, 22) of these 3 records (belonging to the same main category HIGH, but different sub category HB)

HIGH,HB,B1,20
HIGH,HB,B2,40
HIGH,HB,B3,22

How can I obtain this result?

gonhaddock · December 10, 2021, 8:44pm

Hi @dperria and welcome to the KNIME forum.

As I don’t fully understand the scope of use. This is an arrangement valid for a 1:2:n hierarchy relation, as requested in your example.

20211210_mean_swap.knwf (44.8 KB)

BR

bruno29a · December 10, 2021, 11:01pm

Hi @dperria , I agree with @gonhaddock , I also don’t fully understand the scope of use of this.

Also, where did you get the value 15 from? None of the mean of MAIN_CATEGORY and SUB_CATEGORY has value 15.

And as @gonhaddock mentioned, the swapping is doable based on the arrangement of your sample data, which is each MAIN_CATEGORY has 2 SUB_CATEGORY, and the swap of mean is happening between the 2 SUB_CATEGORY. If you have more than 2 SUB_CATEGORY for some of the MAIN_CATEGORY in your whole dataset, you will need to define how the swapping should be done.

dperria · December 11, 2021, 11:43am

Hi @gonhaddock thank you for your quick answer.

It’s a good workaround but i need to implement a generic solution where the hierarchy in one of the main categories is not just 1:2:n, but that can be 1:m:n.

BR

dperria · December 11, 2021, 12:09pm

Hi @bruno29a you are right regarding the 15 value, I have probably done a typo.
Here below is the amended example.
Where 12.5 for the HIGH,HB items is the result of the mean of the scores of the HIGH, HA, items (i.e. 12.5 = (15 +10)/2.

The scope of doing this is to match the result of items of the same MAIN_CATEGORY and SUB_CATEGORY, with the average result of the other items of the same MAIN_CATEGORY.
I need a generic solution that I can apply if the sub categories in the same main_category are more than 2.

MAIN_CATEGORY,SUB_CATEGORY,NAME,SCORE
HIGH,HA,A1,15
HIGH,HA,A2,10
HIGH,HB,B1,20
HIGH,HB,B2,40
HIGH,HB,B3,22
LOW,LE,E1,14
LOW,LE,E2,12
LOW,LE,E3,16
LOW,LF,F1,50

and I need to obtain the result here below

MAIN_CATEGORY,SUB_CATEGORY,NAME,SCORE,OTHER_MEAN
HIGH,HA,A1,15,27.33
HIGH,HA,A2,10,27.33
HIGH,HB,B1,20,**12.5**
HIGH,HB,B2,40,**12.5**
HIGH,HB,B3,22,**12.5**
LOW,LE,E1,14,50
LOW,LE,E2,12,50
LOW,LE,E3,16,50
LOW,LF,F1,50,14

gonhaddock · December 11, 2021, 12:37pm

Hi @dperria
I was expecting something like that… but you are not providing enough information.

Then now, you have 4 possible approaches to go, in order to ideate the comparing relations; each of them having implications in the workflow; and also in the expected results.

The simple solution, would be to calculate the mean over all OTHER NAME within the MAIN_CATEGORY
Calculate ‘weighted means’ over OTHER SUB_CATEGORY items by the sum of the SCORE within the SUB_CATHEGORY
(1) Another possible approach is to calculate the simple Mean of the all OTHER SUB_CATHEGORY mans, let’s call it ‘mean of means’. This approach may be interesting if the type of data is sampled in a way with inherent bias, displaying a better description of the sampling.
(2) Then you could approach the mean with the central limit theorem normalizing these means. Let’s call it ‘central limit theorem approach’
This fourth approach (less probable) is to display a simplified matrix with all the rest of the OTHER SUB_CATHEGORY means.

This fourth approach can be combined with extra columns with options 1, 2 and 3 together; with PROS: that can give you more visual QC, and CONS: that if your ‘m’ is huge, then the matrix becomes proportionally unpractical.

As you can see these approaches may be related to the nature of your data. But we are getting a closer idea on your expectations

BR

bruno29a · December 11, 2021, 4:01pm

Hi @dperria , thank you for confirming on the correct values, and thank you for confirming that there can be more than 2 sub categories in a main category.

But as per @gonhaddock , you are not providing any explanation on how the results should be if you have more than 2 sub categories. When you have only 2 sub categories, the “OTHER” mean would be the other mean. If you have 3 sub categories, there would be 2 other “OTHER” means. How do you choose which of the 2 is the “OTHER” mean per sub category?

gonhaddock · December 11, 2021, 4:21pm

@dperria
I’ve anticipated a possible solution with the ‘simple’ option 1 method. This is the average of all OTHER NAME SCORES within the same MAIN_CATEGORY.

I added few SUB_CATHEGORIES to input data, aiming to test the hierarchy 1:m:n

This would be the base for any other swapping estimates. But let me know any comment or finding.

BR

dperria · December 13, 2021, 5:15pm

Thank you @gonhaddock this is exactly what I was looking for!

BR

dperria · December 13, 2021, 5:17pm

Hi @bruno29a thank you for your help. @gonhaddock provided a solution here below.

system · December 20, 2021, 5:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.