Group by specified group: calculate means, quantile and deviation

Hey there, 

I have a question if the GroupBy node (or any other node) can do the following. 

If have a table like this:

RowID GroupNo Value
R1 1 4
R2 1 2
R3 1 3
R4 2 50
R5 3 15
R6 3 20

With the GroupBy node I can get the following output: 

GroupNo Sum Count Mean
1 9 3 3
2 50 1 50
3 35 2 17,5

So the first question I have: How is it possible to calculate the x-qunatile (e.g. 0,75-qunatile) of each GroupNo? Is this possible to do this by GroupBy node?

Additionally (and more extensive), I would like to have a comparison of the "Value" of the "RowID" to the belonging GroupID-Mean. So that I have a new column with the difference. Additionally I want to have a second column that says if there is a deviation for eg. to the third quantil of the belonging GroupID-Mean. Something like this: 

RowID GroupNo Value (belonging) GroupID-Mean deviation to (belonging)GroupID-Mean 0,75-quantile of belonging GroupID Deviation to 0,75-quantile pos. Deviation to 0,75-quantile
R1 1 4 3 1 3,5 0,5 yes
R2 1 2 3 1 3,5 -1,5 no
R3 1 3 3 0 3,5 -0,5 no
R4 2 50 50 0 50 0 no
R5 3 15 17,5 2,5 18,75 -3,75 no
R6 3 20 17,5 2,5 18,75 1,25 yes

Is this possible in KNIME?
 

Thanks so much for any hint and advice :) 

Jasmin

No you cannot calculate the quantiles with a group by, however it is on our feature list. as a workaround, you can use the GroupLoopStart, than a boxplot and some filtering and a loop end will get you these results as well.

 

The second question can be solved :)

You take the output of the groupBy (for the mean of each group) and join it on the GroupNo with the original data table. Than the Math Node gives you the difference between the mean and the true value.

Cheers, Iris

Hi Iris, 

thanks a lot! This totally works fine (after some time of trying different ways) ! Looking forward for the feature of the GroupBy Node, because with your solution I can only get the "standard-quartiles"... :)

However I have one question left: If I have not a GroupNo like 1, 2, 3 (=Integer) but like G1, G2, G3 (=String) How can I transform the GroupNo G1 into 1 and G2 into 2 etc? Or how can I add a new column with the GroupNo as Integer? 

Thanks a lot :) 

Jasmin

Hi Jasmin,

you can use the String Manipulation node with the replace (like: replaceChars($column1$,"G" ,"" ) ) and afterwards a String to Number node.

The Category to Number node could be used as well, but it will assign each group an integer, independent from its value.

Cheers, Iris

 

Hi Iris, 

this works fine. However I got a problem when there is no number combined with the number (only string). My groups are named for eg. "Student", "Professor", "Trainee", "Manager" and many more. Isn't there a pssibility to count the groups and use this number in the further process. 

I now use the "trick" in using the value counter (then I get all the groups and its names) and export this via "CSV Writer". In the CSSV I add the GroupNo and use this file (File Reader) and join this with the nodes befor, so I get the GroupNo... it works fine but thats not such a beautiful solution ;) 

Jasmin

Did you try the Category to number node? This should do the trick.

Oh yes! This totally works !

Thank you so much - this solution is far better :) 

Hello,

with the next KNIME release at the end of July the group by node will offer much more statistics e.g. second moment, quantile, correlation, covariance, sum of squares/logs, Kurtosis, Skewness, MAD (mean/median absolute deviation).

Bye,

Tobias

A post was split to a new topic: Rounding Mean in a GroupBy node