I have a question if the GroupBy node (or any other node) can do the following.
If have a table like this:
RowID
GroupNo
Value
R1
1
4
R2
1
2
R3
1
3
R4
2
50
R5
3
15
R6
3
20
With the GroupBy node I can get the following output:
GroupNo
Sum
Count
Mean
1
9
3
3
2
50
1
50
3
35
2
17,5
So the first question I have: How is it possible to calculate the x-qunatile (e.g. 0,75-qunatile) of each GroupNo? Is this possible to do this by GroupBy node?
Additionally (and more extensive), I would like to have a comparison of the "Value" of the "RowID" to the belonging GroupID-Mean. So that I have a new column with the difference. Additionally I want to have a second column that says if there is a deviation for eg. to the third quantil of the belonging GroupID-Mean. Something like this:
No you cannot calculate the quantiles with a group by, however it is on our feature list. as a workaround, you can use the GroupLoopStart, than a boxplot and some filtering and a loop end will get you these results as well.
The second question can be solved :)
You take the output of the groupBy (for the mean of each group) and join it on the GroupNo with the original data table. Than the Math Node gives you the difference between the mean and the true value.
thanks a lot! This totally works fine (after some time of trying different ways) ! Looking forward for the feature of the GroupBy Node, because with your solution I can only get the "standard-quartiles"... :)
However I have one question left: If I have not a GroupNo like 1, 2, 3 (=Integer) but like G1, G2, G3 (=String) How can I transform the GroupNo G1 into 1 and G2 into 2 etc? Or how can I add a new column with the GroupNo as Integer?
this works fine. However I got a problem when there is no number combined with the number (only string). My groups are named for eg. "Student", "Professor", "Trainee", "Manager" and many more. Isn't there a pssibility to count the groups and use this number in the further process.
I now use the "trick" in using the value counter (then I get all the groups and its names) and export this via "CSV Writer". In the CSSV I add the GroupNo and use this file (File Reader) and join this with the nodes befor, so I get the GroupNo... it works fine but thats not such a beautiful solution ;)
with the next KNIME release at the end of July the group by node will offer much more statistics e.g. second moment, quantile, correlation, covariance, sum of squares/logs, Kurtosis, Skewness, MAD (mean/median absolute deviation).