Definition of Kurtosis value in Statistics and GroupBy Nodes

Hi All

I’m trying to find out whether the Kurtosis value calculated in the Statistics and GroupBy Nodes is the standard Kurtosis value, excess Kurtosis (kurtosis - 3) or an alternative form.
The description for the Statistics node gives no guidance on this at all, while the description tab in the GroupBy node’s configuration window says:
“Calculates the kurtosis per group. Attention: calculation is bias-corrected and at least four values per group are required. If the latter does not hold, a missing cell is returned.”

It’s not clear what type of bias correction is used, whether it is applied to standard or excess kurtosis, or whether “bias-corrected” simply refers to the excess kurtosis calculation itself.

From my testing I don’t believe the value can be standard kurtosis, because I can get negative values and standard kurtosis is never negative (it is always at least 1). So the output is either excess kurtosis or something calculated by an alternative method.
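To illustrate that reasoning, here is a minimal Python sketch (not KNIME’s code) of the plain moment-based kurtosis, m4 / m2². The function name `standard_kurtosis` and the sample data are made up for illustration. The ratio cannot go below 1, so a negative result can only appear once 3 is subtracted.

```python
def standard_kurtosis(values):
    """Moment-based (population) kurtosis m4 / m2**2; equals 3 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2


# Hypothetical flat-ish sample, chosen only for illustration.
flat = [1.0, 2.0, 3.0, 4.0, 5.0]
standard = standard_kurtosis(flat)  # about 1.7, still positive
excess = standard - 3.0             # about -1.3, negative
print(standard, excess)
```

So a negative value on its own already rules out standard kurtosis, but it does not say which excess-kurtosis formula is in use.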

Can anybody provide a definitive answer on this?

In general it would be good to capture this level of detail in node descriptions so users fully understand how statistics/values are calculated by a particular node.
As much as I love KNIME, currently I find it difficult to go out there and advocate for it as effectively as I would like when I can’t defend it from accusations of it being a black box solution because of these kinds of issues. I would be happy to contribute to the updating of node descriptions to help improve this situation.

Many thanks for your help


You can look at Alternative Definition of Kurtosis here


This is how it is calculated.

Hope this helps?!?



Thank you @Mark_Ortmann and @izaychik63 for the info and links.

It took a bit of checking and testing to reach a conclusion on this.
The value returned is excess kurtosis.
The Java calculation used in the Statistics and GroupBy nodes exactly matches the 2nd kurtosis equation in this article -

That equation not only subtracts 3 from the kurtosis value to give excess kurtosis (so a normal distribution gives 0); it also corrects for the sample size.
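For anyone wanting to cross-check node output by hand, below is a rough Python sketch of the commonly used bias-corrected sample excess kurtosis. This is an assumption on my part: the linked equation is not reproduced in this thread, so I am assuming it is the usual form

G2 = n(n+1) / ((n-1)(n-2)(n-3)) * Σ(x - mean)⁴ / s⁴ - 3(n-1)² / ((n-2)(n-3))

with s² the unbiased sample variance. The function name `sample_excess_kurtosis` is hypothetical, and this is not the actual KNIME source. Returning `None` for fewer than four values mimics the "missing cell" behaviour from the GroupBy description; returning `None` for zero variance is purely a choice made in this sketch.

```python
def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (assumed formula, see lead-in)."""
    n = len(values)
    if n < 4:
        return None  # formula needs n >= 4 because of the (n - 3) factor

    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)  # unbiased sample variance
    if s2 == 0:
        return None  # sketch-only choice: undefined when all values are equal

    fourth = sum((v - mean) ** 4 for v in values)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth / s2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled - correction


# Hypothetical test data, not taken from any KNIME workflow.
result = sample_excess_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
too_few = sample_excess_kurtosis([1.0, 2.0, 3.0])
print(result, too_few)
```

If the assumption holds, feeding the same column into the Statistics or GroupBy node should give a matching value, and the (n - 3) factor in the denominator would explain the "at least four values per group" requirement.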

Hope folks find that useful



