Stdev in population vs sample

guillermoalvma · November 7, 2021, 12:14pm

Hi all, is there any way in the Statistics node, or maybe in another node, to choose whether the standard deviation is calculated for a sample or for a population (basically, whether or not to apply Bessel's correction)?
In Excel and Google Sheets, for example, you can choose between stdev and stdevp.
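For reference, the two quantities differ only in the divisor. The symbols here are introduced for illustration and are not KNIME names: $n$ is the number of rows, $x_i$ the values, and $\bar{x}$ the column mean.

```latex
% Sample standard deviation (with Bessel's correction), as in stdev / STDEV.S
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}

% Population standard deviation (no correction), as in stdevp / STDEV.P
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```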
Thanks in advance for your help.

kevin_sturm · November 11, 2021, 1:33pm

Hello @guillermoalvma,

And welcome to our community. Thank you for sharing your first contribution.
Before we go into more detail, I want to ask whether you have already used the <math formula> node, which offers a variety of mathematical functions, including COL_STDEV. This basically corresponds to stdev.p. You could compute stdev.s by creating a sample of your data set (e.g. with the <row sampling> or <partitioning> node).
Would that be a start for a discussion?

Best regards,
Kevin

guillermoalvma · November 11, 2021, 11:04pm

Thanks a lot for your answer and for the welcome.
The COL_STDEV function in the math formula node calculates stdev.s, i.e. the node assumes that the dataset is a sample, not a population. That’s the problem.
Best regards.

guillermoalvma · November 18, 2021, 11:45pm

Hi, is there any other info that you need to continue the discussion?

kevin_sturm · November 23, 2021, 3:12pm

Hello @guillermoalvma,

Sorry for coming back to your topic only now.
As it turns out, there was a misunderstanding about the function inside the <math formula> node, and you were right that it indeed calculates stdev.s.
Since there is no additional node or function available to compute stdev.p right away, one needs to recreate its formula with a series of <math formula> nodes. But this is just one way to do it.
I will open a development ticket on our side to allow calculating it with a simple function for a future release though.
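
As a rough sketch of what "recreating the formula" amounts to: since the sample value is already available, the population value can be obtained by rescaling it with sqrt((n - 1) / n), where n is the row count. The Python below only illustrates the arithmetic. The function name and the example values are made up for this post, and the exact expression syntax inside the <math formula> node is not shown here.

```python
import math
import statistics


def population_stdev_from_sample(sample_stdev: float, n: int) -> float:
    """Undo Bessel's correction: rescale an (n - 1)-divisor stdev to an n-divisor one."""
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two rows")
    return sample_stdev * math.sqrt((n - 1) / n)


# Made-up example column, not data from this thread.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s = statistics.stdev(values)  # sample standard deviation, like stdev.s
p = population_stdev_from_sample(s, len(values))  # rescaled to stdev.p

# Cross-check against the direct population formula.
assert math.isclose(p, statistics.pstdev(values))
```

In a workflow, the same idea would be one extra step after the existing stdev.s result: multiply it by the square root of (row count - 1) / row count.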

Best regards,
Kevin
