Standard deviation

I would like to calculate the standard deviation from this data.
image

If I do this in excel with the function stabw.n I get 116,294. If I do it in knim with the node Column Aggregator Ii get 121,46. I get this value in excel if I use stabw.s. How can I also get the value 116.294 in kmie?

Hi @MCBirne

Welcome to KNIME Forum. Check out this post,

gr. Hans

4 Likes

Hi @MCBirne , my copy of Excel doesn’t appear to have a stabw.n function , but looking it up, I gather that this is the stdev.p (Std Dev of population) function in my copy.

It seems that the “older” KNIME nodes all use the stdev.s (Std Dev of sample) function (STABW.S I think for your Excel). i.e. Column Aggregator, Math Formula, GroupBy

If you are using KNIME 5.4 (possibly KNIME 5.3 onwards), you can use the new Expression node to get the value you are looking for:

COLUMN_STDDEV

(column, ignore_nan, ddof)

Find the standard deviation of the values in a column, ignoring MISSING values. If all values are MISSING, the result is MISSING.

The ignore_nan option can be used to ignore NaN values. If it is set to TRUE, NaN values are ignored, but if all values are NaN, the result is MISSING. If it is FALSE, then NaN values are not ignored and the result is NaN if any value in the column is NaN.

There is also an optional argument for the delta degrees of freedom (ddof) to use for the calculation. By default it is set to 0, but this can be changed to 1 to calculate the corrected sample standard deviation. See wikipedia:standard_deviation. Other integer values are also accepted.

Arguments
  • column: The name of the column to aggregate
  • ignore_nan: Whether to skip NaN values (defaults to FALSE)
  • ddof: The delta degrees of freedom to use (defaults to 0)

So if the optional third parameter is 0, it returns the value you want for the population (116.294), but if set to 1 it returns for the “sample” (121.46) as per the other KNIME nodes.

A couple of things to note.

  1. For column aggregation functions in the Expressions node, you specify the column name as a string literal e.g. “mycolumn”.
COLUMN_STDDEV("mycolumn")

If you attempt to write the column using the regular column name syntax for the Expressions node,

COLUMN_STDDEV($["mycolumn"])

this will be reported as a syntax error, which feels a little confusing at first.

  1. Unless your sample data is 10 rows or fewer, there is no point trying to test the column aggregation functions using the “Evaluate first 10 rows” button, since it will (as the name suggests) only use the first 10 rows in the calculation, leading to apparently incorrect (preview) results.

edit: btw, the expressions node will return the same value of course on every row. If you just want a single instance of the value, use the groupby node and simply group by the std dev column you just created to get back the single value (or you can use some other mechanism to remove the duplicated values)

7 Likes

Fine! I can use it to determine the standard deviation of a column, but I would like to determine it row by row. I have extended the example a little:


How can I implement this?

Hi @MCBirne

You can calculate the standard deviation node by node, like this sted_dev_in_steps.knwf (28.4 KB)
image
gr. Hans

2 Likes

I could do a hacky two node solution :wink:

Use column aggregator to concatenate the required columns, delimited by “~”, and then execute this String Manipulation:

toDouble( 

         java.lang.Math.sqrt(java.util.Arrays.stream($Concatenate$.split("~"))
                              .mapToDouble(s -> Double.parseDouble(s.trim()))
                              .map(x -> java.lang.Math.pow(x - java.util.Arrays.stream($Concatenate$.split("~"))
                                                .mapToDouble(v -> Double.parseDouble(v.trim()))
                                                .average().orElse(0), 2))
                              .average().orElse(0))
	)

Once again, String Manipulation can do the Mathematics that can’t be done in Math Formula :rofl:

Ok, so that script doesn’t exactly stick in memory does it, and it’s a bit fiddly, so here is the work done for you:

1 Like

I knew it :slight_smile:

gr. Hans

1 Like

Great idea! Many thanks

1 Like