Standard deviation

MCBirne · January 8, 2025, 1:32pm

I would like to calculate the standard deviation from this data.

If I do this in excel with the function stabw.n I get 116,294. If I do it in knim with the node Column Aggregator Ii get 121,46. I get this value in excel if I use stabw.s. How can I also get the value 116.294 in kmie?

HansS · January 8, 2025, 2:05pm

Hi @MCBirne

Welcome to KNIME Forum. Check out this post,

gr. Hans

takbb · January 8, 2025, 2:25pm

Hi @MCBirne , my copy of Excel doesn’t appear to have a stabw.n function , but looking it up, I gather that this is the stdev.p (Std Dev of population) function in my copy.

It seems that the “older” KNIME nodes all use the stdev.s (Std Dev of sample) function (STABW.S I think for your Excel). i.e. Column Aggregator, Math Formula, GroupBy

If you are using KNIME 5.4 (possibly KNIME 5.3 onwards), you can use the new Expression node to get the value you are looking for:

COLUMN_STDDEV

(column, ignore_nan, ddof)

Find the standard deviation of the values in a column, ignoring MISSING values. If all values are MISSING, the result is MISSING.

The ignore_nan option can be used to ignore NaN values. If it is set to TRUE, NaN values are ignored, but if all values are NaN, the result is MISSING. If it is FALSE, then NaN values are not ignored and the result is NaN if any value in the column is NaN.

There is also an optional argument for the delta degrees of freedom (ddof) to use for the calculation. By default it is set to 0, but this can be changed to 1 to calculate the corrected sample standard deviation. See wikipedia:standard_deviation. Other integer values are also accepted.

Arguments

column: The name of the column to aggregate

ignore_nan: Whether to skip NaN values (defaults to FALSE)

ddof: The delta degrees of freedom to use (defaults to 0)

So if the optional third parameter is 0, it returns the value you want for the population (116.294), but if set to 1 it returns for the “sample” (121.46) as per the other KNIME nodes.

A couple of things to note.

For column aggregation functions in the Expressions node, you specify the column name as a string literal e.g. “mycolumn”.

COLUMN_STDDEV("mycolumn")

If you attempt to write the column using the regular column name syntax for the Expressions node,

COLUMN_STDDEV($["mycolumn"])

this will be reported as a syntax error, which feels a little confusing at first.

Unless your sample data is 10 rows or fewer, there is no point trying to test the column aggregation functions using the “Evaluate first 10 rows” button, since it will (as the name suggests) only use the first 10 rows in the calculation, leading to apparently incorrect (preview) results.

edit: btw, the expressions node will return the same value of course on every row. If you just want a single instance of the value, use the groupby node and simply group by the std dev column you just created to get back the single value (or you can use some other mechanism to remove the duplicated values)

MCBirne · January 9, 2025, 10:00am

Fine! I can use it to determine the standard deviation of a column, but I would like to determine it row by row. I have extended the example a little:

How can I implement this?

HansS · January 9, 2025, 11:03am

Hi @MCBirne

You can calculate the standard deviation node by node, like this sted_dev_in_steps.knwf (28.4 KB)

gr. Hans

takbb · January 9, 2025, 12:09pm

I could do a hacky two node solution

Use column aggregator to concatenate the required columns, delimited by “~”, and then execute this String Manipulation:

toDouble( 

         java.lang.Math.sqrt(java.util.Arrays.stream($Concatenate$.split("~"))
                              .mapToDouble(s -> Double.parseDouble(s.trim()))
                              .map(x -> java.lang.Math.pow(x - java.util.Arrays.stream($Concatenate$.split("~"))
                                                .mapToDouble(v -> Double.parseDouble(v.trim()))
                                                .average().orElse(0), 2))
                              .average().orElse(0))
	)

Once again, String Manipulation can do the Mathematics that can’t be done in Math Formula

Ok, so that script doesn’t exactly stick in memory does it, and it’s a bit fiddly, so here is the work done for you:

HansS · January 9, 2025, 1:21pm

I knew it

gr. Hans

MCBirne · January 10, 2025, 11:14am

Great idea! Many thanks

system · April 10, 2025, 11:15am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.