Replacing outliers based on Z-score

Hi everyone,

I want to replace the outliers in my data with another acceptable value. I know you can use the node “Numeric Outliers” to replace outliers based on the Interquartile Range. But because some of my variables contain a lot of zeros, this would mean the outliers of these variables would be transformed to zero (because the interquartile range is zero) and the variable would have no predictive value. I was wondering if there is a way to replace the outliers based on the Z-score (for example, first z-score normalizing the data with the “Normalizer” node and then replacing every value of a certain variable which is higher than 3 (lower than -3) with 3 (-3)).

Thanks in advance!

Hi @jeandony and welcome to the forum -

This should be possible using a Rule Engine node. Rules might look something like:

MyNormVar > 3 => 3
MyNormVar < -3 => -3

Hi @ScottF,

Thanks for your quick response. Is there a way to do this for multiple columns at the same time? Because now it seems I have to link as much Rule Engine nodes as there are columns I want to transform.

Kind regards,

There are a few other options you could try, depending on how many variables you need to deal with.

If you have a few variables that you want to deal with in a single node, you could try Column Expressions.

If you have a lot of variables, you might instead want to loop over the Rule Engine using a Column List Loop Start.

2 Likes

Hi there @jeandony,

Math Formula (Multi Column) should help here as it can apply calculation on multiple numeric columns and has if() function.

So following expression should do the trick:
if($$CURRENT_COLUMN$$ > 3 ,3 ,if($$CURRENT_COLUMN$$ < -3 ,-3 ,$$CURRENT_COLUMN$$))

Welcome to KNIME Community!

Br,
Ivan

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.