Identify Outliers in Skewed Distribution

I’m using the Numeric Outliers node and it works pretty well. However, the distribution looks skewed and I’m getting mostly upper-bound outliers. What would you recommend in KNIME to address this?

Hi Ivan -

I’m not sure I understand the question… if your distribution is skewed, then naturally you would expect more outliers in one direction than the other when you examine it with a box plot, for example.

Or perhaps are you asking about ways to transform data to mitigate skewness? Maybe some dummy data would help here.
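
To make the transformation idea concrete, here is a minimal sketch in plain Python (outside KNIME, though something similar could probably be adapted to a Python Script node). It assumes the columns are non-negative counts, so a log1p transform is a reasonable choice; the function names and the toy data are made up for illustration and are not taken from any KNIME node.

```python
import math
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Classic IQR fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def raw_outliers(counts, k=1.5):
    """Flag values outside the IQR fences on the original scale."""
    lo, hi = iqr_bounds(counts, k)
    return [c for c in counts if c < lo or c > hi]


def log_scale_outliers(counts, k=1.5):
    """Flag values outside the IQR fences computed on log1p(counts).

    Assumption: counts are >= 0, so log1p is defined for every value.
    """
    logged = [math.log1p(c) for c in counts]
    lo, hi = iqr_bounds(logged, k)
    return [c for c, t in zip(counts, logged) if t < lo or t > hi]


# Hypothetical right-skewed count data.
counts = [0, 5, 6, 7, 8, 9, 10, 12, 15, 40]
raw_flagged = raw_outliers(counts)
log_flagged = log_scale_outliers(counts)
```

On this toy data the raw fences flag only the large value, while the fences on the log scale also pick up the unusually small one. Whether a log transform makes sense at all depends on the real data.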

@ScottF, my distribution is skewed, not symmetrical. I don’t know how to adjust it to get more outliers on the other side. I use the Numeric Outliers node because it gives me specifics for the outlier rows. DBSCAN, say, only labels a row as an outlier or not, without specifics by group.
My data looks like this:
Group1, Group2, Group3, label1, label2, Code_count1, Code_count2, …, Code_countN

You could use the Numeric Outliers node with a certain interquartile range multiplier and select to only handle outliers below the lower bound. Then you use the node again, this time with another multiplier and only treating the outliers above the upper bound. Which multipliers you choose of course depends on your data.
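
As a rough illustration of the two-multiplier idea, here is a small plain-Python sketch (not the node's actual implementation). It computes the lower fence with one interquartile range multiplier and the upper fence with another. The function names, the multipliers 0.5 and 1.5, and the sample values are all assumptions chosen for the example.

```python
from statistics import quantiles


def asymmetric_iqr_bounds(values, lower_k, upper_k):
    """Return (lower, upper) fences using a separate IQR multiplier per side."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - lower_k * iqr, q3 + upper_k * iqr


def flag_outliers(values, lower_k, upper_k):
    """Label each value as 'low', 'high', or 'ok' against the asymmetric fences."""
    lo, hi = asymmetric_iqr_bounds(values, lower_k, upper_k)
    labels = []
    for v in values:
        if v < lo:
            labels.append((v, "low"))
        elif v > hi:
            labels.append((v, "high"))
        else:
            labels.append((v, "ok"))
    return labels


# Hypothetical right-skewed data; a tighter multiplier on the lower side
# makes it easier for small values to be flagged.
counts = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 25]
bounds = asymmetric_iqr_bounds(counts, lower_k=0.5, upper_k=1.5)
labels = flag_outliers(counts, lower_k=0.5, upper_k=1.5)
low = [v for v, tag in labels if tag == "low"]
high = [v for v, tag in labels if tag == "high"]
```

In KNIME itself the same effect comes from chaining two Numeric Outliers nodes as described above; the sketch only shows what the two sets of bounds amount to numerically.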
Kind regards


Good idea, @AlexanderFillbrunn. I’ll try it.

We also have a blog post on the topic of outlier detection: maybe this can help you with your problem, as the Numeric Outliers node is generally not very suitable for skewed distributions.
Kind regards
