Identify Outliers in Skewed Distribution

izaychik63 · January 22, 2020, 6:25pm

I’m using the Numeric Outliers node and it works pretty well. However, it looks like the distribution is skewed, and I’m getting mostly upper-bound outliers. What can you recommend in KNIME to address this?

ScottF · January 24, 2020, 8:38pm

Hi Ivan -

I’m not sure I understand the question… If your distribution is skewed, then you would naturally expect more outliers in one direction than in the other when you examine it with a box plot, for example.

Or are you perhaps asking about ways to transform the data to mitigate skewness? Some dummy data might help here.
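To illustrate the box-plot point, here is a minimal Python sketch (not KNIME code) of the usual interquartile-range fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR, applied to a made-up right-skewed sample. The sample values are hypothetical, and Python's statistics.quantiles may estimate quartiles slightly differently from KNIME on small samples. On the raw data the fences flag only high values. After a log transform, which is one common way to reduce right skew and assumes all values are positive (log1p is an option for counts that contain zeros), nothing is flagged.

```python
import math
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # statistics.quantiles uses the 'exclusive' method by default; KNIME's
    # quartile estimation may differ slightly on small samples.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) relative to the IQR fences."""
    lower, upper = iqr_fences(values, k)
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high

# Hypothetical right-skewed sample (e.g. one Code_count column).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 45]

print(split_outliers(data))  # ([], [20, 45])

# Log transform (requires strictly positive values) reduces the right skew.
log_data = [math.log(v) for v in data]
print(split_outliers(log_data))  # ([], [])
```

The asymmetry in the raw result comes from the long right tail, not from anything wrong with the fences themselves.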

izaychik63 · January 24, 2020, 10:13pm

@ScottF, my distribution is not symmetrical; it is skewed. I don’t know how to adjust for that so that I also get outliers on the other side. I use the Numeric Outliers node because it gives me outlier details for specific rows. DBSCAN, for example, only tells me whether a row is an outlier or not, without any details per group.
My data looks like this:
Group1, Group2, Group3, label1, label2, Code_count1, Code_count2, …, Code_countN
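A per-group version of the same box-plot rule can be sketched in Python as follows. This is an illustration of the idea, not how the Numeric Outliers node is implemented. The column names Group1 and Code_count1 are taken from the layout above; the rows and the helper outliers_by_group are hypothetical. Quartiles and fences are computed separately for each group, and each row is labeled "low", "high" or None.

```python
import statistics
from collections import defaultdict

def outliers_by_group(rows, group_col, value_col, k=1.5):
    """Label each row 'low', 'high' or None using IQR fences computed per group."""
    # Collect the values of each group (each group needs at least two values
    # for statistics.quantiles on older Python versions).
    values = defaultdict(list)
    for row in rows:
        values[row[group_col]].append(row[value_col])

    # Compute lower/upper fences separately for every group.
    fences = {}
    for group, vals in values.items():
        q1, _, q3 = statistics.quantiles(vals, n=4)
        iqr = q3 - q1
        fences[group] = (q1 - k * iqr, q3 + k * iqr)

    # Compare each row against the fences of its own group.
    labels = []
    for row in rows:
        lower, upper = fences[row[group_col]]
        v = row[value_col]
        labels.append("low" if v < lower else "high" if v > upper else None)
    return labels

# Hypothetical rows: group A has one high value, group B one low value.
rows = [{"Group1": "A", "Code_count1": v} for v in [10, 11, 12, 12, 13, 14, 40]]
rows += [{"Group1": "B", "Code_count1": v} for v in [100, 102, 104, 105, 106, 108, 1]]

print(outliers_by_group(rows, "Group1", "Code_count1"))
```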

AlexanderFillbrunn · January 25, 2020, 3:26pm

Hi,
You could use the Numeric Outliers node with a certain interquartile range multiplier and choose to handle only the outliers below the lower bound. Then use the node a second time, with a different multiplier, handling only the outliers above the upper bound. Which multipliers you choose of course depends on your data.
Kind regards
Alexander
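A minimal Python sketch of this two-pass idea, with a separate multiplier for each side, is shown below. The multipliers 0.25 and 3.0 and the sample values are hypothetical, chosen only to show that the two bounds can be tuned independently.

```python
import statistics

def asymmetric_outliers(values, k_low, k_high):
    """Return (low, high) outliers using a separate IQR multiplier per side."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k_low * iqr   # corresponds to the pass that only checks the lower bound
    upper = q3 + k_high * iqr  # corresponds to the pass that only checks the upper bound
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 45]  # hypothetical skewed sample
print(asymmetric_outliers(data, k_low=0.25, k_high=3.0))  # ([1], [45])
```

Here both bounds come from the same quartiles. If the first node run removes or replaces rows, the second run will compute its quartiles on the modified data, so the results can differ slightly.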

izaychik63 · January 25, 2020, 3:40pm

Good idea, @AlexanderFillbrunn. I will try it.

AlexanderFillbrunn · January 27, 2020, 7:48am

Hi,
We also have a blog post on the topic of outlier detection: maybe this can help you with your problem, as the Numeric Outliers node is generally not very suitable for skewed distributions.
Kind regards
Alexander
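One skew-aware alternative is to scale each fence by the spread on its own side of the median (the lower and upper semi-interquartile ranges) instead of by the full IQR. This is offered as an illustration only and is not necessarily what the blog post describes. The minimal Python sketch below assumes the fences Q1 - 2k(median - Q1) and Q3 + 2k(Q3 - median) with k = 1.5. For a symmetric distribution these reduce to the usual Q1 - k·IQR and Q3 + k·IQR.

```python
import statistics

def semi_iqr_fences(values, k=1.5):
    """Fences scaled by the lower and upper semi-interquartile ranges."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    lower = q1 - 2 * k * (med - q1)  # uses the spread below the median
    upper = q3 + 2 * k * (q3 - med)  # uses the spread above the median
    return lower, upper

def split_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) relative to the skew-adjusted fences."""
    lower, upper = semi_iqr_fences(values, k)
    return [v for v in values if v < lower], [v for v in values if v > upper]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20, 45]  # hypothetical skewed sample
print(semi_iqr_fences(data))  # (-1.0, 24.0)
print(split_outliers(data))   # ([], [45])
```

On the same sample, the plain 1.5 × IQR rule has its upper fence at 18.375 and flags both 20 and 45. Here the upper fence moves out to 24 because the data are spread further above the median than below it.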

system · July 27, 2020, 7:48pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.