Numeric Outliers node

I found a big difference between Excel and KNIME in calculating the lower fence. Please look at the attached Excel file. Could you please explain why? Numeric Outliers.xlsx (10.9 KB)

Thank you
Igor

Hi @izaychik63,

If I understood your Excel calculation correctly, you are using the second quartile to calculate the lower bound. Use the value from D1 instead and you will get similar results.

Br,
Ivan

Indeed, making Ivan’s adjustment to your calculation above, and also using the R_7 method for quartile calculation, I get exactly the same answers (-25.75 and 0.9, depending on your value for k).
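For anyone who wants to reproduce this kind of check outside Excel, here is a minimal sketch of the usual IQR fence rule combined with R_7 quartiles. It assumes the fences are Q1 - k * IQR and Q3 + k * IQR; the function names and the sample data are made up for illustration and are not taken from the attached spreadsheet or from KNIME's source.

```python
import math

def quantile_r7(values, p):
    """R_7 quantile: linear interpolation at 0-based position (n - 1) * p."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def fences(values, k=1.5):
    """Assumed rule: lower = Q1 - k * IQR, upper = Q3 + k * IQR."""
    q1 = quantile_r7(values, 0.25)
    q3 = quantile_r7(values, 0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data, not the Excel file
print(fences(sample))  # (-3.0, 13.0)
```

Note that the lower fence starts from Q1, not from the median, which is the adjustment Ivan describes.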

Sorry, I calculated it based on SIQR. By the way, is it possible to add a calculation based on SIQR as an option?

Hi,

Are you talking about the semi-interquartile range? I don't understand how you calculated the lower bound in Excel based on it, if that is what you mean. Anyway, if I am not mistaken, you can get that bound by dividing k by 2.

Br,
Ivan
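The "divide k by 2" remark follows from SIQR = IQR / 2. A small sketch of that identity, assuming both fences are anchored at Q1 (the quartile values and function names below are hypothetical):

```python
def lower_fence_iqr(q1, q3, k):
    """Lower fence from the full interquartile range."""
    return q1 - k * (q3 - q1)

def lower_fence_siqr(q1, q3, k):
    """Lower fence from the semi-interquartile range, SIQR = IQR / 2."""
    siqr = (q3 - q1) / 2
    return q1 - k * siqr

q1, q3 = 10.0, 18.0  # hypothetical quartiles
k = 1.5

# An SIQR-based fence with multiplier k equals an IQR-based fence with k / 2.
print(lower_fence_siqr(q1, q3, k))     # 4.0
print(lower_fence_iqr(q1, q3, k / 2))  # 4.0
```

This only covers fences that start from Q1; a rule that starts from Q2 cannot be reproduced just by rescaling k.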

Ivan, I mean that for skewed distributions it is recommended to calculate the fences not from Q1 and Q3 but from Q2.
Some interesting ideas are here

Hi @izaychik63 ,

I see. This seems a bit specific, so for now there is no plan to add different fence calculations. Of course, if there are more inquiries this might change :)

Br,
Ivan

Thank you, Ivan. Could you also look at the k value? According to the documentation, k starts from 0. The smallest value I was able to use reasonably was 0.1. Beyond that value the fence calculation became strange, and I used 0.

Hi @izaychik63,

I played a bit with the k parameter and everything seems OK to me. Which value did you use that caused the fences to be strange?

Br,
Ivan


It seems to depend on the algorithm I used. Currently, with R_4, it works perfectly. It seems there was some confusion with the heuristic one.
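To illustrate why the estimation method matters, here is a rough sketch comparing the R_4 and R_7 quantile definitions (as numbered in the Hyndman and Fan scheme) on a small made-up sample. The `quantile` helper and the data are illustrative assumptions and do not reproduce KNIME's implementation or its heuristic option.

```python
import math

def quantile(values, p, method="R_7"):
    """Sample quantile by linear interpolation at a 1-based position h."""
    xs = sorted(values)
    n = len(xs)
    if method == "R_4":
        h = n * p            # interpolates the empirical CDF
    elif method == "R_7":
        h = (n - 1) * p + 1  # default in many tools, including Excel's QUARTILE.INC
    else:
        raise ValueError(f"unsupported method: {method}")
    h = min(max(h, 1.0), float(n))  # clamp to the valid position range
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return xs[lo - 1] + (h - lo) * (xs[hi - 1] - xs[lo - 1])

data = list(range(1, 11))  # made-up sample: 1..10
k = 1.5

for method in ("R_4", "R_7"):
    q1 = quantile(data, 0.25, method)
    q3 = quantile(data, 0.75, method)
    lower = q1 - k * (q3 - q1)
    print(method, q1, q3, lower)
# R_4 2.5 7.5 -5.0
# R_7 3.25 7.75 -3.5
```

Even on ten evenly spaced values the two methods give different quartiles, so the fences differ for the same k.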
