Hello.
I was wondering what the “Group measurements by” option of the Outlier Removal node means.
I would really appreciate an explanation with a simple example.
Best regards,
hhkim
Hi,
In this node, values are identified as outliers if they lie more than x standard deviations away from the mean (option: Mean ±DS) or more than x times the interquartile range outside the quartiles (option: Boxplot).
With the group setting, these statistics are calculated separately for each group, so outliers are identified and removed within each group.
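Since a simple example was requested, here is a small sketch in Python with pandas that mimics the idea outside of KNIME. It is not the node's actual implementation: the toy data, the column names `group` and `value`, the helper `iqr_inliers` and the factor 1.5 are all assumptions for illustration, and the node may compute quartiles slightly differently.

```python
import pandas as pd

# Hypothetical toy data: two groups with very different value ranges.
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "value": [10, 11, 9, 10, 11, 30, 100, 101, 99, 100, 102, 98],
})

def iqr_inliers(values: pd.Series, factor: float = 1.5) -> pd.Series:
    """Return True where a value lies within the boxplot whiskers."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - factor * iqr, q3 + factor * iqr)

# Without grouping: quartiles are computed over all rows at once.
kept_global = df[iqr_inliers(df["value"])]

# With grouping: quartiles are computed separately for each group.
mask = df.groupby("group")["value"].transform(iqr_inliers).astype(bool)
kept_grouped = df[mask]

print(len(kept_global), len(kept_grouped))
```

Without grouping, the value 30 in group A sits comfortably between the two groups and is kept. With grouping, it is compared only against the other values of group A and is removed.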
Is there a particular reason why you use this node? Personally, I work with the standard “numeric outliers” node, which has more options for outlier handling.
Thank you for the reply. I have two follow-up questions:
Does this mean outliers are removed within each group defined by the columns selected under the “Group measurements by” option—i.e., similar to applying a GROUP BY and then removing outliers per group?
In the Numeric Outlier node, how can I configure it to remove outliers based on standard deviation rather than IQR?
Best regards,
hhkim
I’m not sure, but I would say so. Maybe you can create a test dataset to find out.
You’re right, I hadn’t noticed that z-score calculation is not part of the node. I usually calculate these scores manually with the “math formula” node:
z_score = ($Col$ - COL_MEAN($Col$)) / COL_STDDEV($Col$)
and then check whether the z-scores look reasonably distributed.
When filtering by z-score, keep in mind that this assumes your data is normally distributed. Especially with a small number of values (per group), this can be problematic.
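As a rough illustration of per-group z-score filtering and of the small-group problem, here is a sketch in Python with pandas. The toy data, the column names and both thresholds are assumptions, and pandas uses the sample standard deviation by default, which may or may not match what your KNIME setup computes.

```python
import pandas as pd

# Hypothetical toy data: six values per group, one suspicious value (30) in group A.
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "value": [10, 11, 9, 10, 11, 30, 100, 101, 99, 100, 102, 98],
})

# z-score per group: (value - group mean) / group standard deviation.
grouped = df.groupby("group")["value"]
df["z_score"] = (df["value"] - grouped.transform("mean")) / grouped.transform("std")

# Keep rows whose absolute z-score is within the chosen threshold.
kept_at_2 = df[df["z_score"].abs() <= 2.0]
kept_at_3 = df[df["z_score"].abs() <= 3.0]

print(df)
print(len(kept_at_2), len(kept_at_3))
```

With a threshold of 2, the value 30 is removed, but only barely (its z-score is about 2.03). With a threshold of 3, nothing is removed at all: with the sample standard deviation, the largest possible absolute z-score in a group of n values is (n - 1) / sqrt(n), which is about 2.04 for n = 6. So for small groups a strict threshold may never flag anything.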