Which nodes can be used for multivariate outlier detection in KNIME?
I am not aware of any slick way to do this in KNIME using native nodes, but if I were headed down this road I would start by looking into using an R snippet with the mvoutlier package. Does anyone else have an alternate approach?
Hm... the Fuzzy c-means node can be configured to contain a noise cluster, but I am not sure if this helps.
You could use the PCA node to calculate principal components, and then identify the objects whose PCA scores fall outside the central 95% range, i.e. the ones that deviate more than about two standard deviations from the mean (this assumes the scores are roughly normally distributed).
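To make the PCA-score idea concrete, here is a minimal sketch in Python with NumPy, outside of KNIME. The function name pca_score_outliers and its parameters are made up for illustration, the columns are only centered (not scaled), and the sample standard deviation is used; none of this is meant to mirror what the PCA node does internally.

```python
import numpy as np

def pca_score_outliers(X, n_components=2, n_sd=2.0):
    """Return indices of rows whose score on any retained principal
    component lies more than n_sd standard deviations from the mean.

    Hypothetical helper: assumes numeric input, centering only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each column

    # SVD of the centered data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # PCA scores, mean zero per component

    sd = scores.std(axis=0, ddof=1)
    sd[sd == 0] = np.inf  # a constant component can never flag a row

    z = np.abs(scores) / sd
    return np.where((z > n_sd).any(axis=1))[0]

# Example: 200 random rows with one obviously extreme row at index 0.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[0] = [20.0, 20.0, 20.0]
print(pca_score_outliers(data))
```

With a two-standard-deviation cutoff, some ordinary rows will be flagged as well (roughly 5% per component for normal data), so the result is a list of candidates to inspect rather than confirmed outliers.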
Z-score normalization followed by filtering on mean ± 3 SD seems to work well. Use the Normalizer node to normalize the properties of your choice by selecting the Z-Score Normalization (Gaussian) option. This rescales each property so that its mean becomes zero and its standard deviation becomes 1. Then you can use the Rule-based Row Filter to filter out objects that have any property smaller than -3 or larger than 3 (for normally distributed data, about 99.7% of values lie within this range).
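For anyone who wants to check the logic outside KNIME, the same two steps (z-score each property, then flag rows beyond ±3) can be sketched in plain Python. The function name zscore_outlier_rows is hypothetical, and using the sample standard deviation is an assumption; the Normalizer node may compute it differently.

```python
from statistics import mean, stdev

def zscore_outlier_rows(rows, threshold=3.0):
    """Return indices of rows where any column's |z-score| exceeds threshold.

    Hypothetical helper: rows is a list of equal-length numeric lists.
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]  # sample SD (assumption)

    flagged = []
    for i, row in enumerate(rows):
        for x, m, s in zip(row, means, sds):
            # Skip constant columns, where the z-score is undefined.
            if s > 0 and abs((x - m) / s) > threshold:
                flagged.append(i)
                break
    return flagged

# Example: 19 ordinary rows plus one row with an extreme first property.
rows = [[float(i % 5), 10.0 + (i % 3)] for i in range(19)] + [[100.0, 11.0]]
print(zscore_outlier_rows(rows))  # prints [19]
```

Note that this checks each property on its own, so it catches rows that are extreme in at least one property but not rows that are unusual only in their combination of values.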