Removing Outliers from Hierarchical Clustering

Hello KNIME Community,

I have a question about the Hierarchical Clustering node:
I'm trying to get nicely clustered data in my dendrogram, but sometimes single clusters or branches stand alone.

Looking at the dendrogram, I can tell they are outliers, but the algorithm can't.
Is there a way to remove those outliers from my dataset?

Thank you for your help!


Hi Oso,
how about this: after the clustering, you filter out the data points that are outliers and then recluster?
For example, you could filter out all very small clusters.
If only big clusters remain afterwards, you know you have found them all.
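
To make the idea concrete outside of KNIME, here is a minimal Python sketch using SciPy. It is only an illustration of "cluster, drop tiny clusters, recluster", not what the Hierarchical Clustering node does internally. The function name `drop_small_clusters`, the average linkage, the `min_size` threshold, and the toy data are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def drop_small_clusters(data, n_clusters, min_size):
    """Cluster the rows, then keep only rows whose cluster has at least min_size members.

    Hypothetical helper for illustration; returns the kept rows and a boolean mask.
    """
    tree = linkage(data, method="average")            # agglomerative clustering
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    sizes = np.bincount(labels)                       # sizes[k] = rows in cluster k
    keep = sizes[labels] >= min_size                  # True for rows in large clusters
    return data[keep], keep

# Toy data (assumed): two tight groups plus two isolated points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
outliers = np.array([[20.0, 20.0], [-15.0, 18.0]])
data = np.vstack([blob_a, blob_b, outliers])

# First pass: ask for more clusters than expected so outliers get their own.
cleaned, keep = drop_small_clusters(data, n_clusters=4, min_size=3)

# Second pass: recluster the cleaned data into the expected number of groups.
final_labels = fcluster(linkage(cleaned, method="average"), t=2, criterion="maxclust")
```

In a KNIME workflow, the same filtering could presumably be built by counting rows per cluster and filtering on that count; the exact nodes depend on your setup.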
Does this help? If not, some sample data, a workflow, or screenshots would be helpful.
Cheers, Iris

Hello Iris, thank you for your answer.

Is there a convenient way to filter out all the very small clusters? I don't really know how to filter them.

What I did so far was:

  • Do the clustering
  • Look at the distance view to find my maximum number of clusters
  • Cluster again with more clusters
  • “Ignore” the small clusters while looking at the data of each cluster…
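
For reference, the "look at the distance view" step above can be roughly approximated in code. The following is a minimal Python/SciPy sketch, not KNIME's own logic: it assumes average linkage and uses the largest jump between consecutive merge distances as the place to cut. The function name `suggest_cluster_count` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def suggest_cluster_count(data, method="average"):
    """Suggest a cluster count from the largest jump in merge distances.

    Hypothetical heuristic for illustration only.
    """
    tree = linkage(data, method=method)
    heights = tree[:, 2]              # distance at which each merge happened
    gaps = np.diff(heights)           # jump from one merge to the next
    i = int(np.argmax(gaps))          # largest jump lies after merge i
    # Stopping after merge i means i + 1 merges were done on len(data) points.
    return len(data) - (i + 1)

# Toy data (assumed): three tight groups at roughly equal mutual distance.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(15, 2)) for c in centers])

n_suggested = suggest_cluster_count(data)
```

A few far-away points can produce the largest jump themselves, so a heuristic like this is probably more reliable once the very small clusters have been filtered out.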