K-means clustering in order

anguslou · November 21, 2019, 6:03am

It seems that the K-means node assigns cluster names (cluster_1, cluster_2…) in an arbitrary order relative to the reference value. Is it possible to make the names follow the order of the reference value? Right now I am using a Rule Engine node to re-order them with rules like “$Recency_Cluster$ = 2 => 1”, but if I re-run the analysis the cluster names may be assigned differently, and then the rules would no longer work.

nemad · November 21, 2019, 8:06am

Hello @anguslou,

what do you mean by reference value?
Do you mean that the first cluster in the table is not necessarily cluster_1?

Cheers,

Adrian

anguslou · November 21, 2019, 8:57am

Reference value means the column that the clustering is based on, e.g. total payment in my case.

I find that if the table is sorted by total payment, the clusters appear as cluster_0, cluster_2, cluster_1, and so on…

nemad · November 21, 2019, 9:14am

So you mean the columns that are used to compute the clustering, i.e. the columns in the include list in the dialog of the k-Means node?

anguslou · November 21, 2019, 9:16am

Yes, that’s right. I wonder how to put the clusters in order.

nemad · November 21, 2019, 9:31am

I can see how that might make sense in your case, but if you have more than one column, it’s not clear how to define such an order.
There is actually a caveat with the current implementation, namely that it uses the first rows as initialization, which is a very poor way to initialize k-Means if the rows are ordered.
Therefore I’d recommend using the Shuffle node to break up any order, then running k-Means, afterwards sorting your table again, and finally reassigning the clusters in the desired order.
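
To illustrate only the last step (reassigning the cluster names in the desired order), here is a minimal sketch in plain Python. It is not a KNIME node configuration; the function name `relabel_by_mean`, the sample data, and the choice of ordering clusters by the mean of a single reference column are assumptions made for illustration. Because the mapping is recomputed from the data on every run, it does not depend on which names k-Means happened to assign.

```python
from collections import defaultdict


def relabel_by_mean(values, labels, prefix="cluster_"):
    """Rename clusters so that the numeric suffix increases with the
    mean of the reference column (hypothetical helper, not a KNIME API)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        counts[label] += 1

    # Order the original cluster names by the mean of their reference values.
    ordered = sorted(sums, key=lambda label: sums[label] / counts[label])

    # Map each original name to a new name reflecting its rank.
    mapping = {old: f"{prefix}{rank}" for rank, old in enumerate(ordered)}
    return [mapping[label] for label in labels], mapping


# Made-up example: total payment per row and the names k-Means assigned.
total_payment = [10.0, 12.0, 500.0, 520.0, 95.0, 105.0]
raw_labels = ["cluster_0", "cluster_0", "cluster_1",
              "cluster_1", "cluster_2", "cluster_2"]

new_labels, mapping = relabel_by_mean(total_payment, raw_labels)
print(mapping)
print(new_labels)
```

With more than one clustering column, the sort key would have to be replaced by whatever ordering makes sense for the data, which is the ambiguity mentioned above.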

Cheers,

Adrian
