K-means clustering in order

It seems that the k-Means node assigns the cluster names (cluster_1, cluster_2…) in an order unrelated to the reference value. Is it possible to number the clusters in order of the reference value? At the moment I am using a Rule Engine node to re-order them, e.g. “$Recency_Cluster$ = 2 => 1”, but if I re-run the analysis the clusters may be numbered differently again, and then the rules no longer work.
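
A hard-coded rule like the one above breaks as soon as the numbering changes. One way around it is to derive the mapping from the data on every run: compute the mean reference value of each cluster and rename the clusters in ascending order of that mean. The sketch below does this in plain Python outside KNIME. The function name `relabel_by_reference` and the sample data are hypothetical, and ranking by the cluster mean is an assumption (a minimum or median would work the same way).

```python
from collections import defaultdict


def relabel_by_reference(labels, reference):
    """Rename clusters to cluster_0, cluster_1, ... in ascending order of the
    mean reference value of each cluster (smallest mean -> cluster_0)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, value in zip(labels, reference):
        sums[label] += value
        counts[label] += 1
    # Sort the original names by their cluster mean on the reference column.
    ordered = sorted(sums, key=lambda lab: sums[lab] / counts[lab])
    mapping = {old: f"cluster_{i}" for i, old in enumerate(ordered)}
    return [mapping[label] for label in labels], mapping


# Hypothetical k-Means output whose names do not follow total payment.
labels = ["cluster_0", "cluster_2", "cluster_1", "cluster_2", "cluster_0", "cluster_1"]
total_payment = [10.0, 55.0, 300.0, 60.0, 12.0, 280.0]

new_labels, mapping = relabel_by_reference(labels, total_payment)
print(mapping)  # {'cluster_0': 'cluster_0', 'cluster_2': 'cluster_1', 'cluster_1': 'cluster_2'}
```

Because the mapping is recomputed from the data each time, it does not matter which arbitrary names the clustering step produced on a given run.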

Hello @anguslou,

What do you mean by reference value?
Do you mean that the first cluster in the table is not necessarily cluster_1?

Cheers,

Adrian

By reference value I mean the columns from which the clusters are determined, e.g. total payment in my case.

I find that if I sort by total payment, the clusters come out as cluster_0, cluster_2, cluster_1, and so on…

So you mean the columns that are used to compute the clustering, i.e. the columns in the include list of the k-Means node’s dialog?

Yes, that’s right. I’d like to know how to get the clusters numbered in order.

I can see how that might make sense in your case, but if you use more than one column, it’s not clear how to define such an order.
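
To illustrate the ambiguity: with several columns, any ordering is a convention you have to pick yourself. The plain-Python sketch below, with hypothetical column names and centroids, sorts clusters lexicographically by their centroid coordinates in a chosen column priority. Swapping the priority gives a different order for the very same clustering.

```python
def order_clusters(centroids, priority):
    """Return cluster names sorted by centroid coordinates, comparing the
    columns in `priority` order; later columns only break ties."""
    return sorted(centroids, key=lambda name: tuple(centroids[name][col] for col in priority))


# Hypothetical centroids over two columns.
centroids = {
    "cluster_0": {"total_payment": 50.0, "recency": 30.0},
    "cluster_1": {"total_payment": 50.0, "recency": 5.0},
    "cluster_2": {"total_payment": 20.0, "recency": 90.0},
}

print(order_clusters(centroids, ["total_payment", "recency"]))  # ['cluster_2', 'cluster_1', 'cluster_0']
print(order_clusters(centroids, ["recency", "total_payment"]))  # ['cluster_1', 'cluster_0', 'cluster_2']
```

Other conventions (ordering by a weighted sum of the coordinates, or by distance from the origin) are equally valid, which is exactly why no single order is obvious.
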
There is actually a caveat with the current implementation: it uses the first rows of the table as the initial cluster centers, which is a very poor way to initialize k-Means if the rows are ordered.
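
To see why ordered input hurts, here is a minimal 1-D Lloyd’s k-means that, as described above, takes the first k rows as the initial centers. It is an assumption-based sketch in plain Python, not KNIME’s actual code, and the function name and data are made up. On sorted data all three initial centers start in the lowest group, and the run converges with the groups around 50 and 100 merged into one cluster.

```python
def kmeans_first_rows(values, k, max_iter=100):
    """Plain 1-D Lloyd's k-means using the first k rows as initial centroids."""
    centroids = list(values[:k])
    assignment = None
    for _ in range(max_iter):
        # Assign every value to its nearest centroid (lowest index wins ties).
        new_assignment = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_assignment == assignment:
            break  # converged: assignments stopped changing
        assignment = new_assignment
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assignment


# Three well-separated groups, already sorted ascending.
sorted_payments = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0, 100.0, 101.0, 102.0]
centroids, assignment = kmeans_first_rows(sorted_payments, k=3)
print(centroids)  # [1.0, 2.5, 76.0]: the groups near 50 and 100 end up in one cluster
```
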
I’d therefore recommend using the Shuffle node to break up any existing order, then running k-Means, then sorting your table again, and finally reassigning the cluster names in the desired order.
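
The recommended sequence (shuffle, cluster, sort, relabel) can be sketched end to end in plain Python for a single reference column. This is not the KNIME workflow itself: `kmeans_1d` is a simplified k-means that uses the first k rows as initial centers and stands in for the k-Means node, `random.Random(seed).shuffle` stands in for the Shuffle node, and ranking clusters by centroid value is an assumed relabeling rule. All names and data are hypothetical.

```python
import random


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on numbers; the first k rows are the initial centroids."""
    centroids = list(values[:k])
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assignment


def cluster_in_order(values, k, seed=0):
    """Shuffle, cluster, rename clusters by centroid rank, then sort by value."""
    rows = list(values)
    random.Random(seed).shuffle(rows)           # break any existing order
    centroids, assignment = kmeans_1d(rows, k)  # run k-means on shuffled rows
    # Rank clusters by centroid: the smallest centroid becomes cluster_0.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: r for r, j in enumerate(order)}
    labelled = [(v, f"cluster_{rank[a]}") for v, a in zip(rows, assignment)]
    return sorted(labelled)                     # sort the table again by value


payments = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0, 100.0, 101.0, 102.0]
for value, label in cluster_in_order(payments, k=3):
    print(value, label)
```

Relabeling happens before the final sort here; since the new names depend only on the centroids, the order of those two steps does not change the result, and the names follow the centroid order whatever partition a particular shuffle leads to.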

Cheers,

Adrian
