xmeans

noaesteve · July 11, 2017, 10:31pm

Hello,

I am doing a clustering analysis.

I would like to use the XMeans clustering algorithm from Weka so that I do not have to determine the number of clusters myself.

The problem is that when I execute the node, the result is always the lower bound of the input cluster range. (I can see in the results that it only runs one iteration.) I have tried changing all the input parameters to find the error, but I can't find it.

Can someone help me? Maybe with an example workflow where I could see the parameters used, or a hint as to whether any parameter is problematic. Any advice would be appreciated.

I hope someone can help.

Thanks

marten_kose · July 20, 2017, 2:29pm

Hi,

Clustering results depend not only on the parameters used but also on the input data. If the records are all very similar, the solution with the lowest number of clusters may well be the best one.

I attached a workflow for you. In one case it generates very similar data with the Table Creator node and then clusters the data with the Weka XMeans node; it always chooses the lowest number of clusters, no matter what the parameters are. In the other case I generated dissimilar data with the Data Generator node and used Weka XMeans again; here it chooses a reasonable number of clusters.

So maybe it is not about the parameters, but about your data? Did you standardize your data before clustering?
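
In case it is unclear what standardizing means here, below is a minimal sketch of z-score standardization in plain Python. It is not KNIME or Weka code; inside a workflow you would typically use the Normalizer node with its z-score option instead. The function name `standardize` and the sample values are made up for illustration only.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std dev."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]  # population standard deviation
    return [
        # A constant column (std dev 0) is mapped to 0.0 to avoid division by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical data: two columns on very different scales.
rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
scaled = standardize(rows)
```

After this step both columns have mean 0 and standard deviation 1, so a column with large raw values no longer dominates the distance calculation used by the clustering.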

Hope that helps.

Cheers
Marten

wekaclustering.knar