How KNIME decides what cluster values to use in k-means

trbox · May 26, 2012, 12:51pm

I am an MSc student currently working on my master's thesis, which is due in 3 weeks, so this is kind of urgent ;)

I am using k-means to group my dataset into clusters. The dataset contains about 26,000 151-dimensional vectors with values in {0,1}. With such a high-dimensional dataset, I am running k-means with a maximum of 500 iterations.

What I can't figure out is how KNIME decides the best match in terms of which clusters it chooses. For each new run of k-means, the algorithm might come up with a new set of cluster centers. How does KNIME decide which cluster centers to use as the result?
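
Nothing in this thread says how KNIME handles this. A common convention outside KNIME is to run k-means several times from different random starting centers and keep the run with the lowest within-cluster sum of squared distances. Below is a minimal Python sketch of that convention only. It is not KNIME's implementation, and the function names (`kmeans_once`, `best_of_restarts`) and the toy data are made up for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, max_iter, rng):
    """One run of Lloyd's k-means from a random initialization.

    Returns (centers, inertia), where inertia is the sum of squared
    distances from each point to its nearest center.
    """
    # Start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center
        if new_centers == centers:
            break  # converged before reaching max_iter
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

def best_of_restarts(points, k, n_restarts=10, max_iter=500, seed=0):
    """Run k-means n_restarts times and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, max_iter, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])

# Toy binary data standing in for the 151-dimensional {0,1} vectors.
toy_points = [
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0],
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1],
]
centers, inertia = best_of_restarts(toy_points, k=2)
print(centers, inertia)
```

Fixing the seed makes the result reproducible from run to run, which matters when the clusters have to be reported in a thesis.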

Any reference to official documentation or any other help would be much appreciated.

-trbox

Gabriel_Cornejo · June 5, 2012, 3:45pm

Hi trbox:

K-means doesn't choose the number of clusters. The analyst, in other words you, has to decide it. I recommend running a hierarchical clustering first, but you will have to sample the data, because that technique doesn't work well with 26,000 rows. Then look at the dendrogram.
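
The suggestion above can be sketched outside KNIME as follows: draw a random sample, run agglomerative clustering on it, and look for the largest jump in merge height, which is what one would read off a dendrogram by eye. This is only a rough illustration. The choice of Hamming distance and average linkage is an assumption, not something stated in the thread, and the function names (`average_linkage_heights`, `suggest_k`) are hypothetical. For real work a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be far faster than this naive loop.

```python
import random

def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def average_linkage_heights(points):
    """Naive agglomerative clustering with average linkage.

    Returns the list of merge heights (one per merge), i.e. the
    vertical positions of the joins in a dendrogram.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(hamming(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights

def suggest_k(heights):
    """Cut the tree at the largest gap between consecutive merge heights."""
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    # After merges 0..t there are len(heights) - t clusters left.
    return len(heights) - t

# Toy stand-in for the full dataset; in practice this would be the 26,000 rows.
data = [
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 0, 1],
]
sample = random.Random(0).sample(data, 6)  # sample size is an arbitrary choice
print(suggest_k(average_linkage_heights(sample)))
```

The value returned by `suggest_k` is only a starting point for the k passed to k-means; inspecting the full dendrogram may still suggest a different cut.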