Newbie Question: Clustering and Centroids

Hi, hope this is the right forum.

 

I've created a KNIME workflow to do k-means clustering on a dataset, and it seems to work nicely.

 

Now I want to evaluate the cluster cohesion using Sum of Squared Error (SSE).

 

As far as I know, KNIME doesn't offer anything to do this, so perhaps I can write some Java code for it. To do that, I need to know the centroids of the clusters.

 

How do I see the centroids of the clusters? KNIME gives me a field (cluster) telling me which cluster each record has been placed in, but I can't find where the centroids themselves are shown.

Any advice would be very much appreciated :)

Never mind, figured it out.

 

Right-click the k-means node, then click View: Cluster View. Unfortunately, it seems it can only be exported in .png format at the moment, not as text.

If you want to evaluate whether the number of clusters you chose makes sense, you can calculate the silhouette coefficient or the sum of squared errors (SSE) to the cluster centers.
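For anyone who wants to compute the silhouette coefficient outside KNIME, here is a minimal sketch in plain Python. It assumes you have exported the data points and their cluster labels; the function name `silhouette_score` and the sample data are made up for illustration, and it uses Euclidean distance. It compares every pair of points, so it is only practical for small datasets.

```python
from math import dist  # Euclidean distance, Python 3.8+

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point itself contributes a distance of 0 to the sum)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Illustrative data only: two tight, well-separated clusters.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

Values close to 1 mean points sit much nearer to their own cluster than to the next one; values near 0 or below suggest overlapping or badly assigned clusters.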

If you choose the second option, please note that the error gets smaller with each cluster you add. To make a good choice anyway, you can look for the elbow: the value of k beyond which adding more clusters no longer reduces the error by much.
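One common way to locate the elbow, sketched below in plain Python, is to take the k whose point on the error curve lies farthest from the straight line joining the first and last points. This is just one heuristic, not necessarily what any particular KNIME node does; the function name `elbow_k` and the error values are made up for illustration.

```python
def elbow_k(ks, sses):
    """Return the k whose (k, SSE) point lies farthest from the straight
    line joining the first and last points of the curve."""
    x0, y0 = ks[0], sses[0]
    x1, y1 = ks[-1], sses[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5
    best_k, best_dist = ks[0], -1.0
    for k, sse in zip(ks, sses):
        # perpendicular distance from (k, sse) to the line through the endpoints
        d = abs(dy * (k - x0) - dx * (sse - y0)) / norm
        if d > best_dist:
            best_k, best_dist = k, d
    return best_k

# Illustrative values only: the error drops sharply up to k = 3, then flattens.
ks = [1, 2, 3, 4, 5, 6]
sses = [100.0, 40.0, 15.0, 12.0, 10.0, 9.0]
best = elbow_k(ks, sses)
```

In practice you would run k-means once per k, record the SSE of each run, and pass those values in. Plotting the curve as well is a good idea, since the elbow is not always clear-cut.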

I programmed a node to calculate the elbow automatically and show it in a view. For the input, I changed the implementation of the k-means node so that it now determines the error while clustering; that's probably the best way. But because it was done at a company, I can't offer you that node.

In the KNIME Weka plugin there is a node called X-Means. This node automatically determines the elbow within a range of k (for example 2 to 10) and uses the k from the elbow to cluster the data. Maybe that's what you are looking for.


 

Hello AI, I used the X-Means node to find the right number of clusters k. However, the number of clusters the node finds varies depending on the maximum number of iterations I select. What number should I enter in the maxnumberiterations section?

Hello,

You can use this component to calculate the distance between every point and its centroid.

Using this output, you can easily calculate cohesion within clusters.
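If you would rather do the calculation by hand, the sketch below shows the idea in plain Python. It assumes you have the data points and the cluster label of each record (the cluster field mentioned above); the function names and sample data are made up for illustration. The centroids are recomputed as the mean of each cluster's points, which matches the k-means centroids once the algorithm has converged, but may differ slightly if it stopped early.

```python
from collections import defaultdict

def centroids_from_labels(points, labels):
    """Return {label: centroid}, each centroid being the mean of its cluster's points."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

def sse_per_cluster(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    sse = defaultdict(float)
    for point, label in zip(points, labels):
        center = centroids[label]
        sse[label] += sum((x - c) ** 2 for x, c in zip(point, center))
    return dict(sse)

# Illustrative data only: four 2-D records in two clusters.
points = [[1.0, 1.0], [1.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
labels = ["cluster_0", "cluster_0", "cluster_1", "cluster_1"]

centroids = centroids_from_labels(points, labels)
per_cluster = sse_per_cluster(points, labels, centroids)
total_sse = sum(per_cluster.values())
```

A lower per-cluster SSE means a tighter (more cohesive) cluster, and the total is the figure usually compared across different values of k.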

Hope it helps,
Andrea
