Following up on my last topic, I now wonder whether there is a method in KNIME to determine the best number of clusters.
I think there could be a node that lets you choose among the best-known methods, such as Elbow, Silhouette and the Gap statistic, to determine the best number of clusters.
I had checked this workflow before.
The Entropy Scorer node needs a reference column. In the example, the clustering is done on the Iris dataset, which has a class column.
How can I use this method to find the optimal number of clusters for my dataset, which has no class column? I want to cluster the data and use the clusters as the class for classification.
Finding the optimal number of clusters is tricky and depends on many aspects. There is no node in KNIME that performs any of the methods you mentioned above, but you could build a workflow for it. You may take a look at this post where a user had a similar question:
Yes, we already have such a node on our list for a future release; it would be a really nice new node. However, I cannot promise which release it will be added in.
The topic @izaychik63 linked to contains a link to an extension which seems to do what you are searching for. I don’t know the extension and it is not a trusted KNIME Community extension, so you are of course free to use it, but I cannot guarantee that it works correctly. Thanks for the link, though, @izaychik63.
If the extension does not provide what you are searching for, using R might be the easiest solution for now.
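For anyone who wants to script this in the meantime, here is a minimal sketch of the Silhouette approach, which needs no class column. It is written in Python rather than R only because it uses nothing beyond the standard library; the function names (`kmeans`, `silhouette_score`, `best_k`), the farthest-point initialization and the demo data are all illustrative assumptions and are not taken from KNIME or from the linked extension.

```python
import math
import random


def kmeans(points, k, iterations=50):
    """Lloyd's k-means; returns one cluster index per point.

    Assumption: centers are seeded deterministically by farthest-point
    selection, which is a choice made for this sketch only.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iterations):
        # Assignment step: index of the nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (no class column needed)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0 is this sketch's convention
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Cluster once per candidate k and return the k with the highest score."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical demo data: three well-separated 2-D blobs.
rng = random.Random(42)
demo = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 10), (20, 0)] for _ in range(20)]
k, scores = best_k(demo, range(2, 7))
print(k, scores)
```

The score only compares distances within and between clusters, so it works on unlabeled data. Note that the pairwise distance computation is quadratic in the number of rows, so for a large table you would probably want to score a sample.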
Unfortunately, the example workflow has a bug. The decisive variable is not connected, so it always returns 3 as the optimal number of clusters.
The workflow on the Examples Server doesn’t have this bug. The number of clusters in the K-Means node is read from the variable.
Thank you everybody. I hope we’ll have the new node for determining the optimal number of clusters in KNIME soon.