For those who have a bit of experience in clustering: I have a dataset containing a bit more than 150,000 records, and I would like to know how I can find the right number of clusters in KNIME. Is there a specific node that can help me find this number?
Please connect to the EXAMPLE server and check out the 011003_loop ParametersKMeans workflow in the 011_FlowVarsAndLoops category. The idea of the flow is to demonstrate how to iterate over a range of values for k to find the "best" clustering.
Thank you for your answer :)
Thank you. So, according to the example, the best number of clusters is 3?
I was looking at the 011003_loop ParametersKMeans workflow in the 011_FlowVarsAndLoops category on the EXAMPLE server to determine the optimal number of clusters in a dataset. It is clear what each node in the flow does, but I cannot understand how to determine the optimal number of clusters. The dataset is the famous Iris, which by nature contains 3 types of flowers, so the optimal number of clusters should be three, but I do not know whether the Loop End node reports that, or how to interpret its output. Could you help?
Thank you very much.
Open the configuration of the k-Means node -> number of clusters -> variable settings -> use variable -> select k, then apply.
The node will then use the variable k as the number of clusters in each loop iteration, so you get a different entropy score for each value of k and can compare them.
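To make the idea of "one entropy score per k" concrete outside of KNIME, here is a minimal Python sketch of the same loop. It is not the KNIME workflow itself: the toy data, the plain k-means, and the helper names `kmeans` and `weighted_entropy` are all assumptions made for illustration, and it assumes the entropy is measured against known class labels (which is possible with Iris, but usually not with an unlabeled 150,000-record dataset).

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (illustrative); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def weighted_entropy(assign, labels):
    """Size-weighted mean entropy (in bits) of the reference labels inside each cluster."""
    n = len(labels)
    total = 0.0
    for c in set(assign):
        member_labels = [lab for a, lab in zip(assign, labels) if a == c]
        counts = Counter(member_labels)
        h = -sum((m / len(member_labels)) * math.log2(m / len(member_labels))
                 for m in counts.values())
        total += len(member_labels) / n * h
    return total

# Hypothetical toy data: three well-separated 2-D groups with known labels.
rng = random.Random(42)
points, labels = [], []
for name, (cx, cy) in {"a": (0, 0), "b": (10, 0), "c": (0, 10)}.items():
    for _ in range(30):
        points.append((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)))
        labels.append(name)

# The loop over k: one clustering and one entropy score per value of k.
scores = {k: weighted_entropy(kmeans(points, k), labels) for k in range(2, 6)}
for k, h in scores.items():
    print(f"k={k}: entropy={h:.3f}")
```

When reading the scores, keep in mind that lower entropy means purer clusters, but entropy generally tends to keep falling as k grows. So rather than just taking the minimum, it is usually more informative to look for the smallest k after which the score stops improving noticeably.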