I'm trying to determine the best number of clusters for K-Means, but the result of the 'Entropy Scorer' node is always 0 in every case and I don't know why. Any ideas?
I've attached the workflow and the results.
Hi dgrande,
An entropy of 0 means that your clusters are completely pure with respect to the reference classes. How did you configure the Entropy Scorer node? Which column did you select as the reference column, and which as the clustering column? It looks like you chose the same column for both, which indeed leads to an entropy of zero, since a column always agrees perfectly with itself.
I guess what you want to do is score the reference column (the cluster membership from the original dataset) against the cluster predictions of your k-means clustering. Comparing these two columns, you should get an entropy at least slightly higher than 0. If the entropy is still 0 when you compare the original and the predicted cluster membership, then your data is divided perfectly with respect to cluster purity.
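To illustrate why the same column on both sides always gives 0, here is a minimal Python sketch of a size-weighted cluster entropy: for each cluster it computes the base-2 entropy of the reference labels inside it, then averages those values weighted by cluster size. This is an assumption about the general idea, not the Entropy Scorer's actual implementation; KNIME may normalize differently or report additional measures. The function name `cluster_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(reference, clusters):
    """Size-weighted average entropy (base 2) of the reference labels within each cluster.

    Hypothetical helper for illustration; not KNIME's implementation.
    """
    if len(reference) != len(clusters):
        raise ValueError("reference and clusters must have the same length")

    # Group the reference labels by the cluster each row was assigned to.
    groups = defaultdict(list)
    for ref, clu in zip(reference, clusters):
        groups[clu].append(ref)

    n = len(reference)
    total = 0.0
    for members in groups.values():
        size = len(members)
        counts = Counter(members)
        # Entropy of the reference-label distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        # Weight each cluster's entropy by its share of all rows.
        total += (size / n) * h
    return total


reference = ["A", "A", "B", "B", "C", "C"]   # toy "true" membership
predicted = [0, 0, 1, 1, 1, 2]               # toy k-means output with one mixed cluster
relabeled = [5, 5, 7, 7, 9, 9]               # perfect split, just different cluster IDs

print(cluster_entropy(reference, reference))  # 0.0: a column compared with itself
print(cluster_entropy(reference, predicted))  # about 0.459: cluster 1 mixes B and C
print(cluster_entropy(reference, relabeled))  # 0.0: perfectly pure clusters
```

Note that the cluster IDs themselves don't matter, only how rows are grouped, which is why a perfect split with different labels still scores 0.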
Cheers,
Marten