Dear Knimers,

I am currently facing (another) issue with *k*-Means optimization with Silhouette Coefficient.

I have the following file:

Art_I_clustering.csv (462.8 KB)

I tried to use the suggested component for k-Means optimization: (Optimized K-Means (Silhouette Coefficient) – KNIME Community Hub)

Besides, I also tried (in a parallel comparison), to apply a simplified version of loop for optimizations, with and without normalization. I wished to test the accuracies of such schemes. This way, I got four (parallel) *k*-Means algorithmic clusterizations, as in the following screenshot (at the left, using the component without (above) and with (below) previous normalization; and the simplified version of loops for optimization, once again, without (above) and with (below) previous normalization:

Finally, my question is:

I got four graphs, and the recomended

*k*number is:

a) for both the non-normalized component AND the simplified loops, the

*k*was 6;

b) for both the normalized component AND the simplified loops, the

*k*was 5.

Both graphs were remarkably different one another, as if they were depicting different realities. Those graphs are in the following (*.DOCX) file:

Graphs for k-Means optimization using component or simplified loops with or without normalization.docx (201.0 KB)

I ask for someone help me to understand – and to explain such difference, because I am drafting a paper whose goal can be better achieved (accessorily) using one clusering technique.

If any other Knime Community member could help me to choose the most correct option and to present a better argument to explain the differences, I would be most grateful.

Best regards.

Rogério.