Clustering a file with COVID data

Dear Knimers,
I have another question, this time about finding the best number of clusters (k) for my data:
a) I have generated (in KNIME) the following Excel file with my preprocessed data:
333_Regs_yyyy-MM_rates_pops_forkMeans.xlsx (31.9 KB)
In this file, I grouped COVID cases by the 21 official regions of our state (RS, Brazil) and by month (I studied the first 16 months of this pandemic). I then added the total, female, and male populations per region, as well as the counts of cases, hospitalizations, and deaths per region and per month. Finally, because population varies considerably between the capital city and the remaining ones, I calculated three rates: incidence (= (number of new cases / total population) * 1000 inhabitants); hospitalization rate (= number of hospitalized patients / number of cases in the same population and in the same month * 100%); and lethality rate (= number of deceased patients / number of cases in the same population and in the same month * 100%).
b) Now I need to cluster these data and choose a loop that is good and simple for finding the best (and lowest) k that adequately represents my data as dense, well-separated clusters.
c) Finally, I need to plot these clusters in order to analyse them visually.
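For reference, the three rates defined in a) can be written as a short Python sketch. This is only an illustration of the formulas above, not the KNIME workflow that produced the file: the function name `covid_rates`, its parameter names, and the example numbers are all hypothetical, and returning `None` when a region-month has no cases is an assumption about how undefined rates should be handled.

```python
def covid_rates(new_cases, hospitalized, deaths, population):
    """Return (incidence per 1000, hospitalization %, lethality %) for one region-month."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Incidence: new cases per 1000 inhabitants.
    incidence = 1000 * new_cases / population
    if new_cases == 0:
        # Both case-based rates are undefined when there are no cases (assumption: use None).
        return incidence, None, None
    # Hospitalization and lethality: percentage of the cases in the same region and month.
    hosp_rate = 100 * hospitalized / new_cases
    lethality = 100 * deaths / new_cases
    return incidence, hosp_rate, lethality


# Made-up region-month: 200 new cases, 20 hospitalized, 4 deaths, 100,000 inhabitants.
incidence, hosp_rate, lethality = covid_rates(200, 20, 4, 100_000)
print(incidence, hosp_rate, lethality)
```

Region-months whose rates come back as `None` would need to be filtered or imputed before clustering, since k-means cannot handle missing values.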
I tried the elbow method and the silhouette coefficient for this task, but I am afraid I did not fully understand how to apply them.
Can someone help me?
Thank you all for any help.
B.R.,
Rogério.

Hi @rogerius1st,

Just a quick clarifying question: are you having trouble understanding, in theory, how the elbow method and the silhouette coefficient find the optimal number of clusters k, or are you having trouble integrating these two methods into your workflow?

For the latter, here is a link to a workflow that may be helpful: Clustering_And_Elbow_Graph – KNIME Hub
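
For the former, a rough sketch outside KNIME may help show what the two methods actually compute. The plain-Python example below is not the linked workflow. It is a minimal illustration under several assumptions: the rows are made-up (incidence, hospitalization rate, lethality rate) values, the helper names `standardize`, `kmeans`, and `silhouette` are hypothetical, and farthest-first seeding replaces the usual random initialization so that the run is deterministic. For each k it records the within-cluster sum of squared errors (SSE, the quantity plotted in an elbow graph) and the mean silhouette coefficient.

```python
import math


def standardize(rows):
    """Z-score each column so rates on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (closer to 1 is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for c, other in clusters.items() if c != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up demo rows: (incidence per 1000, hospitalization %, lethality %).
centers = [(5.0, 5.0, 1.0), (20.0, 10.0, 2.0), (40.0, 20.0, 5.0)]
offsets = [(-0.5, -0.3, -0.1), (0.5, 0.3, 0.1), (0.3, -0.4, 0.05),
           (-0.3, 0.4, -0.05), (0.0, 0.0, 0.0)]
rows = [[c + o for c, o in zip(center, off)] for center in centers for off in offsets]
points = standardize(rows)

results = {}
for k in range(2, 7):
    labels, sse = kmeans(points, k)
    results[k] = (sse, silhouette(points, labels))
    print(f"k={k}  SSE={sse:.3f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest mean silhouette:", best_k)
```

With the elbow method you plot SSE against k and look for the k after which the curve stops dropping sharply. With the silhouette coefficient you simply pick the k with the highest mean value. Standardizing first matters here because incidence (per 1000) and the two percentage rates are on different scales.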

Cheers,
Dashiell