# Clustering a file with COVID data

Dear Knimers,
I have another question, this time about finding the best number of clusters (k) for my data:
a) I have generated (in KNIME) the following Excel file with my preprocessed data:
333_Regs_yyyy-MM_rates_pops_forkMeans.xlsx (31.9 KB)
In this file, I grouped COVID cases by the 21 official regions of our state (RS, Brazil) and by month (I studied the first 16 months of the pandemic). I then listed the total, female and male populations of each region, as well as the counts of cases, hospitalizations and deaths by region and by month. Finally, because population varies a lot between the capital city and the remaining cities, I calculated three rates: incidence (= number of new cases / total population * 1,000 inhabitants); hospitalization rate (= number of hospitalized patients / number of cases in the same population and in the same month * 100%); and lethality rate (= number of deceased patients / number of cases in the same population and in the same month * 100%).
b) Now I need to cluster these data and choose a loop that is good and simple for finding the best, lowest k that adequately represents my data as dense, well-separated clusters.
c) Finally, I need to plot these clusters in order to analyse them visually.
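The three rates described in a) can be written out as a small sketch. It is in Python only to make the arithmetic explicit; the function name `covid_rates` and the example numbers are invented for illustration and are not taken from the attached file.

```python
def covid_rates(population, new_cases, hospitalized, deaths):
    """Return (incidence per 1,000 inhabitants, hospitalization %, lethality %)
    for one region-month, following the formulas given in the post."""
    incidence = new_cases * 1000 / population
    if new_cases == 0:
        # Both percentages divide by the number of cases, so they are
        # undefined for a region-month without cases (assumption: report None).
        return incidence, None, None
    hospitalization = hospitalized * 100 / new_cases
    lethality = deaths * 100 / new_cases
    return incidence, hospitalization, lethality


# Hypothetical region-month: these numbers are made up for the example.
incidence, hosp_rate, lethality = covid_rates(
    population=200_000, new_cases=1_000, hospitalized=80, deaths=20
)
print(incidence, hosp_rate, lethality)  # 5.0 8.0 2.0
```

Note that incidence is expressed per 1,000 inhabitants while the other two rates are percentages of cases, so the three columns sit on different scales before clustering.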
I tried the elbow method and the Silhouette coefficient for this task, but I am afraid I didn't understand exactly how to apply them.
Can someone help me?
Thank you all for any help.
B.R.,

Hi @rogerius1st,

Just a quick clarifying question: are you having trouble understanding, theoretically, how the elbow method and the Silhouette coefficient find the optimal number of clusters "k", or are you having trouble integrating these two methods into your workflows?

For the latter, here is a link to a workflow that may be helpful: Clustering_And_Elbow_Graph – KNIME Hub
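On the theoretical side, the two criteria are usually defined as below. These are the standard textbook definitions, not something specific to the linked workflow. The symbols are assumptions introduced here: $C_j$ is cluster $j$, $\mu_j$ its centroid, $a(i)$ the mean distance from record $i$ to the other records in its own cluster, and $b(i)$ the smallest mean distance from record $i$ to the records of any other cluster.

```latex
% Elbow method: within-cluster sum of squares for a given k
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

% Silhouette coefficient of record i
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

With the elbow method, $W(k)$ is plotted against $k$ and the chosen $k$ is where the curve bends and stops dropping sharply. With the Silhouette coefficient, $s(i)$ is averaged over all records and the $k$ with the highest average is preferred.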

Cheers,
Dashiell

Dear Dashiell,

1. I'm not quite sure whether my issues result from my limited theoretical knowledge of the Elbow Method or the Silhouette Coefficient. I've already read a few things about both and applied them in a few exercises. Nevertheless, I haven't yet managed to get a suitable workflow configuration with either of them. I've found some posts here on the KNIME Forum that use loops to find the best k values, but I couldn't integrate them into my workflows.
2. I have preprocessed my data, and stored these data in an Excel file:
333_Regs_yyyy-MM_rates_pops_for_kMeans_Diversos(2-25)k.xlsx (8.2 KB)
The original 1,330,000 cases were grouped into 333 region-months (the records of the 21 regions, each made up of several neighboring municipalities, for each of the 16 months of my research: 21 × 16 = 336, minus 3 missing values).
3. I tried two possible paths for these loops, both suggested on the KNIME Forum: a) "Table row to variable loop"; and b) "Parameter optimization loop". Here is the image of what I used:

But unfortunately I couldn't quite understand the results of either loop.
4. I downloaded your workflow example, but I could not get any further with it because it includes two Python nodes and another node that uses entropy for scoring (which I haven't studied yet). I currently have no Python skills; indeed, I have no previous training in any formal programming language. Therefore, I am asking for different options (in KNIME, of course) that use only its "no-code/low-code" nodes (i.e., nodes that require no or few written lines of code).

Hi @rogerius1st,

Was the Optimized K-Means (Silhouette Coefficient) component close to what you're looking for? It comes in handy for most cases when I'm trying to optimize K-Means. And if you click into the component, it serves as a nice example of how to use the parameter optimization loop nodes with K-Means without Python code.
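For readers who want to see the logic that such a loop carries out, here is a minimal sketch in plain Python with no extra libraries. It is not the component's implementation: the helper names, the deterministic farthest-first initialization and the toy data are all assumptions made for illustration. The idea is simply to run k-means for each candidate k, score the result with the mean Silhouette coefficient, and keep the k with the highest score.

```python
import math


def farthest_first_centers(points, k):
    # Deterministic start (an assumption of this sketch): begin with the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def k_means(points, k, max_iter=100):
    """Plain k-means; returns one cluster label (0..k-1) per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a single-member cluster contributes s = 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (invented): three tight groups of five points each.
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
group_centers = [(0, 0), (10, 0), (5, 9)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# The "loop": one k-means run and one silhouette score per candidate k.
scores = {k: mean_silhouette(points, k_means(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

On real data the columns would normally be put on a comparable scale first, since distance-based methods are sensitive to the units of each column.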

Also, if you haven't already, I'd recommend checking out KNIME's [L4-ML] Introduction to Machine Learning Algorithms self-paced course. There's a Clustering module in the course that was helpful for me when I was getting stuck with clustering workflows.

Cheers,
Dashiell
