K-Means clustering - segmentation by geographic areas


We have run a K-means clustering on concert ticket data for a specific Danish music artist. We want to segment the customers who have bought tickets to his concerts. We based the K-means clustering on three continuous variables: mean number of tickets per concert, mean amount spent per concert, and total number of concerts. This gave us three clusters.

We now want to analyze the clusters based on some other categorical variables from the same dataset. We are particularly interested in how these clusters are spread geographically across different cities in Denmark. The problem is that no matter how we look at the distributions, they are exactly the same for all three clusters. For example, 21% of the customers in each cluster are from Copenhagen, and so on. The distribution is also the same as for the entire dataset before clustering. We find this very strange and wonder if there is something we have overlooked. We are aware that the distributions might be similar, but we find it hard to believe that they are exactly the same in all three clusters for all cities by coincidence.
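
For reference, the per-cluster comparison described above can be sketched in plain Python as follows. This is only an illustration and not the actual KNIME workflow: the function name `city_shares_by_cluster` and the toy rows are made up, and the sketch assumes each customer row carries a cluster label and a city.

```python
from collections import Counter, defaultdict


def city_shares_by_cluster(rows):
    """Return {cluster: {city: share}} from (cluster, city) pairs.

    Shares are computed within each cluster, so they sum to 1 per cluster.
    """
    counts = defaultdict(Counter)
    for cluster, city in rows:
        counts[cluster][city] += 1

    shares = {}
    for cluster, city_counts in counts.items():
        total = sum(city_counts.values())
        shares[cluster] = {city: n / total for city, n in city_counts.items()}
    return shares


# Hypothetical toy data: one (cluster label, city) pair per customer.
toy_rows = [
    (0, "Copenhagen"), (0, "Aarhus"), (0, "Aarhus"), (0, "Odense"),
    (1, "Copenhagen"), (1, "Copenhagen"), (1, "Copenhagen"), (1, "Aarhus"),
    (2, "Odense"), (2, "Odense"), (2, "Copenhagen"), (2, "Aarhus"),
]

shares = city_shares_by_cluster(toy_rows)
for cluster in sorted(shares):
    rounded = {city: round(s, 2) for city, s in sorted(shares[cluster].items())}
    print(cluster, rounded)
```

If the shares come out identical to the overall distribution in every cluster, one thing that may be worth checking is whether the percentages were really computed within each cluster, and whether the cluster labels are still attached to the correct customer rows.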

Can anyone provide us with an explanation? Or any tips, please?

Hello @Ibenmn and a very warm welcome to the KNIME community!

Thank you for raising your question in the forum. If I may add my 50 cents to the topic, I fear that it is rather difficult to remotely diagnose a workflow and find out what went wrong. Would you mind uploading a workflow with some toy data so that the community can help diagnose the issue?

Best regards,

