Hi,
We have run a K-means clustering on concert ticket data for a specific Danish music artist. We want to segment the customers who have bought tickets to his concerts. We based the clustering on three continuous variables: mean number of tickets per concert, mean amount spent per concert, and total number of concerts. This gave us three clusters.
We now want to analyze the clusters using some other categorical variables from the same dataset. We are particularly interested in how the clusters are spread geographically across different cities in Denmark. The problem is that no matter how we look at the distributions, they are exactly the same for all three clusters. For example, 21% of the customers in each cluster are from Copenhagen, and so on. The distribution is also the same as for the entire dataset before clustering. We find this very strange and wonder whether we have overlooked something. We understand that the distributions might be similar, but we find it hard to believe that they are exactly the same in all three clusters for all cities by coincidence.
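To make the comparison concrete, here is a minimal sketch of the per-cluster city shares described above. It is only an illustration: the toy data, the function name `city_shares_by_cluster`, and the (cluster, city) row layout are all assumptions, not the actual dataset or code.

```python
from collections import Counter, defaultdict


def city_shares_by_cluster(rows):
    """Return {cluster: {city: share}}; shares within each cluster sum to 1.

    `rows` is an iterable of (cluster_label, city) pairs, one per customer.
    """
    counts = defaultdict(Counter)
    for cluster, city in rows:
        counts[cluster][city] += 1

    shares = {}
    for cluster, city_counts in counts.items():
        total = sum(city_counts.values())  # customers in this cluster only
        shares[cluster] = {city: n / total for city, n in city_counts.items()}
    return shares


# Hypothetical toy data: one (cluster label, city) pair per customer.
rows = [
    (0, "Copenhagen"), (0, "Aarhus"), (0, "Aarhus"), (0, "Odense"),
    (1, "Copenhagen"), (1, "Copenhagen"), (1, "Aarhus"), (1, "Odense"),
    (2, "Odense"), (2, "Odense"),
]

shares = city_shares_by_cluster(rows)
for cluster in sorted(shares):
    print(cluster, shares[cluster])
```

Each cluster's shares are divided by that cluster's own customer count, so the percentages compared across clusters are computed from the customers actually assigned to each one.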
Can anyone provide an explanation, or any tips, please?