How to use the K-means learner?

Hey,

I hope you guys are having a wonderful weekend.

My question is about the K-means learner, which I used with a big dataset of 78 attributes. The dataset is not labelled, since this is an unsupervised learner. The data has two possible outputs, either blue or red. So how many clusters should I set in the K-means learner? I am still confused about how this works. If I set 2 clusters, it returns an error. Any suggestions?

Thanks

Hi,

Actually, I cannot quite understand you here.
You have a dataset of 78 attributes (columns) and the instances are not labeled. Now what are “blue” and “red”? Are they the output you seek? Do you want to label your data with either blue or red?
The best way to get help is to provide a sample input dataset and to define the desired output.

Best,
Armin

Hey, thanks for replying…

Yeah, it's 75 attributes (columns), and yes, I am trying to find an output that I already know, because the data was labelled and I removed the label column, which contained either blue or red for each row. So I want to use unsupervised learning to test whether the learner can find the output.

Thanks.

So the data was labeled before and you want to reproduce the same labels.
First of all, would you please let me know what kind of error you get?

The possibility of doing that depends on the method by which the data was first labeled.
E.g., if the labels were produced by a clustering method, then you can use the same method (which is rather obvious…).

If the labeling was not based on some method or pattern in the data, the chance of reproducing the exact same labels by clustering would be low.
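
To check how well a clustering reproduces known labels, one option is to count, for each cluster, how many rows carry that cluster's most common label. The following is only a minimal Python sketch of that idea, not something taken from the K-means learner itself; the function name `cluster_label_agreement` and the tiny example data are made up for illustration.

```python
from collections import Counter


def cluster_label_agreement(clusters, labels):
    """Fraction of rows whose label equals the majority label of their cluster.

    Cluster names are arbitrary, so each cluster is matched to whichever
    original label occurs most often inside it.
    """
    counts_by_cluster = {}
    for cluster, label in zip(clusters, labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    matched = sum(c.most_common(1)[0][1] for c in counts_by_cluster.values())
    return matched / len(labels)


# Hypothetical example: cluster assignments next to the labels that were held back.
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1", "cluster_1", "cluster_0"]
labels = ["blue", "blue", "red", "red", "blue", "blue"]

score = cluster_label_agreement(clusters, labels)
print(score)
```

A value close to 1.0 means the clusters line up with the original labels; a value near the share of the largest label means the clustering carries little information about them.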

Here I have a suggestion for you:
If you have not removed the labels entirely and have just excluded them to see if you can reproduce them, it would be a much better idea to use classification here. You can train a model on the labeled data and then use that model to label new entries.
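
As a rough illustration of the train-then-label idea, here is a small Python sketch using a nearest-centroid classifier. This particular classifier is just an assumption chosen for brevity (any classification learner would follow the same two steps), and the names `train_centroids` and `predict` as well as the data are hypothetical.

```python
def train_centroids(rows, labels):
    """Training step: compute one mean vector (centroid) per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, row):
    """Prediction step: give a new row the label of the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


# Hypothetical labeled training data with two attributes per row.
train_rows = [[1.0, 1.2], [0.8, 1.0], [5.0, 5.1], [5.2, 4.9]]
train_labels = ["blue", "blue", "red", "red"]

model = train_centroids(train_rows, train_labels)
new_label = predict(model, [4.8, 5.0])
print(new_label)
```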

I hope I have not missed anything.

Best,
Armin

Yes, the data was labeled before, and I removed the label column before feeding the data to the learner. I want to use an unsupervised algorithm to compare it with a supervised learner, which I have already run on the data with the label column.

So as I understand it, the unsupervised learner takes all the data and assigns a cluster to each row. So my question was: how many clusters do I have to set in the k-means learner settings?

I have included all the columns in the k-means, which is 78. So does the number of clusters have to be 78?

So how does the unsupervised learner learn if I include all 78 columns and it assigns a different cluster to each of them? And if I set only 2 clusters and include all the columns, it returns “execute failed: no winner found -1”.

Sorry to confuse you, as I am confused as well. Maybe the algorithm I selected is not suitable for my problem?

Thanks
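
On the number of clusters: in k-means, k is the number of groups the rows are split into, and it does not depend on how many columns are included. With two expected outputs (blue and red), k = 2 is the natural setting even with 78 columns. The Python sketch below illustrates this under some assumptions: it is a plain textbook k-means, not the K-means learner node, the data is randomly generated, and the farthest-first choice of starting centers is only there to keep the example deterministic. It says nothing about the cause of the error message above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    """Column-wise mean of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def k_means(rows, k, iterations=20):
    """Assign each row to one of k clusters (plain Lloyd iterations)."""
    # Starting centers: the first row, then repeatedly the row farthest
    # from all centers chosen so far (an assumption made for determinism).
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    assignment = []
    for _ in range(iterations):
        # Step 1: each row goes to its nearest center.
        assignment = [min(range(k), key=lambda j: sq_dist(row, centers[j])) for row in rows]
        # Step 2: each center moves to the mean of its rows.
        for j in range(k):
            members = [row for row, a in zip(rows, assignment) if a == j]
            if members:
                centers[j] = mean(members)
    return assignment


# Hypothetical data: 6 rows with 78 columns each, forming two groups.
rng = random.Random(0)
n_columns = 78
group_a = [[rng.gauss(0.0, 1.0) for _ in range(n_columns)] for _ in range(3)]
group_b = [[rng.gauss(10.0, 1.0) for _ in range(n_columns)] for _ in range(3)]
rows = group_a + group_b

assignment = k_means(rows, k=2)
print(assignment)  # one cluster index per row, not per column
```

The result has one entry per row (6 here), while the 78 columns only determine how the distance between rows is measured.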

Have you tried some dimensionality reduction?
PCA, for example.
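
To give an idea of what PCA does before clustering, here is a minimal Python sketch that finds only the first principal component using power iteration and projects each row onto it. It is an illustration under assumptions, not the PCA implementation of any particular tool: the function name `first_principal_component`, the starting vector, and the data are made up, and a real workflow would normally keep several components.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center every column on its mean.
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # normalize. The starting vector is an arbitrary choice and must not
    # be orthogonal to the direction being searched for.
    vec = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Project each centered row onto the component: one number per row.
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores


# Hypothetical data: the first two columns vary together, the third is constant.
rows = [[1.0, 1.0, 0.0], [2.0, 2.1, 0.0], [3.0, 2.9, 0.0], [4.0, 4.0, 0.0]]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

The reduced columns (here a single score per row) can then be passed to k-means in place of the original attributes.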
