Clustering

Hello,

I'm trying to use the Node "Cluster Assigner".

But in my case, all of the new data is assigned to a single cluster.

Thank you for your help!

Daniel

How many clusters did you choose in your k-means configuration?

Hey Tangerooo,

thank you for your attention.

I created a dataset in R with three very clear clusters over two variables, "income" and "education".
I normalized them so that I could use the linkage types for the distances.
Then I tried "k-Means" with 3 clusters.

Now I have my first problem: the "k-Means" node does not give me the centers of the clusters it found.
Perhaps this is the reason for the wrong assignment by the connected "Cluster Assigner" node.

The other node, "Hierarchical Cluster Assigner", doesn't work either. I configured it with three clusters for the same dataset, and here too the new dataset is assigned to only one cluster, although it is just a shortened version of the original dataset.

I would be glad to hear any ideas.

Thanks!

DaLack, sorry, this is outside my expertise. Hopefully someone else on the forum will have some input.

The hierarchical cluster assigner can only be used on the same data table that was used for clustering. It is not usable for classifying new data.
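To illustrate why, here is a minimal Python sketch of agglomerative clustering. It is not KNIME's implementation: the helper name `hierarchical_labels`, the choice of single linkage, and the sample data are all assumptions made for the example. The point is that the result is just one label per input row; there are no centers or other model that could be applied to rows the tree has never seen.

```python
import math


def hierarchical_labels(points, k):
    """Single-linkage agglomerative clustering, cut when k clusters remain.

    Returns one label per input row, in input order.
    (Hypothetical helper for illustration, not a KNIME API.)
    """
    # Start with every row in its own cluster (clusters hold row indices).
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the two closest clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Turn the remaining clusters into one label per original row.
    labels = [None] * len(points)
    for label, members in enumerate(clusters):
        for row in members:
            labels[row] = label
    return labels


# Made-up, already normalized (income, education) rows in three obvious groups.
rows = [
    (0.10, 0.10), (0.12, 0.08), (0.08, 0.12),
    (0.50, 0.50), (0.52, 0.48), (0.48, 0.52),
    (0.90, 0.90), (0.92, 0.88), (0.88, 0.92),
]
print(hierarchical_labels(rows, k=3))
```

The labels are tied to the row positions of the table that was clustered, which matches the restriction described above.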

The k-means node should output the cluster centers. You can double-check by feeding the assigner the same data table as the learner. This should give you the same assignments as the first output of the k-means node.
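The check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not what the KNIME nodes do internally: `k_means` and `nearest_center` are hypothetical helpers, the data is made up, and the initial centers are simply taken from evenly spaced rows to keep the example deterministic (real implementations usually initialize randomly).

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def k_means(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    # Simplification: evenly spaced rows as initial centers (deterministic).
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest_center(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New center is the per-column mean of the cluster members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the old center if a cluster is empty
        if new_centers == centers:
            break  # converged
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Made-up, already normalized (income, education) rows in three obvious groups.
training = [
    (0.10, 0.10), (0.12, 0.08), (0.08, 0.12),
    (0.50, 0.50), (0.52, 0.48), (0.48, 0.52),
    (0.90, 0.90), (0.92, 0.88), (0.88, 0.92),
]
centers, labels = k_means(training, k=3)

# Sanity check: assigning the training table again must reproduce the labels.
assert [nearest_center(p, centers) for p in training] == labels

# New rows are assigned by the same nearest-center rule.
new_rows = [(0.11, 0.09), (0.49, 0.51), (0.91, 0.89)]
print([nearest_center(p, centers) for p in new_rows])
```

If the training table reproduces its own labels but new rows all land in one cluster, the centers themselves are fine and the difference lies in the new table, for example in how it was prepared.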

Hey thor,

thank you for your answer!

I solved both problems with your ideas.

So I must use "Hierarchical Cluster Assigner" with the identical data table that was used in "Hierarchical Clustering (DistMatrix)".
In other words, it does the same job as right-clicking the regular "Hierarchical Clustering" node and looking at the "Clustered data".

I appreciate your help!

Daniel

Please, I want to cluster a data set of type "string".

How can I do that?

 

There is an algorithm called k-modes (yes, modes, not means) that handles categorical data, but it is available in R.
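For anyone who wants to see the idea, here is a minimal Python sketch of k-modes. It is a simplified illustration, not the R package's implementation: `k_modes` and `hamming` are hypothetical helpers, the sample records are made up, and the initial modes are taken from evenly spaced rows to keep it deterministic. Compared with k-means, the distance becomes the number of mismatching attributes and the "center" becomes the most frequent value in each column.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, iterations=100):
    """Simplified k-modes; returns (modes, labels)."""
    # Simplification: evenly spaced rows as initial modes (deterministic).
    modes = [records[i * len(records) // k] for i in range(k)]
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: hamming(r, modes[i])) for r in records]
        new_modes = []
        for i in range(k):
            members = [r for r, lab in zip(records, labels) if lab == i]
            if members:
                # New mode: the most frequent value in each column.
                new_modes.append(
                    tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
                )
            else:
                new_modes.append(modes[i])  # keep the old mode if a cluster is empty
        if new_modes == modes:
            break  # converged
        modes = new_modes
    labels = [min(range(k), key=lambda i: hamming(r, modes[i])) for r in records]
    return modes, labels


# Made-up categorical (string) records.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "medium", "square"),
]
modes, labels = k_modes(records, k=2)
print(modes)
print(labels)
```

This only makes sense for strings that are categories (a limited set of repeated values), not for free text.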

Hope that helps

Hi DaLack, I think I had the same issue. See this post: https://tech.knime.org/node/56082/view

I finally solved this by using the Weka nodes instead of the KNIME ones.

Cheers

 

Hi, I am fairly new to KNIME, so pardon me if this question has been asked already.

Basically, I am working on a deduplication exercise in which a similarity score is calculated between every pair of address records in the table (I am not setting the number of nearest neighbours to 1, because I want the score for every possible pair). But I have close to 8 lakh (800,000) records in my dataset, and KNIME is taking far too long for this similarity search.

As an alternative, I thought of first clustering the records by address and then running the similarity search only within each cluster. When I tried running k-Means, it didn't pick up the address column but some other numeric field.

Could anyone please suggest how I can cluster the addresses in KNIME?

Thanks in advance

Vipul
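
The "group first, compare within groups" idea from the post above is often called blocking. With 800,000 records, comparing every pair means roughly 3.2 × 10^11 comparisons, so restricting comparisons to records that share a cheap key can help a lot. k-means works on numeric columns, which is probably why it ignored the address string. Below is a minimal Python sketch of blocking, not a KNIME workflow. Everything in it is an assumption for illustration: the helper names, the sample addresses, the 0.8 threshold, and above all the blocking key (here the last token of the address, assumed to be a postal code).

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(address):
    """Hypothetical blocking key: the last token, assumed to be a postal code."""
    tokens = address.lower().split()
    return tokens[-1] if tokens else ""


def similar_pairs(addresses, threshold=0.8):
    """Return (i, j, score) for pairs in the same block scoring >= threshold."""
    # Group row indices by blocking key.
    blocks = defaultdict(list)
    for idx, addr in enumerate(addresses):
        blocks[blocking_key(addr)].append(idx)

    pairs = []
    for members in blocks.values():
        # Only records inside the same block are compared with each other.
        for i, j in combinations(members, 2):
            score = SequenceMatcher(
                None, addresses[i].lower(), addresses[j].lower()
            ).ratio()
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Made-up sample addresses.
addresses = [
    "12 MG Road Pune 411001",
    "12 M G Road Pune 411001",
    "45 Park Street Kolkata 700016",
    "45 Park St Kolkata 700016",
    "7 Lake View Pune 411001",
]
for i, j, score in similar_pairs(addresses):
    print(i, j, round(score, 3))
```

The trade-off is that duplicates which end up with different keys (for example a mistyped postal code) are never compared, so the key should be chosen to be as robust as the data allows.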