Cluster prediction of 'new' data based on training data

Celestial · September 7, 2020, 3:14pm

Hi!
I am looking for a way to assign ‘new’ datapoints to a set of previously determined clusters. I mean something like this:

Use dataset 1 to identify clusters in dataset 1.
For data from dataset 2, predict which of the dataset 1 clusters each datapoint would belong to.

Can I use the Weka EM node for this? Can I use this node to get the probability of a datapoint from dataset 2 lying in each of the dataset 1 clusters? Or can this node only be used for probabilities of datapoints that have been used in the clustering (i.e. in this case giving the probabilities of datapoints from dataset 1 for the clusters of dataset 1)?

Not sure whether this makes sense - my machine learning knowledge is relatively basic. Anyway, thanks for your help!

iperez · September 7, 2020, 3:42pm

Hi!

It depends on the method you are using. If, for instance, you are doing K-means using KNIME’s native nodes, you can use the Cluster Assigner node. In Weka, the corresponding node is Weka Cluster Assigner.
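
To illustrate the idea outside of KNIME, here is a minimal Python sketch of "fit on dataset 1, assign dataset 2". It assumes plain k-means with nearest-centroid assignment (Euclidean distance). The data and the helper names `nearest` and `fit_kmeans` are made up for illustration, and this is not the implementation behind either node.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def fit_kmeans(points, k, iterations=20):
    """Very small k-means; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Dataset 1 (made-up training data) defines the clusters.
dataset_1 = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids = fit_kmeans(dataset_1, k=2)

# Dataset 2 (made-up new data) is only assigned; the centroids are not refitted.
dataset_2 = [(0.5, 1.5), (10.0, 8.0), (2.0, 2.0)]
assignments = [nearest(p, centroids) for p in dataset_2]
print(assignments)
```

The key point is that dataset 2 never influences the centroids. For a soft clustering such as EM, the analogous step would evaluate each fitted component at the new point to get per-cluster probabilities instead of a single label.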

Celestial · September 7, 2020, 5:10pm

Perfect, that is indeed what I was looking for. Thank you!

ipazin · September 8, 2020, 10:08am

Hello @Celestial,

here is a workflow example featuring k-Means clustering:

Welcome to KNIME Community!

Br,
Ivan

system · September 15, 2020, 10:08am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.