Cluster prediction of 'new' data based on training data

I am looking for a way to assign ‘new’ datapoints to a set of previously determined clusters. I mean something like this:

  1. Use dataset 1 to identify clusters in dataset 1.
  2. For data from dataset 2, predict which of the dataset 1 clusters each datapoint would belong to.

Can I use the Weka EM node for this? Can it give me the probability of a datapoint from dataset 2 lying in each of the dataset 1 clusters? Or can it only produce probabilities for datapoints that were used in the clustering (i.e. in this case, the probabilities of datapoints from dataset 1 for the clusters of dataset 1)?

Not sure whether this makes sense - my machine learning knowledge is relatively basic. Anyway, thanks for your help!


It depends on the method you are using. If, for instance, you are doing k-means using KNIME’s native nodes, you can use the Cluster Assigner node. For Weka, the corresponding node is Weka Cluster Assigner.
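
To illustrate the general idea outside of KNIME, here is a minimal Python sketch of the two steps from the question: fit clusters on dataset 1, then assign points from dataset 2 to those clusters. It is not the implementation of any KNIME or Weka node. It assumes plain k-means with Euclidean distance and nearest-centroid assignment, and the function names (`fit_kmeans`, `nearest`, `assign`) and the toy data are made up for the example.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def fit_kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd-style k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct training points
    for _ in range(iterations):
        # Group every training point with its nearest current centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new_centroids = []
        for group, old in zip(groups, centroids):
            if group:
                new_centroids.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # nothing moved, so we are done
            break
        centroids = new_centroids
    return centroids


def assign(points, centroids):
    """Assign each (possibly unseen) point to its nearest existing centroid."""
    return [nearest(p, centroids) for p in points]


# Step 1: identify clusters using dataset 1 only.
dataset1 = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids = fit_kmeans(dataset1, k=2)

# Step 2: predict cluster membership for dataset 2 without refitting.
dataset2 = [(0.3, 0.2), (4.9, 5.3)]
labels = assign(dataset2, centroids)
print(labels)
```

The key point is that the centroids are computed from dataset 1 only and are then reused unchanged for dataset 2. This sketch gives hard assignments; it does not compute the per-cluster probabilities that an EM-based model would provide.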


Perfect, that is indeed what I was looking for. Thank you!


Hello @Celestial,

here is a workflow example featuring k-Means clustering:

Welcome to KNIME Community!


