Hi there, I am trying to cluster a set of data using the hierarchical clustering node. I tried two options (as you will see in the attached workflow). The problem arises when I try to assign the clusters using either of the two options.
Please let me know if I am doing something wrong.
You cannot assign new data to clusters. The Assigner takes the cluster model and applies it to the same data that has been used to generate the clusters.
Hi Thor, thank you for your response. I want to clarify the term "new data" first. Let's suppose that I generate a cluster model using 12 months of information (January to December of Year 01) by averaging each individual's information over the months. So each individual belongs to a certain cluster.
Then in January of Year 02, I have the information on my individuals for that month, and I use the Model Reader node and the Hierarchical Cluster Assigner to see which cluster each individual is assigned to.
I want to detect cases like this: Person A was assigned to cluster 1 using the 12-month averaged information, but using the information from January of Year 02, Person A is assigned to cluster 3. Then I want to check whether Person A is an outlier.
But either I cannot assign clusters to new data, or I am using the wrong approach to my problem.
As I said, it's not possible to assign new data. The hierarchical cluster model stores the row IDs, and the assigner uses them to assign rows to clusters. It does not take the attributes into account.
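To illustrate why this fails on new rows, here is a toy Python sketch of a lookup keyed only on row IDs. It is not KNIME's actual implementation; the `model` dictionary, the row IDs, and the `assign_by_row_id` helper are all made up for illustration.

```python
# Toy stand-in for a row-ID-based cluster model (hypothetical, not KNIME code).
model = {"Row0": "cluster_1", "Row1": "cluster_1", "Row2": "cluster_2"}


def assign_by_row_id(row_ids, model):
    """Look up each row ID in the model; attribute values are never consulted."""
    return {rid: model.get(rid) for rid in row_ids}


# Rows that were part of the original clustering get their cluster back.
same = assign_by_row_id(["Row0", "Row2"], model)

# Rows with unseen IDs cannot be assigned, whatever their attribute values are.
new = assign_by_row_id(["Row7", "Row8"], model)

print(same)
print(new)
```

Since the lookup never sees the attribute values, a row with an unknown ID gets no cluster at all.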
What you want to do is called predictive clustering, for which there is no node in KNIME (yet).
When assigning clusters learned on "old" data to new data, you are not performing a clustering anymore. Clustering is an unsupervised learning task, in which one of the objectives may be to assign labels to data which are initially unlabelled. Once you are satisfied with the clusters, you can treat them as labels.
Now if you want to "apply" such labels to new data, you'll have to adopt a supervised learning approach.
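Outside KNIME, the idea could be sketched in Python with scikit-learn roughly as follows. This is only a sketch under assumptions: the data is synthetic, the k-nearest-neighbours classifier is just one possible choice of supervised learner, and all variable names are made up for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic "Year 01" data: one row per individual (e.g. monthly averages).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_old = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Step 1 (unsupervised): hierarchical clustering on the old data.
labels_old = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_old)

# Step 2 (supervised): treat the cluster ids as class labels and train a classifier.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_old, labels_old)

# Step 3: predict labels for new data ("January of Year 02").
# Assumption: same individuals in the same row order; individual 0 has moved.
X_new = X_old.copy()
X_new[0] = centers[1]
labels_new = clf.predict(X_new)

# Individuals whose predicted label differs from their original cluster.
changed = np.flatnonzero(labels_new != labels_old)
print(changed)
```

Individuals listed in `changed` are candidates for a closer look. Whether they are really outliers depends on the data and on how reliable the classifier is.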