Rules in K-means


I am using the k-means node to "discretize" a continuous variable. My problem is that I need to know the rules created by k-means, because I have to apply the same rules to another dataset with the same variable.

For example, if I am using k-means (fold 2) with the attribute "age", I'd like to get rules such as:
cluster 1: age < 30
cluster 2: age >= 30

But the outport contains something like:
prototype 0: age -> 37.6875
prototype 1: age -> 49.5777

Is it possible to get what I want?


Hello again,

The k-means node has a model outport. If I could feed this model into the model input of another node, together with a dataset, it would work as I want. Does such a node exist?


Hello Snyder,

Cluster nodes do not produce a rule model, but a model of prototypes.
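
For a single attribute such as "age", interval rules can still be read off the prototypes by hand. The following is only a rough sketch, assuming each value is assigned to its nearest prototype by plain Euclidean distance and that the data was not normalized before clustering. Under those assumptions the boundary between two neighbouring prototypes is their midpoint, roughly 43.63 for the two prototypes quoted above. The function names are made up for illustration and are not part of KNIME.

```python
def thresholds_from_prototypes(prototypes):
    """Return the cut points between adjacent 1-D prototypes, in ascending order.

    Assumption: nearest-prototype assignment with Euclidean distance, so each
    cut point is the midpoint between two neighbouring prototypes.
    """
    ordered = sorted(prototypes)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def describe_rules(prototypes, name="age"):
    """Turn 1-D prototypes into readable interval rules.

    Clusters are numbered by ascending prototype value, which may differ
    from the prototype numbering shown by the k-means node.
    """
    cuts = thresholds_from_prototypes(prototypes)
    lower = [None] + cuts
    upper = cuts + [None]
    lines = []
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        parts = []
        if lo is not None:
            # A value exactly on a cut point is equally close to both
            # prototypes; putting it in the upper interval is an arbitrary choice.
            parts.append(f"{name} >= {lo:.4f}")
        if hi is not None:
            parts.append(f"{name} < {hi:.4f}")
        lines.append(f"cluster {i}: " + (" and ".join(parts) or "any value"))
    return lines


# Prototype values quoted earlier in this thread.
prototypes = [37.6875, 49.5777]
cuts = thresholds_from_prototypes(prototypes)
for line in describe_rules(prototypes):
    print(line)
```

This only works this simply for one attribute. With several attributes the cluster regions are no longer plain intervals, so the prototypes themselves have to be used.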

If you want to discretize your data, I would suggest using the
- NumericBinner Node (if you already know the intervals) or the
- CAIM Binner Node (to determine the boundaries of the intervals automatically).
The CAIM Binner also produces a binning model, which can be applied to future unseen data.

If you are interested in learning rules, you can use the Weka rule nodes (JRip or PART, for example). You can see the rules in the NodeView of these nodes.

Hope this helps.

- Nicolas

Hello Nicolas,

Thanks for your reply,

I use k-means to group the data into clusters. I called it "discretize", but that isn't really the right concept, so I do need to use k-means.

I also need to apply the model of prototypes to another dataset for external validation. That is, I'd need two k-means nodes (a learner and a predictor), like the NaiveBayes or SOTA nodes... but that doesn't exist, does it?

Would it be very difficult to implement a k-means node that has a model input (the outport of the existing k-means node) and a data input (the new dataset), and that applies k-means to the new data based on the previously created model?

Thank you!

Hello Snyder,

OK, I see what you mean. No, such a node is not yet part of the current KNIME release. It will appear in the next release of KNIME; then you will be able to assign new data to a set of existing cluster prototypes.
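
Until then, the assignment itself is simple enough to do outside the workflow. Below is a minimal sketch of the idea, not the implementation of any KNIME node: each new row gets the index of the closest prototype. It assumes Euclidean distance and that the new data is on the same scale as the data used for clustering (if the training data was normalized, the same normalization has to be applied first). The function names and the sample ages are invented for illustration; the prototypes are the ones from the first post.

```python
import math


def nearest_prototype(point, prototypes):
    """Return the index of the prototype closest to `point`.

    Assumption: Euclidean distance. On a tie, the lowest index wins.
    """
    return min(range(len(prototypes)),
               key=lambda i: math.dist(point, prototypes[i]))


def assign_clusters(points, prototypes):
    """Assign every point to its nearest prototype without re-training."""
    return [nearest_prototype(p, prototypes) for p in points]


# Prototypes from the first post, written as 1-D points.
prototypes = [(37.6875,), (49.5777,)]

# Hypothetical rows from the second dataset (invented values).
new_ages = [(25.0,), (41.0,), (45.0,), (60.0,)]

labels = assign_clusters(new_ages, prototypes)
print(labels)
```

Because points and prototypes are plain tuples, the same code also works for more than one attribute, as long as every tuple has the same length.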

- Nicolas