I am using the k-means node to "discretize" a continuous variable. My problem is that I need to know the rules created by k-means, because I have to apply the same rules to another dataset with the same variable.
For example, if I am using k-means (2 clusters) with the attribute "age", I'd like to get rules such as:
cluster 1: age < 30
cluster 2 : age >= 30
But the outport gives something like:
prototype 0: age -> 37.6875
prototype 1: age -> 49.5777
The k-means node has a model outport. If I could feed this model into the model input of another node (together with a dataset), it would work as I want. Does such a node exist?
Cluster nodes do not produce a rule model, but a model of prototypes.
If you want to discretize your data, I would suggest using the
- NumericBinner Node (if you already know the intervals) or the
- CAIM Binner Node to determine the boundaries of the intervals automatically.
The CAIM Binner also produces a binning model which can be applied to future unseen data.
If you are interested in learning rules, you can use the Weka rule nodes (JRip or PART, for example). You can see the rules in the NodeView of these nodes.
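For a single numeric attribute such as "age", a prototype model still implies interval rules, provided each value is assigned to its nearest prototype (the usual k-means assignment, assumed here). The boundary between two neighbouring clusters is then the midpoint between their prototypes. The sketch below is plain Python, not a KNIME node; the function name `thresholds_from_prototypes` is made up for illustration, and the prototype values are the ones quoted in the question.

```python
def thresholds_from_prototypes(prototypes):
    """Return the cut points between adjacent 1-D prototypes.

    Assumes nearest-prototype assignment on a single numeric attribute,
    so each boundary is the midpoint of two neighbouring prototypes.
    """
    ordered = sorted(prototypes)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Prototype values taken from the k-means output quoted in this thread.
prototypes = [37.6875, 49.5777]
cuts = thresholds_from_prototypes(prototypes)

# Print the implied rules, one interval per cluster (in sorted prototype order).
bounds = [float("-inf")] + cuts + [float("inf")]
for i, (low, high) in enumerate(zip(bounds, bounds[1:])):
    print(f"cluster {i}: {low} <= age < {high}")
```

With these two prototypes the single boundary falls at about 43.63, so the rule is roughly "age < 43.63" versus "age >= 43.63". This only holds for one attribute; with several attributes the cluster regions are no longer simple intervals.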
I use k-means to group data into clusters. I called it "discretizing", but that is not really the right concept, so I do need to use k-means.
And I need to apply the model of prototypes to another dataset for external validation. That is, I'd need two k-means nodes (a learner and a predictor), like the NaiveBayes or SOTA nodes... but that doesn't exist, does it?
Would it be very difficult to implement a k-means node that has a model input (the outport of the existing k-means node) and a data input (the new dataset), and that assigns the new data to clusters based on the previously created model?
OK, I see what you mean. No, such a node is not yet part of the current KNIME release. It will appear in the next release of KNIME; then you will be able to assign new data to a set of existing cluster prototypes.
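Assigning new data to existing cluster prototypes comes down to a nearest-prototype lookup. The following is a minimal sketch in plain Python, assuming Euclidean distance; it is not the implementation of any KNIME node, and the name `assign_to_prototypes` and the sample rows are invented for illustration. The prototype values are the ones quoted earlier in the thread.

```python
import math


def assign_to_prototypes(rows, prototypes):
    """Return, for each row, the index of the closest prototype.

    Assumes Euclidean distance and that rows and prototypes have the
    same number of numeric attributes.
    """
    return [
        min(range(len(prototypes)), key=lambda i: math.dist(row, prototypes[i]))
        for row in rows
    ]


# Prototypes from the learned model (one attribute: age).
prototypes = [[37.6875], [49.5777]]

# Hypothetical rows from the second (validation) dataset.
new_rows = [[25.0], [43.0], [44.0], [60.0]]

labels = assign_to_prototypes(new_rows, prototypes)
print(labels)
```

If the original clustering was run on normalized values, the new dataset would need the same normalization applied before the lookup, otherwise the distances are not comparable.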