Extract data from a PMML object (as produced by k-means clustering)

I am playing around with a k-means clustering node. The clustering works nicely, and a new column containing the name of the cluster for each row is appended to the input table.

What I need is not the name of the cluster (cluster_1, cluster_2, ...) but its actual mean. To get it out of the k-means node, I connected a PMML-to-cell node to the PMML output port. This produces proper XML, which I can write to a file. Unfortunately, the KNIME XML nodes do not seem to work on it: I put an XPath node after the PMML-to-cell node to read out the cluster means, but the XPath query does not match anything, even though the same approach works on other XML files I tried.

So how can I get the cluster means from a k-means clustering?

Cheers

Martin

Hi,

this seems like a bug.

However, you can get the means by recalculating them with the GroupBy node: use the cluster column as the grouping column and calculate the mean of all your other dimensions.
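The GroupBy recalculation can be sketched in plain Python as follows. This is a minimal illustration, not KNIME code: it assumes each row is a dict of numeric columns plus one label column, and the label column name "Cluster" is a hypothetical choice. Once k-means has converged, these per-cluster averages coincide with the cluster centers, because each center is the mean of the rows assigned to it; if the algorithm stopped at its iteration limit, they can differ slightly.

```python
from collections import defaultdict


def cluster_means(rows, cluster_key="Cluster"):
    """Average every non-label column per cluster label, like a GroupBy on the cluster column."""
    sums = defaultdict(lambda: defaultdict(float))  # label -> column -> running sum
    counts = defaultdict(int)                       # label -> number of rows
    for row in rows:
        label = row[cluster_key]
        counts[label] += 1
        for column, value in row.items():
            if column != cluster_key:
                sums[label][column] += value
    # Divide each per-cluster sum by that cluster's row count.
    return {
        label: {column: total / counts[label] for column, total in columns.items()}
        for label, columns in sums.items()
    }


# Hypothetical input table: two dimensions plus the appended cluster column.
rows = [
    {"x": 1.0, "y": 2.0, "Cluster": "cluster_1"},
    {"x": 3.0, "y": 4.0, "Cluster": "cluster_1"},
    {"x": 10.0, "y": 0.0, "Cluster": "cluster_2"},
]
print(cluster_means(rows))
# {'cluster_1': {'x': 2.0, 'y': 3.0}, 'cluster_2': {'x': 10.0, 'y': 0.0}}
```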

Iris

Martin, can you post a small example workflow where you tried the approach with the XPath node?

Already figured it out thanks to some help at the UGM in Zürich :)

The solution was to prefix every step of the XPath query with "dns:":

//dns:foo/dns:subfoo
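Outside KNIME, the same idea can be sketched with Python's standard xml.etree.ElementTree module, which supports a subset of XPath. The PMML below is a hypothetical, trimmed-down ClusteringModel in which each Cluster element holds its center coordinates in an Array child; the namespace URI http://www.dmg.org/PMML-4_2 is an assumption, so check the xmlns attribute of your own file. "dns" is simply the prefix bound to that default namespace, mirroring the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down PMML document; a real k-means model contains more elements.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
    <Cluster name="cluster_1"><Array n="2" type="real">1.5 2.0</Array></Cluster>
    <Cluster name="cluster_2"><Array n="2" type="real">8.0 9.5</Array></Cluster>
  </ClusteringModel>
</PMML>"""

# Bind the prefix "dns" to the document's default namespace (assumed PMML 4.2 URI).
NS = {"dns": "http://www.dmg.org/PMML-4_2"}


def cluster_centers(pmml_text):
    """Return {cluster name: list of center coordinates} from a PMML ClusteringModel."""
    root = ET.fromstring(pmml_text)
    centers = {}
    # Every step of the path carries the prefix, as in //dns:foo/dns:subfoo.
    for cluster in root.findall(".//dns:ClusteringModel/dns:Cluster", NS):
        array = cluster.find("dns:Array", NS)
        centers[cluster.get("name")] = [float(v) for v in array.text.split()]
    return centers


print(cluster_centers(PMML))
# {'cluster_1': [1.5, 2.0], 'cluster_2': [8.0, 9.5]}
```

Dropping the prefixes (".//ClusteringModel/Cluster") makes the same lookup return an empty list, which matches the behavior of the XPath node described in the question.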

I think this is because PMML has its own namespace, or something like that?

Cheers

Yes, this is because of the namespace. Namespaces in XML are a chapter of their own...
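A small sketch of why the unprefixed query finds nothing: the xmlns attribute on the root element declares a default namespace, so every unprefixed element in the document belongs to it, while an unprefixed name in an XPath 1.0 path always means "no namespace". The example uses Python's xml.etree.ElementTree and a hypothetical one-cluster document with an assumed PMML 4.2 namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: xmlns="..." declares a *default* namespace,
# so the unprefixed <PMML> and <Cluster> elements both belong to it.
DOC = '<PMML xmlns="http://www.dmg.org/PMML-4_2"><Cluster name="cluster_1"/></PMML>'

root = ET.fromstring(DOC)

# ElementTree reports the expanded name: {namespace URI}local-name.
print(root.tag)  # {http://www.dmg.org/PMML-4_2}PMML

# A bare name means "Cluster in no namespace", so nothing matches.
print(root.findall("Cluster"))  # []

# Qualifying the name with the namespace, via a bound prefix or the {uri} form, matches.
print(len(root.findall("dns:Cluster", {"dns": "http://www.dmg.org/PMML-4_2"})))  # 1
print(len(root.findall("{http://www.dmg.org/PMML-4_2}Cluster")))  # 1
```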

Hi,

I'm not sure you still need help, but I found the following example workflow, which might help you: 013003_XML_Processing_K-means_centers