Using distance matrix as input for k-Means-Node

Ulja · April 14, 2015, 5:06pm

Hello Knimelings,

Does anyone know how to use the output (of type "distance matrix") of the "Distance Matrix Calculate" node as input for the k-Means node?

At the moment I use a workaround where the full distance matrix is written to a CSV file (using the "Distance Matrix Writer" node) and then loaded with a "CSV Reader" node to create a KNIME table representation of the distance matrix.

Sadly, this approach becomes unwieldy when used in a loop.

Best regards

Ulja

Iris · April 14, 2015, 5:26pm

You can use the K-Medoid node. It uses the distances provided at its distance inport.

Cheers, Iris

Ulja · April 15, 2015, 2:19pm

Sure, I can use a different data mining algorithm, but that doesn't solve the problem.

I came up with another workaround that works within loops without manual user interaction. It uses a lot of Java Snippet and Split/Create Collection Column nodes (see attachment).

Sadly, it's not completely generic. It only works for 500x500 distance matrices. Column and row naming may also be a problem in other use cases.
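To show the conversion logic independently of the matrix size, here is a minimal Python sketch. It assumes that the distance column stores, for each row, the distances to all preceding rows (a lower triangle). That layout, the function name `lower_triangle_to_table`, and the sample values are assumptions for illustration; they are not taken from the attached workflow.

```python
def lower_triangle_to_table(row_ids, lower_rows):
    """Expand lower-triangular distance rows into a full square table.

    row_ids    -- list of n row identifiers, e.g. ["Row0", "Row1", ...]
    lower_rows -- list of n lists; lower_rows[i] holds the distances from
                  row i to rows 0..i-1 (so lower_rows[0] is empty)
    Returns (header, table): the column names and an n x n list of lists.
    """
    n = len(row_ids)
    if len(lower_rows) != n:
        raise ValueError("need one distance list per row ID")
    # Start with zeros so the diagonal (distance to itself) is already set.
    table = [[0.0] * n for _ in range(n)]
    for i, distances in enumerate(lower_rows):
        if len(distances) != i:
            raise ValueError(f"row {i} should hold {i} distances")
        for j, d in enumerate(distances):
            table[i][j] = d  # lower triangle
            table[j][i] = d  # mirrored upper triangle
    # Column names are derived from the row IDs, so no size is hard-coded.
    header = list(row_ids)
    return header, table


# Hypothetical 3x3 example.
row_ids = ["Row0", "Row1", "Row2"]
lower = [[], [1.0], [2.0, 1.5]]
header, table = lower_triangle_to_table(row_ids, lower)
for rid, row in zip(row_ids, table):
    print(rid, row)
```

Because the column names come from the row IDs and the size from the input length, the same logic applies to any n x n matrix, not only 500x500.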

Any further ideas or improvements would be appreciated.

Best regards

Ulja

distmatrixtotable.zip

Iris · April 15, 2015, 3:14pm

Hm. I could not execute your workflow. But I suppose you now create the mean centers of the distances?

K-Medoid follows the same scheme as k-Means, but uses the medoid of each cluster instead of the mean. The medoid is the most central data point in the cluster, so only pairwise distances are needed to find it.
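As an illustration of why a distance matrix is sufficient here, the following is a minimal Python sketch of an alternating k-medoids loop. It is not the implementation of the K-Medoid node; the function names, the fixed starting medoids, and the toy data are assumptions chosen for the example.

```python
def assign(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternating k-medoids on a full, symmetric distance matrix.

    dist            -- n x n list of lists with pairwise distances
    initial_medoids -- list of k distinct row indices used as starting medoids
    Returns (medoids, labels).
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Empty cluster (e.g. duplicate points): keep the old medoid.
                new_medoids.append(old)
                continue
            # The medoid is the member with the smallest total distance
            # to all other members of its cluster.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy data: six points on a line, given only as pairwise distances.
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(dist, initial_medoids=[0, 1])
print(medoids, labels)
```

Note that the loop never computes a coordinate mean; it only looks up entries of `dist`, which is why this kind of algorithm can work directly on a distance matrix while k-Means cannot.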

Best regards, Iris

Ulja · April 15, 2015, 4:16pm

Right, k-Medoid cluster results are similar to k-Means cluster results, and one should not use k-Means if the data is represented as a distance matrix. Fair enough.

Maybe the topic of this thread is misleading. There are other things I want to do with the calculated distances, but only a few nodes can handle a distance matrix as input. I was surprised that it is so hard to convert a distance matrix into a table, which can be handled by almost every node.

The attached workflow is not supposed to run out-of-the-box. You have to provide some data to the distance matrix node that has Row0 .... Row499 as Row ID and you have to reconfigure the node.

Best regards

Ulja

Iris · April 15, 2015, 4:24pm

It helps if we get running examples. You can always generate such an example data set with our Modular Data Generators or the simple Data Generator node.

Maybe if you ask the question you are actually interested in, we can also find an answer?

Ulja · April 15, 2015, 4:58pm

Here you go:

How to convert a distance matrix into a table?

I attached an almost generic workflow example as a first approach.
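Besides a square table with one column per row ID, a distance matrix can also be represented as a "long" table with one row per pair, which avoids the column naming issue entirely. The Python sketch below shows that alternative under the assumption that the full square matrix is already available; the function name `to_pair_list` and the sample values are hypothetical.

```python
def to_pair_list(row_ids, table):
    """Flatten a square distance table into (row_a, row_b, distance) triples.

    Each unordered pair appears once; the zero diagonal is skipped.
    """
    pairs = []
    for i in range(len(row_ids)):
        for j in range(i + 1, len(row_ids)):
            pairs.append((row_ids[i], row_ids[j], table[i][j]))
    return pairs


# Hypothetical 3x3 example.
row_ids = ["Row0", "Row1", "Row2"]
table = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]
pairs = to_pair_list(row_ids, table)
for row_a, row_b, d in pairs:
    print(row_a, row_b, d)
```

For n rows this yields n * (n - 1) / 2 table rows with a fixed set of three columns, regardless of n.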

distmatrixtotable.zip