clustering and more...

I'm very new to KNIME, so pardon me if the question is trivial.

I have a data set with two columns of doubles (simplest example).

I want to group the data into 10 clusters according to column 1. This can be done in several ways, and that part is clear (data mining > clustering OR data manipulation > numeric binner).

What is less clear is how I can then collapse both columns into the newly created clusters (for example, summing the numbers in column 2 that belong to the same cluster).

As an example:

col1   col2
1      11
2      12
3      13
4      14

cluster 1 on col1 => [1,2]
cluster 2 on col1 => [3,4]

Required output:

col1       col2
cluster1   11+12 = 23
cluster2   13+14 = 27

Even better if I could get in col1 the average, or a weighted average, of the col1 values for each cluster.

Thanks a lot to whoever will be so kind as to point me in the right direction.

It goes without saying that I know how to obtain the result in a number of languages, but I'm now focusing on working with KNIME.



Hi Riccardo,

you could use the GroupBy node in KNIME to group rows of one or more columns and aggregate (e.g. add, average, concatenate, ...) the remaining columns of the data table.

In your example you would select the column with the cluster label as the group column, and col1 and col2 as the aggregation columns, with average as the aggregation method for col1 and sum for col2. You can also use the same aggregation column more than once, meaning you could compute the average and the standard deviation of col1 in one go.
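
If it helps to think of it in code terms, what the GroupBy node does here is roughly the following. This is only a plain-Python sketch of the idea, not KNIME's API: the function name `group_by_cluster`, the result keys, and the hard-coded cluster labels (which would normally come from the clustering or binning step) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Rows from the example: (col1, col2, cluster label).
# The labels are assumed to come from an upstream clustering/binning step.
rows = [
    (1, 11, "cluster1"),
    (2, 12, "cluster1"),
    (3, 13, "cluster2"),
    (4, 14, "cluster2"),
]


def group_by_cluster(rows):
    """Group rows by label, then compute mean(col1) and sum(col2) per group."""
    groups = defaultdict(list)
    for col1, col2, label in rows:
        groups[label].append((col1, col2))

    result = {}
    for label, members in groups.items():
        result[label] = {
            "mean_col1": mean(c1 for c1, _ in members),
            "sum_col2": sum(c2 for _, c2 in members),
        }
    return result


summary = group_by_cluster(rows)
for label, agg in summary.items():
    print(label, agg["mean_col1"], agg["sum_col2"])
```

The cluster label plays the role of the group column, and each aggregation column gets its own aggregation method, which is the same split you configure in the node dialog.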



Hi Tobias,

Thanks a lot for your help. The GroupBy node had completely passed me by, but it does exactly what I need!


- R