Data manipulation - Event data aggregation

clusty · April 19, 2013, 11:44am

Hello,

I have event data that I would like to aggregate. The original data is in the form:

C1	C2	C3
T1	P1	V10
T1	P2	V20
T1	P3	V30
T1	P1	V11
T1	P2	V21
T1	P3	V31
T2	...	...

I would like to obtain a new column for each value of column 2 (C2). This column should contain the value in C3, aggregated by C1. Each value of C1 may use a different aggregation function. Here is an example:

C1	P1	P2	P3
T1	V10	V20	V31

I tried the "Pivoting" node, but it seems you can't chose an aggregation function for each value of C2. I then tried to decompose the problem with the node "One2Many". Columns are correctly created but their values are set to 1 (instead of value in C3).

Can anyone help?

Iris · April 19, 2013, 2:03pm

Hi Clusty,

so how many values does C1 have?

If there are only a few, I would use a row filter for each value of c1, then use the pivoting only on the subsample and finally concatenate all of the results together.

Best, Iris

clusty · April 19, 2013, 2:12pm

Hi Iris,

C1 is my timestamp column. So it may contain a distinct value for each hour, for example.

Regards

aborg · April 19, 2013, 4:04pm

24 seems too much for individual row filters (at least for my taste).

I think it would work if you create a table with the C1 values and the required aggregation functions as parameters for pivoting or GroupBy and create a loop through those.

In the loop you can apply the row filter to keep only the required rows and the GroupBy can perform the aggregation.

Others might have better ideas though.

Cheers, gabor

clusty · April 19, 2013, 4:46pm

In fact C1 is a timestamp, so I give example of hours, but I mean date-hour (and I could add minutes). Let's forget about the aggregation at C1 level (I can do this in a later node).

My main concern is to map a variable domain from a column C2 into separated columns having the corresponding value of C3. Here is an example:

C2 C3
P1 10
P2 20
P1 30
P2 40

I would like to have the following:

P1 P2
10 ?
?  20
30 ?
?  40

Is there a node for such manipulation? All I can obtain is the following using One2Many node:

P1 P2
1  ?
?  1
1  ?
?  1

Thanks in advance for your help.

aborg · April 19, 2013, 5:32pm

It resembles to my Pivot node. You might give it a try. (The Merge node or the Sorter might help to rearrange to the proper order.)

Disclaimer: I developed HiTS. Pivot and Merge are in the KNIME Utilities feature.

clusty · April 22, 2013, 8:50am

I found the way to do it: I added an ID column. Then I use the PIVOT node with my new ID column as group column, C2 as pivot column and C3 in the options (aggregation settings) using "First" as aggregation method.

Thanks for your help!