Aggregation of elements

Hi everyone,
I have a problem with aggregation.
I want to aggregate the elements of each subset of a table.
Each subset is composed of 100 rows.
Each row contains a set of elements that may include duplicates.

I'd like to create, for each row, a set containing all the elements that appear in that row and in the previous rows of the same subset.

Here is an example:

RowID | SET of the row | Aggregated SET
RowID0 | [a,b,c] | [a,b,c]
RowID1 | [a,b,c] | [a,b,c]
RowID2 | [a,c,c,e] | [a,b,c,e]
RowID3 | [a,b,c,d,e] | [a,b,c,e,d]
RowID4 | [c,f] | [a,b,c,e,d,f]
RowID99 | [z,j,z,y] | [a,b,c,e,d,f, … ,z,j,y]
RowID100 | [1,90,w] | [1,90,w]
RowID101 | [a,b] | [1,90,w,a,b]
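For reference, the accumulation shown above can be sketched in plain Python; the subset size and the sample rows here are made-up stand-ins (3 rows per subset instead of 100, for brevity):

```python
from itertools import accumulate

# Hypothetical sample data: each inner list is the SET column of one row;
# duplicates within a row are allowed. Subsets are fixed-size blocks of rows.
SUBSET_SIZE = 3
rows = [
    ["a", "b", "c"],
    ["a", "c", "c", "e"],
    ["c", "f"],
    ["1", "90", "w"],   # a new subset starts here: the accumulator resets
    ["a", "b"],
]

def cumulative_sets(rows, subset_size):
    """Return, for each row, the union of its elements with all
    previous rows of the same subset (duplicates removed)."""
    result = []
    for start in range(0, len(rows), subset_size):
        chunk = rows[start:start + subset_size]
        # running union within the chunk; resets between subsets
        result.extend(accumulate((set(r) for r in chunk), set.union))
    return result

for agg in cumulative_sets(rows, SUBSET_SIZE):
    print(sorted(agg))
```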

Can someone help me?

Hi @aesposito,

is this workflow what you are looking for?

Aggregation of elements.knwf (24.1 KB)

Currently, the workflow addresses just the accumulation itself: the output of the Loop End node has a column of aggregated values, which you can easily join with the input to produce a table like your example.


Are your subsets always 100 rows? And are they in consecutive order? If so, you can easily extend my attached workflow by wrapping it in a Chunk Loop Start/End pair configured to make partitions of size 100.


Thanks David, the example is correct.

I have a list of days, and for each day I have 100 ordered rows.
So, for example, imagine that the table contains a week (02/11 - 02/17) and each day is composed of 100 rows (100 rows with 02/11, 100 rows with 02/12, …, 100 rows with 02/17).
For each day, the 100 rows are numbered (the second column contains the row number within that day, ordered from 0 to 99).
In this example, I know that my table has 700 rows in total.

DAY | BAND of the day (0-99) | SET of elements

I tried the Chunk Loop, as you suggested, but I think the workflow will be too expensive in terms of time.

I also tried the GroupBy node (Group settings: include DAY and BAND; Manual aggregation: SET with 'Union') and the Moving Aggregation node, but I don't know how to configure them correctly.


Hi @aesposito,

Yes, a loop inside a loop will be quite expensive.
One more question: do you explicitly need to accumulate the set in the second column (by accumulate I mean the union of all rows above, i.e. the intermediate results)? Or would it suffice to aggregate all 100 rows into one set? If so, you could also use the Chunk Loop, not select any grouping column (so the whole chunk is aggregated), and then select Union as the aggregation method.
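If one set per day is enough, the result of the GroupBy-with-Union approach corresponds to this plain-Python sketch (the day values, bands, and elements are made up purely for illustration):

```python
from collections import OrderedDict

# Hypothetical rows: (DAY, BAND, elements); bands are ordered within a day.
rows = [
    ("02/11", 0, ["a", "b", "c"]),
    ("02/11", 1, ["a", "c", "e"]),
    ("02/11", 2, ["c", "f"]),
    ("02/12", 0, ["1", "90", "w"]),
    ("02/12", 1, ["a", "b"]),
]

def union_per_day(rows):
    """One aggregated set per day: the union of all that day's rows,
    analogous to grouping on DAY with a Union aggregation on SET."""
    out = OrderedDict()
    for day, _band, elems in rows:
        out.setdefault(day, set()).update(elems)
    return out

for day, agg in union_per_day(rows).items():
    print(day, sorted(agg))
```

Note the difference from the accumulating version: here the intermediate (row-by-row) unions are discarded and only the final set per day remains.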

Also, a workflow with your data would help, if the data is OK to share.