Cumulative sum with a condition

tomekk · January 23, 2019, 10:32am

Hey all,

I need something like a moving aggregation sum but aggregated for every project and not for every row. It should look like this:

invoice number — invoice amount — Project — Sum of project (-> Column i need)
1000 ---------------- 100€ ----------------- AAA ------ 100€
1001 ---------------- 200€ ----------------- BBB ------ 200€
1002 ---------------- 150€ ----------------- BBB ------ 350€
1003 ---------------- 100€ ----------------- AAA ------ 200€
1004 ---------------- 50€ ------------------- BBB ------ 400€
1005 ---------------- 50€ ------------------- AAA ------ 250€
In the end I need to be able to tell which invoice exceeded a certain project sum, e.g. project AAA exceeded the sum of 180€ at invoice number 1003. Hope someone can help!

Regards,
Tom

Aswin · January 23, 2019, 10:40am

One solution is to perform the Moving Aggregation inside a Group Loop, with the Project column as group.

mlauber71 · January 23, 2019, 10:49am

Group by Project and Left Join the result back to the original entries.

kn_example_group_project_sum.knwf (23.7 KB)

tomekk · January 23, 2019, 12:03pm

Thanks for the reply but I need the cumulated project sum for the ‘current’ invoice in each row. So in row1 the sum should be 200 because of the invoice 1001 for project BBB. In row2 it should be 350 because the sum of the invoices is 200+150 = 350. (I need the sums described in my first post)

Edit: I added 2 more examples in my orginial post to make it clear

armingrudd · January 23, 2019, 12:19pm

Hi @tomekk,

Check this out:

Output:
Cumulative%20sum1

I have used a “Counting Loop Start” node to loop over the table using the number of the rows.
Then 2 row filters, one to include the current row and one to include rows up to current row. Then GroupBy on current row project and filter out the other rows and then a “Column Appender” to append the generated column (sum) to the current row and end loop.
Cumulative sum.knwf (48.4 KB)

Please let me know if this is what you were looking for.

Best,
Armin

mlauber71 · January 23, 2019, 12:27pm

I think I came up with a similar idea. Using an artificial rank and then calculating a sum up to the current point.

kn_example_group_project_sum.knwf (55.9 KB)

tomekk · January 23, 2019, 12:46pm

This is it, thanks to both of you @armingrudd @mlauber71! Far more complex than I thought it would be, comparing it to my simple SUMIF function in Excel.

ipazin · January 23, 2019, 2:10pm

Hi @tomekk!

Don’t get discouraged! It can be pretty simple with KNIME. Like this:

GroupCumulativeSum

In case I haven’t missed something of course And don’t hold it against @armingrudd and @mlauber71. They like KNIME nodes so much they can’t get enough of it

Now you can use some rule based node to see which invoice exceeded a certain project sum

2019_01_23_Group_Cumulative_Sum.knwf (15.8 KB)

Have a nice day,
Ivan

Aswin · January 23, 2019, 2:31pm

By the way, does the Group Loop guarantee that the row order stays the same? If not it may need a Sorter between the loop start and the aggregator.

tomekk · January 23, 2019, 2:38pm

Thanks Ivan, your solution seems to work as well. I guess this is what @Aswin was suggesting in his first answer. The row order does not stay the same so it indeed needs a sorter.

Best,
Tom

ipazin · January 23, 2019, 3:06pm

Hi!
It is sorted inside each group. If row order of whole data is important it can be performed after the Loop End node using RowID for example…
Br,
Ivan

armingrudd · January 23, 2019, 5:31pm

“Moving Aggregation”, what a great node. Thank you so much @ipazin.
@tomekk only has to add the ID column at the beginning and sort the rows based on the ID after the Loop End node.
I have already said that the KNIME forum is a public school that no one should disregard!
https://blog.statinfer.com/a-public-knime-school-that-no-one-should-disregard/

Best,
Armin

system · January 30, 2019, 5:31pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.