Something that regularly appears on the forum is the need to perform cumulative calculations.
For the simple cumulative sum of a single column, there is of course the Moving Aggregation node, but the complexity arises when it is necessary for the the column that is to contain the calculated value to itself be included in the calculation for the following row, as part of a Mathematical formula other than just cumulative sum of a single column.
There is also additional complexity if under certain circumstances, the cumulative calculation is to be “reset”.
Take for example the following table:
If we wished to calculate the cumulative sum of Income vs Costs for each project, then for each row, the ProjectBalance would need to be calculated as the Income - Expense + previously calculated ProjectBalance
and additionally, on change of Project from A to B, the previously calculated ProjectBalance should be discarded.
So how can we achieve that in KNIME using JUST the standard non-scripting nodes, and no cheating (i.e. resorting to my usual favourite, the Java Snippet)
And how can we do it in such a way that we have a “framework” workflow that with only minor modifications specific to the particular data and calculations at hand, it will automatically provide us with the Previous row’s values as we iterate through the rows?
Well, it is certainly going to involve a loop, and of course because we want each iteration of the the loop contain information about what happened on the previous iteration, we will need a Recursive Loop. BUT - for those who have looked at Recursive Loops and thought “too hard”… fear not… for in this “reusable” workflow, you will not have to know how all the recursive stuff works. The aim here is that you can concentrate on the rules and the calculations specific for the requirement, and the rest will just sit there and provide you with automatic access to the previously calculated values.
I have uploaded a workflow to the hub here:
It looks like this:
and if I’ve got it right, you never have to modify anything except the items in the ORANGE BOXES. Which are your input data, and your calculations
To make life easy, all column values output on the previous row are made accessible as PREV#columnname. If you only want a small selection of columns to be made available, you can adjust the Column Filter in the yellow annotation box, but there is no need unless it is to save resources.
The Rule Engine rule in this example is this:
Which says “if the current project is not the same as the previous project, then reset the Project Balance calculation to zero.” . That’s not too difficult is it?
And the Math Formula is this:
Which says: “The current ProjectBalance is the current Income - current Expense + the previous Project Balance”
again… I think that is reasonably clear
And that is all there is too it. You have access, in all nodes placed in the Orange Rule/Calculation Section, to the value of the previous row’s source or calculated values, using the aforementioned PREV#columns and the rest of the workflow acts as a framework to provide those for you.
And just to check… what is the calculated ProjectBalance for my above table?
well… it’s this:
Have fun and please let me know if there are any cumulative calculations it couldn’t cope with.