I have an input table with a column I need to sum up. The values come in already rounded to two decimals only. I also validated that by: Inspecting the table with Full Precision renderer, converting to String and performing visual inspection as well as using regex to identify any row containing a value of more than two decimals
Column with values (excerpt) - rendered set to full precision:
I use group by with aggregation method sum on this column
The result, when inspected with full prevision or converted to string, contains sometimes up to 10 decimals
Result set to full precision:
Result converted to string
This behavior persists even if I add a Math Formula and round all the values before summing them up
I’m afraid I cannot right now share sample data, but may try to re-create later.
Flagging this as this behavior seems illogical and is causing an issue in one of my projects right now where values need to be exactly matching between to sources - one containing the e.g monthly amounts and the other the aggregate sum…
Hi @MartinDDDD , I’m going to attach here a thread from earlier in the year that we never fully got to the bottom of, as to me this sounds related.
I know that using Double in calculations is quite poor when it comes to floating point rounding errors, and Excel too suffers from similar rounding problems, at the drop of a hat, but it would be useful to see a definitive example of it in action in KNIME if and when you are able to produce one.
The input data into the aggregation nodes (groupby or row aggregator) are 100% “two digits only” after decimal separator. However the aggregation then shows many, which to me does not make much sense…
I consulted with my friend Claude though who also says this is perfectly normal behavior due to floating point issues… Still it seems weird from a pure math / logical perspective
Yes it is unfortunately “expected” behaviour for floating point arithmetic, and Excel will do the same thing as in the screenshot below where I copied and pasted the 5400 items and got it to sum them.
People often don’t realise Excel does this, because they don’t expand the decimal places. So it is just more evident in KNIME. But it is still there nonetheless and I’ve often found myself enclosing a sum() or other calculation with a round(n,2) function to ensure that my reconciliation checks match properly.
The “old-fashioned” way of handling this problem in large scale business systems, involving currency, was to always hold the values being calculated in pennies/cents throughout the lifetime of the calculation and only convert back to decimal at the end. Integers do not suffer from rounding problems because they can be exactly represented in binary. Decimal fractions however can for the most part only be stored as an approximation so rounding errors will creep in very quickly.
As in the above screenshot, multiplying all the values by 100 in Excel, then summing the total and dividing by 100 returns the required value without the rounding issue. The same idea can of course be used in KNIME, and would look something like this:
Back to Excel, and it’s quite interesting (for a certain kind of person!) to look and see where Excel introduces the rounding errors. If I calculate the cumulative sum one row at a time, all is fine for 1000+ entries, and then “all of a sudden” something happens…
… and how much of the world trusts Excel (and computers in general ) to be able to add up whilst remaining completely oblivious to the fact that actually they can’t?
My final comment (for now ) though is that for the most part, the rounding error is such that as long as you are aware it might exist, and the basic reason behind it, the final act after the calculation should be to round the result to the required number of decimal places, and actually - perhaps - what we should be asking for is an additional configuration on the aggregation nodes (GroupBy, Row Aggregator) that allows us to optionally specify the required number of decimal places to which the result should be rounded)