Sum of Integers becomes a Long

I have seen this behavior a few times with different nodes and I think it needs to be handled better within the nodes themselves.

Imagine you have a column with numbers - in my case these are mostly amounts - where the number in each record is stored as a Number (integer).
You then need to sum these numbers in one way or another to get for example the total sales for each region.

Once you compute a sum like this, you might find that the sum is now no longer within the bounds of the Number (integer) data type, but needs to be converted to a Number (long). When this is the case, the various nodes - such as GroupBy, Pivot, and Moving Aggregator - all fail silently to compute the sum and return a missing value in its place. There will be a warning that looks something like

Skipped group(s): value groups: “Blue” cause: “Sum > maximum int value. Convert column to long.”

It will, however, still execute the upstream nodes albeit without the correct data.

I realize that there is an easy fix - once you realize that there is something to fix - and that one can simply convert the Number (integer) fields(s) to Number (long), but I think that the node itself should handle the conversion once it finds that it is needed.

It is also not always a straightforward thing to change. You need to use the Table Manipulator node or for some reason the String Manipulation node since this can convert between numeric data types while the Math Formula node cannot. The Expression node can also be used - I think - but it is not clear to me that the parse_int( ) function will always return a Number (long) rather than a Number (integer) as the name would suggest and also I have to first convert the number to a string.

In short, these aggregating nodes should always convert the result to the proper data type on their own.
Int vs Long.knwf (91.9 KB)

Hi @kajensen ,

thanks for your feedback. I agree that the current behavior of the GroupBy can be unexpected. This dates back to when 64bit long integer operations were still expensive on 32bit hardware. For backwards compatibility reasons, this is not a trivial change. However, as you noticed with the Expression node, we started the process to support 64bit long integers everywhere by default. In order to not break existing workflows we are working carefully to roll this out step by step.
As of today, for most tasks there are nodes that support Long Integers. Hence, I suggest to use Long Integers instead of 32bit Integers in new workflows—and the Expression node instead of String Manipulation or Math Formula. If you do run into problems with using Long Integers with specific nodes, please let us know.

I opened a ticket (internal reference AP-24610) to track this particular aspect,
nan

2 Likes