Aggregating Accounting Data

Hi @Unlockedata

From experience, the task if running against the kind of numbers of transactions you are looking at would take an exceptionally long time to complete as every additional row multiplies up the number of possible combinations to assess. The time increases exponentially with the row counts.

You will need to do everything you can before the number-matching occurs to “weed out” incorrect matches if you are to have it run in any reasonable time frame.

For every value in Table 2, you are going to be trying to find all possible combinations of rows in Table 1 which sum to the value. You have said there could be credits/refunds which I take to mean negative values too, so you cannot necessarily rule out numbers just because they are larger than the target, since there may be a balancing negative number further down the list. So at worst case (i.e. with no specified constraints) you are attempting to sum all possible sets of 40,000 values. If my maths is correct, that is 2 raised to the 40,000 power combinations! Which even processing 1million combinations per second I think it wouldn’t complete within the expected lifetime of several universes! :open_mouth:

So, we would need to refine the rules to make it manageable. This is why it is important if we can say there would be an absolute maximum limit of the number of rows in Table 1 which might realistically combine to form a row in Table 2. The lower that number, the fewer combinations the process would have to consider. But with no stated limit, it would have to assume everything is possible.

If you can group the 40,000 transactions into much smaller groups because maybe you know that all related transactions are likely occur in a certain timeframe, or because you have some other way of knowing how they are grouped, or you have this string value that rules out certain rows on Table1 from from being related to a row on Table2, then you need to find a way to apply those first.

Basically, you want the tables being compared to be as small as possible, without compromising the matching process. Combinations are your enemy, and there are far, far fewer combinations in a very large number of small tables than there is in one large table.

The next issue is how do we know what the termination-condition is. If we knew that we are looking for everything to balance, then the termination-condition is when everything matches. But in this case, there may not be a match for some things. So pulling together different combinations we can find that we can match things up in different ways but never have it fully balance. How do we decide the job is complete, and how do we choose the “best” set of combinations ?

So still plenty of questions, and things to think about. I haven’t got any answers at the moment, but I have some posts on related problems from which we might draw some inspiration (and I suggest plenty of coffee!) :wink:

2 Likes