Custom moving aggregator (without using loops) Possible?

Hi Friends
I will be very grateful if someone can help me with this issue.

“This post will be a bit long because I want to detail well what I need and I believe it’s a bit advanced. I also want to mention that perhaps a solution could be Python or another programming language, but I’m not familiar with it.”

My Goal:
I have a table and I want to calculate the values for each Iteration, **but I don’t want to use loops. Or maybe a way to calculate even faster than loop

Why you don’t want to use loops?
Because I have a large table with hundreds of products and I’ve already completed the task using loops, but I need to iterate through each row to store the last values in a variable and to calculate the next value, and my calculations take approximately 0.7 milliseconds for each iteration. For a table with 160 thousand items, it took 20 hours. (I will show what I did in the end)

Here’s the problem step by step

I have this table with a lot of “cProd” (product_id)

1° First I will need to use a Group Loop Start.
But you said you don’t want to use a loop?
Actually, I don’t want to use a loop row by row , but at least the loop by product group is necessary to use.

image

Result

2° Rules:

(a) The first row is the initial inventory value from the last month. That means quantity 6, unity value 34,69, total value 208,14 (orange)
(b) the rows below are the invoices that will increase or decrease the inventory. (blue) See that if the code is “1409” the qt_inventory is positive in 1 item, if code is “5927” is negative. (Positive is a new item in inventory, negative is like a selling product that is leaving inventory)

3° Sorter
For my case I need to guarantee the sorter order. I won’t explain the reason in detail, but the order is of extreme importance because first, I need to have a product in stock before selling, and therefore the input needs to come before the output."

image

4° Column expression then Moving aggregator

Well, if I need to compute the inventory, then I need to use “moving aggregator node” right? Yes :blush:

But first, I need to put the values ​​one below the other, because the moving aggregator requires it to be that way.

image

Notice that the column “inventory_sorter” is always zero for the initial value and one for invoices. So I use this rule to create a column with all these values together (last column)

4.1 Now “moving aggregator”

image

Perfect :blush:- Moving Aggregator worked fine and I didn’t need to use loop for each product to store the first calculation in memory (variable). This concept is very important.

Before I continue to the difficult part, in my actual project, I had to store the value 6 in a variable and then create a loop only for the first item, calculating 6 + 1 = 7 and overwrite the old variable 6 with 7, so that in the next iteration it would be 7 + 1 = 8, overwriting the old variable 7 with 8, and so on, so that in the next iteration it would be 8 - 1 = 7… and so forth. But notice that up to this point, using the moving aggregator, I didn’t need to loop. I don’t know what the rule is behind the moving aggregator’s source code, but it behaved like a column expression and instantly calculated. Note that if I have hundreds of items, the loop would have to iterate one by one. For me, that takes 0.7 milliseconds (which is a long time). (to propagate values Loop End>Configuration>flag Propagate modified loop variables)

5° Starting Difficult part - Calculate de vl_unit_inventory and vl_total_inventory

Rule: When the code is “1409” I need to sum the vl_total_inventor in the first row + 40,11.
Let’s calculate:
Variable 208,14
if “1409” then 208,14 + 40,11 = 248,25

:bulb:Well, it seems that I can use moving aggregator again ( :exclamation:this is a catch and won’t work on second product. I will show)
Let’s use column expressions again to put those values in the same column like item 4.

var calc1 = column("value1_st")
if (column("inventory_sorter") == 0)
{column("vl_total_inventory")}
else calc1

Great- All in the same column

Now moving aggregator
image

All right, all most there. But the last row I don’t need the value because the last row is an invoice negative and I need to disregard a value. So let’s replace the last row with “null”

Done

Then I need to calculate the vl_unit_inventory and for that I used more four nodes
image

The result was PERFECT for the first item because the code “1409” was one below other
image

Let’s review the final calculation.

Row1 = 6 * 34,69 = 208,14
Row2 = 208,17 + 40,11 = total 248,25 / 7 = unit 35,46
Row3 = 248,25 + 40,11 = total 288,36 / 8 = unit 36,05
Row4 = 288,36 - 36,05 = total 252,35 / 7 = unit 36,05

So, the moving average is PERFECT :white_check_mark: And I didn’t need to use any loops.

**Now, the second item Iteration PERFECT :white_check_mark: (because there is no “1409” code to change the average)

**Now, the third item Iteration ERROR :x:(because there is “1409” code to change the average in the middle)

Those are the correct values.
Row5 = 271,98 + 75,97 = total 347,95 / 8 = unit 43,49
Row6= 347,95 + 75,97 = total 432,92 / 10 = unit 42,39

Explanation:
This concludes that I almost managed to perform the calculations without needing to use a loop item by item. If it had worked, I would have the result instantly.

I’m taking 24 hours to get the result for all products and companies.
Can we achieve anything?

Is there any formula within the Column Expressions that stores the value in a variable to be used in the next row? Like a “for i” loop?

Will I still continue using the loop?

This is my real project with loop row by row. Its Big.
But I want another simple and fast way

01.results_with Loop_012024.xlsx (15.9 KB)
Ps: this post
Help__workflow.knwf (207.1 KB)

ps: this post took me 3 hours :sweat_smile:

1 Like

Hello @Felipereis50 and nice long post :sweat_smile:

Yes there is and I suggest to use it. But it’s not a one simple formula. Anyways once written should be much faster. And correct :wink: Maybe start with example from here which features both solution with Group Loop and without it:

Additionally this is a good post (and long as well!) from @takbb where he describes together with example how to use “global variables” inside Column Expressions node:

Additionally this is also possible with other programming languages (nodes) as Java and Python but if you are not familiar with it go with Column Expressions.

Br,
Ivan

5 Likes

Hi Ivan
Great.

I will read the links you provided to understand this “global variables” inside column expressions node.
Happy to know that is possible…

I will come back soon with some results or doubt… :fist_right: :sparkles: :fist_left: :brazil:

3 Likes

Owwww…

Amazing post from @takbb about “column expressions”.

I think I’ll manage to do it using the takbb model.

(I need to learn Java concepts. It seems very similar to VBA when I took a course. But I didn’t focus on learning it.)
If I knew Java, it would have greatly reduced my workload.

But I’ll manage.

I’ll be back soon.

1 Like

Hi @ipazin
Sorry to bother you.

I’ve analyzed your code at the link provided. It took me a couple of hours to understand, but I managed. Great explanation. :slightly_smiling_face:

  1. I didn’t know that “column expressions” used other expressions besides those listed in the “functions” range (switch isn’t listed there).

  2. I understood that the “Default” within the switch will be the rule for all other rows that are not within the “case.”

Using your concepts from the example, I managed to achieve what I wanted, but I had to use one more node than I expected (offset_prev_row). I created a node to bring the value from the row above (an offset) so I could use it in the next calculation (node). For me, it was easier to keep it in the same line to do the calculation. And that’s why I would like your help.

Could you evaluate what I did? And I would like to know if it’s possible to incorporate the rule and not have created the “offset” node. It’s more out of curiosity and learning.

No rush, okay.
Whenever you can.

Below is the result I expect (Red) and in third will be possible if worked.
First iteration
image

Second
image

Third
image

ps: in one node I used a Ternary expression (found on Google) because I needed to use a condition inside a variable, and I notice that is not possible using “if”, but is possible using Ternary :confused:

var rule1=(column(“code_CFOP”) === 1409) ? column(“value1_st”):0;

workflow attached
Help__workflow.knwf (197.2 KB)

After that, I will studie takkb code. (all this blow my mind with possibilities (I need to learn java snippet :sweat_smile:)

1 Like

Hello @Felipereis50
Sorry for my late answer but, I’m currently too busy, and no spare time for community challenges.

I had a previously unpublished challenge solution for the following topic; however by the time it was ready, there was already a couple functional workflows by @takbb and @iCFO , so I just keep it for myself at that time…

I knew since I read your post, that I could deploy that solution for your challenge. However that post was a very particular (and complex) logistics use case; so just share that solution with you wouldn’t be that evident.

I didn’t try to understand your workflow, I just concerned myself on replacing MA functionality. I’ve replicated your current custom workflow by: removing the Group Loop(), and replacing the ‘Moving Aggregation’ nodes with Column Expressions’ supported metanodes.

I hope it helps on time; and I think that you will be surprised with the alternative workflow’s performance.

BR

1 Like

Hi Friend

First, thanks for your support.
No problem you’re busy. I know it is difficult these days.

"I saw that you created a metanode and in the last node, with the final cumulative calculation. But I’ll be honest: I didn’t understand your last formula. It was a blend of offset with creativity. I spent two hours, out of curiosity, trying to understand the logic and I swear I couldn’t get to the values (not even using Excel). I tried breaking down the formula into steps and didn’t get the result.

Anyway, in my last post I managed to do it using Switch>case, and it was much easier.

It seems to me that of all the Posts about the “cumulative challenges”, using the Java Snippet as takk always use, is very good (too bad I don’t know it very well yet)."

For example, in the row index 9, how you get the value 18, debugging the formula? Jus for curiosity.

if(column("inventory_sorter") == 0){
    column("qt_inventory_new") 
} else {
    column("initial_value_new") +
        column("Sum(qt_inventory_new)") - 
            column("Sum(qt_inventory_new)", 
                - (column("ROWINDEX", -1) - column("ROWINDEX_shift", -1) + 1)) }

1 Like

Hello @Felipereis50
I’m glad to hear that you managed to get through, with a more simple solution using Switch > case. However your latest solution still do have a loop.

If I were in your shoes, I would do the same; avoiding those solutions that I cannot afford for a full comprehensive deploy. This is the reason I try to avoid of using Java.

I’ve deployed a full workflow, not using Java nor loops as requested in topic, by using KNIME base nodes; and delivering the expected output in the test dataset. Understand and implement this alternative solution or not, it is in your roof now.

Then what do you mean by ‘debugging’? is there a ‘bug’?

BR

P.S.- I learnt KNIME, Py and R aiming to transgress all Excel limitations. I’m free now.

Thank you for the support, my friend.
One day I will be able to help others.

About the “debugging” word, what I wanted to explain what for you to explain the logic of your code. (the last code).
Because I have never seen the offset be using like you did.

Regarding the word ‘debugging,’ what I wanted to convey was for you to explain the logic of your code (the last code). I have never seen the offset being used like you did."

2 Likes

Hello @Felipereis50 , I see now

Let me explain the the basics; instead of looking at the formula code lets see the columns meaning: As you already reviewed I’m mapping the column reference position with ROWINDEX and ROWINDEX_shift being both a vertical cell coordinates column system.

ROWINDEX is the continuous counting of rows.
Meanwhile ROWINDEX_shift is the ROWINDEX position of the initial considered range cells.

You have a dedicated target column that is initial range value + line item delta:
I am using a continuous Moving Aggregation for $qt_inventory_new$, this is $Sum(qt_inventory_new)$; for any range the cumulative targeted result is the incremental value ‘ROWINDEX’ cumulative result vs the last value of the previous range ‘ROWINDEX_shift - 1’.

For your question about result == 18 in ROWINDEX’s position 9:
column($Sum(qt_inventory_new)$, ‘ROWINDEX value’) == 25 [ - ]
column($Sum(qt_inventory_new)$, ‘ROWINDEX_shift value’ -1) == 7

25 - 7 = 18

A simplified coding of it would be:

column("Sum(qt_inventory_new)",  column("ROWINDEX")) - 
    column("Sum(qt_inventory_new)", column("ROWINDEX_shift") - 1)

However you cannot code in this way, because ROWINDEX (-1) doesn’t exist; as ROWINDEX starts in (0). This applies for the first target range.

Then the calculation becomes:
$initial_value_new$ (that is extended over the range) == 23 [ + ]
column(“Sum(qt_inventory_new)”) == 25 [ - ]
column(“Sum(qt_inventory_new)”, coordinate = ROWINDEX -[9-4] ) == 30

23 + 25 - 30 = 18

Following this review, there was an overcomplex heritage. The reviewed code remain as follows (removing the if/else statement):

column("initial_value_new") +
    column("Sum(qt_inventory_new)") - 
        column("Sum(qt_inventory_new)", 
            - (column("ROWINDEX") - column("ROWINDEX_shift"))) 

The workflow in HUB has been updated right now, so you can proceed to download updated version. Thanks to this review now it’s being optimized.

I hope it helps you to deploy your own solution.

BR

2 Likes

Hi Gordon
Sorry to be late!

Woww…
Very interesting and creative.
Now I understood.

I’ll never imagine something like that.

Thanks for your explanation and your time.

1 Like

Hello @Felipereis50,

goood job! Didn’t go into details but I’m pretty sure you can only use one Column Expressions node (and also exclude loop) but this way is probably easier to understand and maintain so if it doesn’t take to long I would stick with it.

Regarding additional Column Expressions node that calculates offset you don’t need it as there is column() function which you can use. It accepts offset as second argument so you can access previous and next values from current row. Give it a try!

Br,
Ivan

1 Like

Hello @ipazin

This column()'s offset argument is what we have been discussing in previous posts.

BR

1 Like

Hello @gonhaddock,

Didn’t read in detail. Sry :slight_smile:

Br,
Ivan

1 Like

No worries @ipazin ,
I talked about shifting and coordinates :sweat_smile: ; the truth name in system is ‘offset’

I should deploy a formal component for this Column Expression’s MA functionality.

BR

Hi Ivan

I would like share what I did with your help:

formula
switch (rowIndex()) 
{
 case 1:
        var_TotalInitial=column("Vlr_Total_Saldo");
        var_qtd_prev_row = 0
        column("Vlr_Total_Saldo");    
 break;
 default:
    var calc0=column("@TotalSTRetidoItemFinal")+column("@TotalSTDesta+OPItemFinal")
    var calc1=(column("@CFOP_Ficha3") === "1409") ? calc0:0; // site link "Ternary expressions"
    var_qtd_prev_row = column("qt_saldo_aggreg",-1)
    
    if (column("@CFOP_Ficha3") == "1409") 
    {var_TotalInitial = var_TotalInitial + calc1}
    
    else if (column("@CFOP_Ficha3") == "1411")
    {var_TotalInitial =  isNaN(roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2)) ? 0: (roundHalfUp((var_TotalInitial /  var_qtd_prev_row) * column("qt_saldo_aggreg"),2)) }
    
       
    else if (column("@CFOP_Ficha3") == "5927")
    {var_TotalInitial = roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2) }
    
    else if (column("@CFOP_Ficha3") == "5405")
    {var_TotalInitial = roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2) }
    
    else if (column("@CFOP_Ficha3") == "5119")
    {var_TotalInitial = roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2) }    
    
    else if (column("@CFOP_Ficha3") == "5409")
    {var_TotalInitial = roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2) }    
    
    else if (column("@CFOP_Ficha3") == "5209")
    {var_TotalInitial = roundHalfUp((var_TotalInitial / var_qtd_prev_row) * column("qt_saldo_aggreg"),2) }  


    
    //vTotalInitial = vTotalInitial + rule1    
     //https://stackoverflow.com/questions/31971801/setting-a-javascript-variable-with-an-if-statement-should-the-var-x-be-in              
                    
}

See that I used the switch case as you taught me (formula above)

Regarding the offset column, I incorporated it into the calculation with a variable (var_qtd_prev_row). And with that, within the if or else if, I included the rule inside.
It was a way to learn as well.

Now, I would like to migrate this entire formula to the Java Snippet and see if our friend, takk, could help me.
In other words, I want to replicate exactly the result I achieved in the “column expressions” in the Java Snippet so that I can understand how it works.
Yesterday, I tried to do it myself, but I couldn’t include the variables.

Actually, right from the start, with the row index, the rule is not the same.
For example, in the column expression, the rule for the row index is simple, just type rowindex().
But it seems to me that the Java Snippet is totally different.

Later, I will open a new topic, set up the flow, and notify takk to see if he can simulate it for me and recreate it in Java. It will be a way for me to learn and it becomes easier when it’s a subject I’m familiar with.
In another topic, @ArjenEX helped me with a question I asked about whether it’s possible to fill two columns at the same time using the same variable as output, and in Java Snippet it’s possible. I found it very interesting. But learning Java is complex, there are different concepts.

But anyway, what you taught me about the switch changed everything for me. I will use it a lot.
There must be so much more to learn with “column expressions” still.

1 Like

Hello @Felipereis50,

I just shared links but you did all the work so congrats. Anyways glad it helped.

Br,
Ivan

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.