Loop and aggregate values until a certain condition is met

kowisoft · August 13, 2024, 5:43am

Dear KNIMErs,

I can’t wrap my head around how to solve this little problem:

I have the following table:

What I basically want to do is to loop through these line items and stop, when a certain condition is met.

The condition is this

IF item supplier X’s total value exceeds 100,000

I considered a loop and doing a Variable Condition Loop End Node, following the approach that @rs1 described in this workflow on the Community Hub:

However, I can’t get the inner part of the loop to be working. I already played around with the GroupBy Node as well as the Moving Aggregator Node but they do not help me inside the loop (or I use them the wrong way, which is probably more realistic )

I have provided a (pretty basic) example workflow where I also have a Double Widget Node to interactively set the threshold value (the 100,000 from the IF statement above).

Can anyone nudge me in the right direction?

ajit555 · August 13, 2024, 7:19am

Thanks for posting your workflow with sample data. It helps.

Do you want to just filter your table without aggregating it?
Assuming you want to just filter the table with the condition “IF item supplier X’s total value exceeds 100,000” the steps are:
- Filter the table for “item supplier = X” (You can use a widget to switch the supplier name)
- Create a column for cumulative sum of value
- Filter the table for the sum value threshold (You can use a widget to change sum value threshold)
- Once the able is filtered, you can derive additional statistics like list / count / frequency / min / max of item names etc.

kowisoft · August 13, 2024, 8:43am

Thank you for your response @ajit555

What I basically want to do is to assign the “values” to X until their threshold (100,000 in this example) is met, then assign the rest of the “values” to the other supplier, Y in this case. Line items details however should be maintained.

ajit555 · August 13, 2024, 10:40am

Could you please paste the desired output table?

Based on your above feedback, I am understanding that from Row0 till Row4, the “item supplier” column would be “X” (as the cumumulative sum value till this row is 100,000) and for rest of the rows, it would be “Y”.

Also, the above item supplier column would be a new one or the existing column needs to be updated?

mlauber71 · August 13, 2024, 10:50am

@kowisoft without having looked deeply into it you will have to collect what is happening and store the intermediate values. I constructed this example but there is also the recursive loop that can somehow do that automatically but I am not that comfortable with.

Question is do you want some sort of optimization where you would try to come as close to the 100k as possible with your values.

MartinDDDD · August 13, 2024, 11:44am

Is the objective to find the item either before or after this threshold for Supplier X is met?

Do you want to apply this to any of the suppliers of items in the table or do you just want to find “the first” that exceeds the threshold?

Depending on this a solution may look a little different.

I’ve created a prototype with four scenarios:

finding all items for each supplier once threshold is exceeded
finding all items for each supplier before threshold is exceeded
finding the exact item for each supplier that crosses over the threshold.

Workflow:
Loop aggregate based on condition_MartinDDDD.knwf (118.5 KB)

Overview:

ajit555 · August 13, 2024, 4:30pm

I was facing a similar allocation problem and posted here

Please let me know if it is similar to yours?

kowisoft · August 14, 2024, 8:52am

Thank you for providing a link to that post @ajit555

If I get it right, it means that the input ports of the Recursive Loop Node in that example do the following:

1. Input Port: takes the ‘current batch’ aka the current supplier that is assigned to the project (sub)list
1. Input Port: takes the remaining projects, where no supplier is yet assigned
1. Input Port: takes the list of the remaining suppliers, who have not yet been assigned to a project

Is that correct?

Also big THANK YOU to @mlauber71 for reminding me that I have asked a similar question some time ago. Sorry for ‘double posting’ (kind of), I knew I had this issue before in a pretty similar use case but didn’t scroll ‘far enough’ to find that old post of mine.

MartinDDDD · August 14, 2024, 10:23am

I take you are referring to the example provided in the other topic - let me try and explain:

always passes through the projects table - on first iteration in full, in any subsequent iteration it is passing through “what is left over” (bottom port of 6)
Always passes through the suppliers table - on first iteration in full, in any subsequent iteration only the remaining suppliers (bottom port of 7)
Collection output port - collects those projects enriched with an allocated supplier, where the cumultative value was below the threshold (top port of 6)
recursion port for projects table - passes those projects that came after the cumultative threshold was met back to the beginning of the loop for the next iteration. Important: Table columns need to match the initial input 1:1 - that’s why the cumultative sum column gets removed before passing the table back
recursion port for suppliers table - passes through the remaining suppliers to be processed (bottom port of 7)

Hope that makes sense…

kowisoft · August 14, 2024, 11:39am

Thank you for your explanation @MartinDDDD - this made it a lot clearer.

I have provided my ‘final’ dummy workflow below:

Some caveats in my use case that required a (slightly) different approach:

I had some ‘required’ vendors / suppliers in my list, e. g. for products that can only be sourced from those suppliers. Therefore I was using a Row Splitter to split up the original table before I start the Recursive Loop. The basic idea is to separate these ‘already assigned’ line items to a separate branch and merge them back together after the loop
Because of these ‘required’ suppliers I also deducted the ‘already assigned’ volume from these required line items from the threshold value (driven by the Double Widget Node) like so → see screenshot below. Important: I only do this for Supplier 1 (S1) because I only have 2 suppliers → see also next bullet point
I only have two suppliers (in this example and in my real world use case). As a result of this I limited the Maximum Number of Iterations to 1. This basically says…

Assign line items up to the (adjusted) volume to S1, the rest stays unassigned

Because I only have 2 suppliers, I also made sure that I have 2 collector ports (compared to 2 recursion ports) for the Recursive Loop End Node so I get the items that were assigned S1 at output port 1 and the ‘remainder’ at output port 2

Then it’s just a question of conditional logic (Rule Engine Node) and a little bit of concatenation to bring it all back together

At the end, I have just a simple GroupBy (not needed in my real world use case) to test if the workflow delivers the results as expected.

Thanks a lot for all your help in this thread @MartinDDDD , @ajit555 and @mlauber71 - I tested this now with all kinds of different values and it works and seems to be robust enough.

The one thing that remains for me now is to apply this to my real world use case, where I work more with variables

system · November 12, 2024, 11:40am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.