Hello!
I need to process four tables using a loop.
The gist:
I want to feed four columns (I can identify a common pattern in the names) into each loop iteration, manipulate them, and get a result. But I haven’t found a way to feed four columns into the loop at once within a single data stream.
I’m thinking of splitting the four tables into different streams and creating a separate loop within each stream, combining the results and performing the manipulations. But there’s another issue here: it’s unclear how to track the iterations in each stream, as they must be synchronized to achieve the expected result.
Essentially, I need to get the average value for each column from these four tables (the average between each first column, the average between each and the second, and so on).
I already have a mechanism implemented without loops, but it requires 30+ nodes. I’d like to simplify the mechanism and make it more automated.
It’s certainly great that you pointed me to a topic completely unrelated to my question. And that would be appropriate, but I couldn’t find a single mention of such a problem.
Do you have anything substantive to say, or should I just ignore it?
Sample input and/or output data, if at all possible. If your data is business confidential, please consider creating a small mockup dataset to allow community members to recreate the problem
but tldr:
A) counting loop, filtering by index
B) column list loop
C) unpivoted joins
D) Matrix Multiplication Masks
plenty more options but given you dont provide example data nor a scope, no one can give you a proper feedback because some solutions are bad choices for large datasets.
and given you didnt provide what you created so far, no one knows what improvement to recommend either.
So, I’ve collected what I have and anonymized the data.
I have four tables, and I need to find the average value for each column with the same name.
I did it the simple way. I’m not satisfied with the number of steps required. To simplify the process and make it easier to control, I want to do the same thing using a loop.
Normally, I would use the “Column List Loop Start” node, but it only allows one column per iteration.
I had an idea to create a loop for each table and combine the iterations using join. But in this case, there’s no guarantee that the iterations for each table will be synchronous, and at some point I won’t get a comparison for the first tables in one iteration step, and for one of them in another.
I need help either with a way to combine four columns in a single loop, or with a counter/method for monitoring the iterations for the four tables and a loop for each of them.
Hi @fe145f9fb2a1f6b I thought your suggestion was a very good one to simplify the workflow but I couldn’t see how it would perform the necessary groupings (i.e. COL_MEAN($$CURRENT_COLUMN$$) would just get the mean of the entire column across all rows, instead of “per цццц (#1)” ), but I think it would work using the GroupBy node instead, or else by putting the Math Formula (multi) into a group loop, but I figured that would be slightly counter-productive.
I adapted the idea to use the GroupBy node instead. Specifying “type based aggregation” and have it find the mean of each double column, to save specifying each column individually.
If these ideas work for you please mark @fe145f9fb2a1f6b 's response as the solution, since mine is simply the same solution with a different node.
This is better and much faster than trying to use any loops for this, but of course there may be other more complex use cases that require loops, but how it would be tackled would depend on the specific requirements.
In theory you could also use the Row Aggregator to do this, but it only allows specification of one category (grouping) column, which means you would lose any non-aggregated columns from the result, e.g. цццц. In this case, since цццц is also numeric you could also include that in the aggregation and it would work, but that may not always be the case.
Hello.
You really have a very elegant solution to my problem. I hadn’t even thought of it, although now it seems the most logical. In this solution, I imagined something similar to what I added in the screenshot.
Using the “Math formula (Multi column)” node, the solution fails because the result doesn’t meet the requirements. Do you really think fe145f9fb2a1f6b still has the correct solution? In my opinion, his approach is correct, but the result doesn’t meet the desired one.
Hi @ElTsSikier , I think the approach given was along the right lines but for the reason I mentioned, the Math Formula (multi column) node cannot give the required result because it doesn’t do the required grouping. That was a simple oversight which we all have when theorising. I found the issue by trying it out and that’s why I adapted it with the GroupBy node instead because I could see that the idea was essentially good.
I think you need to decide for yourself based on which solution (if any) gives the expected/required result.