Loop for Filtering and Combining Two Tables with Different Values

carmelavivant · August 11, 2023, 3:55am

hi guys. Need your help and expertise please. So I’m creating a workflow that would compare two different tables with different number of rows.

Table 1:
1000
1000
1000
1500
1500

Table 2:
1000
1000
1500

The goal is to match the common items and remove them from the list, then generate a table for the unmatched ones.
I thought of creating a loop to filter per value (1,000 then 1,500) for both tables but I got an error as soon as I got to the column appender.

Without the loop, this is what I created and it’s working correctly. When both table have been appended, I created a rule engine to match the values. But with multiple duplicate values, I’m stuck.
Would greatly appreciate your help. Thanks in advance.

carmelavivant · August 11, 2023, 5:19am

Hi @marzukim thanks for taking the time. I tried this but it would not work.
I’m looking for something that would filter out each value per table.

Example,
I want to pull data with 1,000 on table 1
Simultaneously, I will also pull all 1,000 on table 2,
Then i will join both and do this on a loop for every other value.

Thanks.

marzukim · August 11, 2023, 6:42am

i apologize again for misunderstanding the real context. hope other members/experts will be able to assist you soon.

rgds

takbb · August 11, 2023, 8:23am

Hi @carmelavivant ,

Using your description, I hope that the current solution could be something you can work with. I’ve assumed that the column data is Integer, so you’d have to adjust this if they are actually strings

When comparing two lists using loops, typically you loop through one list, and compare with the other, so there is only a single loop, so this differs a little to your description but I believe is a simpler way to achieve the result.

I have used your sample data, but added some additional rows for different scenarios.

What I have done here is to use a Group Loop which takes each of the values in turn. The second table is filtered on each iteration to only inspect those values. By using a grouping, a new ROWID is used to create a key on each of the sub-tables (the partial tables for this iteration) from each table. The join is then on ROWID which means that different numbers of rows in each table will match/not match and this can be picked up using the three outputs on the Joiner.

Those outputs are then passed to the different ports on the loop end node, so they are collected separately.

Finally, the one set of rows not covered by the loop are those which appear only in table2. These are detected by a separate joiner, and concatenated with one of the outputs from the loop end.

I have then sent to results to Excel, but these could be sent to whatever further processing nodes you require.
Row by row comparison.knwf (41.3 KB)

takbb · August 11, 2023, 8:33am

@marzukim , well done for suggesting a solution.

You will find here that many different attempts at a solution can get posted and some work partially, or miss some subtlety of the problem. Other times we discover there are various ways of solving the problem depending on personal preference, or even just by thinking about it in different ways.

Many times posting a “wrong” solution sparks other ideas on attacking the problem from a different angle, or further refines our understanding, so there really is no need to apologize for having a go, or removing a solution that didn’t quite work out. I see you already have two confirmed solutions so keep going!

marzukim · August 11, 2023, 9:19am

thank you for your kind feedback, @takbb

i agree with you, that sometime posting a wrong solution can be helpful in sparking other ideas and sometimes also leading to the outside-the-box solutions. however, i believe that my current solution is biased due to my limited understanding of the real problem and my imperfect english understanding. i appreciate for your feedback, as i’m always looking to improve my understanding and my ability to provide helpful solutions to others.

marzukim · August 11, 2023, 2:55pm

I know this is not the solution to the problems that have been raised, but I thought it might be interested of exploring in the unmatched filtered values from two tables.

rgds

carmelavivant · August 18, 2023, 10:24am

hi @takbb Brian,
Loop Problem.knwf (86.4 KB)
Thanks so much for this. I’ve tried it but not quite getting the results yet.

For reference, I’m looping two different sets of data based on their amount variables- match the amounts on two tables then concatenate, so I can compare.

I’m attaching here the flow for your reference.
Appreciate you taking the time to help. Thanks!

takbb · August 18, 2023, 3:21pm

Hi @carmelavivant , in my original workflow, it was assuming that there was only one column for the matching process, and would need a little adjustment to make it work.

The workflow that you have (with a nested loop) isn’t going to work, or at least I cannot see what the logic would be to make it work. It would, incidentally, require an additional loop end because it currently has to loop starts and only one loop end, although that won’t directly solve your problem. To me this is a problem that would require only a single-loop.

However, I took a step back and have realised that you don’t actually need any loops!

In essence, your problem is to compare two tables and identify:
rows that are in both
rows that are in Table1
rows that are in Table2.

That basically describes the function of the Joiner node, but there is one small complication which makes the Joiner node unsuitable by itself; this being that if a row is duplicated in one of the tables, we only want one to join, and the other must be considered an “extra” row that is only present in one table.

This was why I added a loop in my previous example, so that it could compare groups of rows for each key and only match these on a row-by-row basis, marking any “additional” rows in one table as “extra”.

But this doesn’t require a loop to achieve, we simply have to add an additional sequence to the “key” , grouped within the key columns. This is the job for the “Rank” node.

The Rank node can be told to group by the key columns, and rank rows with a sequence number within key groupings. The node does require an additional column other than those which defines the groups, so for the purposes of your sample data, I have had to add an additional “dummy” column.

I’m still using KNIME 4.7.x, so my nodes will appear slightly different to your KNIME 5.1 workflow, but this takes each of your sample Table Creators and determines the differences:

The Rank nodes are configured like this:

The joiner then joins using your key columns plus “rank”

and it is up to you whether you want to split the different outputs across the 3 joiner output ports, or just use the top port for further processing.

The rank and dummy columns can be excluded from output as they have done their job

forum 71643 - Loop Problem - alternative.knwf (46.8 KB)

I hope that helps

carmelavivant · August 22, 2023, 2:50am

Oh wow thanks so much @takbb Brian!
This worked excellently! Appreciate your help!

takbb · August 23, 2023, 1:48pm

You’re welcome @carmelavivant , and thank you for marking my answer as the solution.

system · August 30, 2023, 1:49pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.