Joiner node in Loop

Hi all!

I’m trying to use Joiner node in a loop. I start loop with a Table Row To Variable Loop Start which takes table name as input. That table is read from one database and then compared to the same table in another database. In order to do this I’m using Joiner node for Right excluding join. In order for this to work properly I would have to make Joiner node to take all columns from both tables (they have same names) as joining columns.

Ideas?

Br,
Ivan

You could probably use Python and Pandas somehow.
Another (probably computationally inefficient) solution would be to Unpivot both tables (select all columns as Value Columns). Aftewards you can join on the remaining columns (which are always RowId, ColumnNames, ColumnValues) with a Full Outer Join. In case you find any missing values in the resulting table, you know that the tables are not equal.

Not sure it will work and a bit complicated but tnx.

Maybe extract column dimension and via flow variable send it to Joiner node together with column names for each iteration…

In Joiner Node under Flow Variables tab under LeftTableJoinPredicate there is array-size parameter. Does anybody knows what is that parameter and how to use it?
Ivan

1 Like

I built a workflow that does that also to play around with SQLite and to see what it could do.
I have two databases that both have two tables named “table_01” and “table_02”. They have the same structure Col1 … but two cells are different than the others. Task is now to isolate these cells and leave the ones out that are identical

I extract the structure from the table and build an reference string for a JOIN operation

This string can now be used to bring the two tables together after the ‘right’ table has been transferred temporarily to the first/left database.

Since SQLIte does not support RIGHT OUTER JOINS I used an inner JOIN and excluded the IDs that would be identical in all columns so the remaining ones are the ones we were looking for.

CREATE TABLE $${Stable_not_identical}$$ AS
_SELECT * FROM _
_temp_table t3
_WHERE t3.new_id NOT IN _
_( SELECT DISTINCT _
_t2.new_id _
_FROM _temp_table t2 _
INNER JOIN $${Sname}$$ t1
ON $${Scombined string}$$
)

The table with the non-identical rows from the right table is then stored in the first database (and might be transferred to a final location).

Maybe it is a bit of an overkill and your problem might have been solved in the meantime but it gave me the opportunity to demonstrate a few features of KNIME with Databases. One might switch out the SQLite with a bigger DB or might manipulate the construction of the SQL join code or something else.

Here are the left over non identical lines. If there is already an ID there one might drop the artificial one.

36

kn_example_right_joiner_loop_sqlite.knwf (151.5 KB)

Hi mlauber,

nice job and tnx but it is a bit of an overkill for me and not sure if it would work. For the time being I will not be joining in a loop rather I will have a separate flow.

I think the Joiner node should have this possibility.

Br,
Ivan

1 Like

I don’t know if that might help. But i just red an article in the KNIME Blog about KNIME’s test suite:

https://www.knime.com/blog/enter-the-era-of-automated-workflow-testing-and-validation

There is a node that compares two tables to see if they are equal or not. Would that be useful for you?

2 Likes

Hi!

Not really but other things learned from that blog sure helps :smiley:

Tnx!
Ivan

I have the same question to use a joiner in a loop to join several sheets of excel.
Each sheet has different columns but they all share a key column.
I didn’t find any solution to join them all in a loop without reading them one by one.
Can anyone help with that?

1 Like

Hello there,

to answer my own topic from a while ago. This is now possible with array flow variables which were introduced with KNIME version 4.1. Who is interested check out example on KNIME Hub. Doesn’t feature a loop but you’ll get a point :wink:

Br,
Ivan

3 Likes

Thanks for sharing Ivan. Highly appreciated! BR

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.