(Cross) Join and then Rule-Based Filter

Hello!

I wonder if KNIME can provide a more effective way to do SQL-style join → filter. The joining with non-unique field can create a huge cartesian product table. I wish KNIME can keep this table virtual and do the rule-based filter and joining together to avoid saving a ridiculously huge table. Thanks!

Hi @henryhmo , you can actually do SQL-style join, if you load your table in an H2 virtual db for example, in Knime that is. Knime also support SQLite virtual db. They’re loaded in memory (you lose them if you reset the node) and you can then query the tables via SQL joins.

1 Like

Hi @henryhmo , here’s a quick demo to illustrate what I wrote.

Scenario: We have a loopup table of Cities and we want to retrieve the cities whose name starts with a given list from a table Requests.

Solutions:

  1. Typically, using Knime nodes, since it is not an exact match, we cannot do a join. Instead, we would:
    i. Do a cross join between the 2 tables;
    ii. Apply a rule-based filter;

  2. Alternatively, we can load the tables in a virtual DB (in memory), for example H2, and do SQL-style JOIN

This is how it would look like using H2 virtually:
image

My Cities table:
image

Requests table:
image

When using the H2 Connector, just make sure you use In-memory if you want a virtual db:
image

After writing these 2 tables to H2, I can then query them by doing an SQL join like this:

SELECT *
FROM "PUBLIC"."requests" r
JOIN "PUBLIC"."cities" c ON c."Cities" LIKE CONCAT(r."starts_with", '%')

And I get this result:
image

Did not need to do cross joiner, which indeed can give you a huge Cartesian product table, depending on the size of the tables.

Here’s the demo workflow:

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.