How to Join Two Tables Using a Condition with a BETWEEN clause

Let’s consider we have two tables (already loaded from a DB so DB nodes are not relevent here) to join. Table A contains an integer column X. Table B contains two integer columns Y and Z. We’re supposed to join rows from table A with rows from table B so that X value from table A is between values Y and Z of table B.

An equivalent SQL select would be as follows:
select * from A, B where A.X between B.Y and B.Z;

Any idea how to achieve this?

Thanks.

Hi Jan,

from my experience, there are two solutions to this problem, depending on data volume:
The easy (but slow for large tables B) solution would be:

Inside a wrapped metanode with streaming enabled, cross join both tables, and use a rule-based row filter to filter for B.Y <= A.X AND A.X <= B.Z.

The more complex but fast solution builds on this pattern:
Unpivot table B to have Y and Z in one column, name this X. Concatenate tables A and B. Sort by X.

Then do some magic with Missing Value, Moving Aggregation, etc. (haven’t thought it through until the end for your case, but in my workshops people usually achieve their goal based on this hint).

Best regards

Arne

4 Likes

Hi @arbe,

Thanks a lot for your very valuable hint.

Well, I’ve already discovered the first solution you mentioned. It was sufficient in a situation I had to join about 1K-2K row tables either.

The second solution seems smart! I’ve already tried it and it seemed giving the results I expected. What surprised me a bit, I didn’t employee the Moving Aggregation node. I didn’t see any purpose of it. More over, looking at the unpivot-concatenate-sort sequence’s result, I hesitated which value should I set the Window length parameter to. I didn’t see any periodicity in how the rows from both tables were combined after each other.
But my crucial question is, doe’s the second solution work even if there are rows in the left joining table that join no rows from the right table? I configured the Missing Values Node to use “Next Value” option for the boundaries columns from the right table which produced rows containing a value from the left table joined with a value from the right it definitely shouldn’t have joined.

I’m attaching my attempt.
table_join_between_clause.knwf (29.7 KB)

Regards,
Jan

Hi @jan_lender,

Yes, this solution should cover your scenario. It would need to look a bit different if there is an overlap in the ranges of table B, I believe. But it can exclude those entries in A without a range match in B.

Moving Aggregation was only a guess, but you are right, it’s not needed. The missing hint here was to add a second sort criteria for the range’s lower and upper bound markers. Then you can use Missing Value’s previous row feature.

Solution attached.
table_join_between_clause2.knwf (33.4 KB)

Best regards

Arne

Maybe not the most creative idea but could you just load the two tables into a local SQL database like H2 (if it is in memory it might be fast) and just use the SQL code? And then download the results?

1 Like

Hello @jan_lender,

in case you only need to add one column Binner (Dictionary) node can be used.

Additionally there is existing ticket to handle these kind of manipulations and and I have added +1 on it. (Internal reference: AP-10692)

Or a Component :slight_smile:

Br,
Ivan

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.