Join by date range

I have
table_1 (Date) - contiguous list of dates.
table_2 (Start_Date, End_Date, …) - records with date ranges.
In SQL I can do
select * from table_1
join table_2
on table_1.Date between table_2.Start_Date and table_2.End_Date

How to do this in KNIME?

1 Like

The first solution that came to my mind:
The Cross Joiner node and then the Rule-based Row Filter node with this expression:
$Date$ >= $Start_Date$ AND $Date$ <= $End_Date$ => TRUE

If cross joining is not a good approach in your case then let me know and I will try to come with a better idea.

:blush:

Thank you, Armin. This is what I thought to do but 1mln record cross join 180 is not sounds as good idea. In general this is one of the KNIME restriction.

Hi @izaychik63


I don’t know if my solution will be the answer to your problem (otherwise it may help someone else to find a better/faster solution).
I tested it with 1 mio date-ranges an 180 individual dates and it took about 15 minutes on my machine.

Join by date range.knwf (18.4 KB)

Hope this helps.

gr
Hans

4 Likes

Thank you, Hans. This is not what I assumed. The outcome is date field with all the date ranges corresponding to it. Say if I have range 1/1/19 - 1/5/19 then it expects to be 5 records
1/1/19, 1/1/19, 1/5/19
1/2/19, 1/1/19, 1/5/19

1/5/19, 1/1/19, 1/5/19
Anyway, it is interesting solution.

I think the solution which @HansS has provided is the solution to what you asked.

Whichever table has 180 records goes to the loop start node.

Please provide sample data files. For example an Excel file with 3 sheets. The first two as the sample of the tables and the third one as your desired output.
Then everything will be much easier.

:blush:

1 Like

Armin, I ran the WF from Hans and got 180 lines. My expectation to have more than 1mln rows.

Hi there,

A mix solution between @armingrudd and @HansS would be first filter based on range and then cos-join?

Anyway you can use streaming to speed things up.

Also there is no node from some extension like SQL Join node where you can write SQL and join tables?

Br,
Ivan

2 Likes

Hans, I took a second look on your WF. It looks very fast and effective.

4 Likes

Hi all,

I would definitely recommend using Streaming in this case and not the loop. It will be tremendously faster. Actually the loop does nothing else. It also checkes the full cross join but does have the overhead of the loop and the flow variables.
We made the Cross Joiner streamable based on a very similar request just a year ago.

I just tried it, this made me really curious. I actually tested it with our nightly which will come with very nice performance improvements.
And I got the following times
Streaming: 105 Seconds
Looping: 260 Seconds

@HansS in your loop I noticed you are using the constant value column node to add the flow variable. You can get the same using the Variable to Table Column node.

Hope this helps :slight_smile:
Cheers, Iris

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.