I have two tables table_a and table_b the first table has a time span data in stored as two columns start_date and end_date the second table has time point data. I want to join those table in such a way that only those rows match where the time point lies within the time span.
My current way if a two step. (1) do a cross join, than (2) apply a row filter.
This is super inefficient. I have something like 300’000 rows in one table and 4*365 (4 years each having 365 days) in the other one. The resulting table should be something like 500’000 rows but with this approach the table first becomes 438 million rows long before getting filtered down again.
I tried looping over the time pionts but this still takes considerable amount of time. Using a database - which would implement this quite efficiently - unfortunatly is not an option.
Any suggestions, I suppose this is a very common pattern and most of the time will result in the same problem?
Well, not a particularly sophisticated idea but you could try and use a local H2 database to join two datasets by time ranges. Depending on the power of your machine and other factors that might work. I tried it with files of different sizes and at least it gives you a result.
@petermeissner well that is the beauty of it, it is a database but works just as a standalone file, even in memory. You have your own database integrated into KNIME without the need for further installations or tools. I would recommend checking it out.