Joining tables based on Time

  • I have two spreadsheets:
    • One with port call records that show when a vessel arrives and departs a port.
    • Another with activity records that have timestamps for events that occur while the vessel is at the port.
  • My goal:
    • For each port call (the time interval between an arrival and the following departure), I want to create a combined timeline.
    • This timeline should show the arrival record first, then all the activity records that occurred during that interval (based on their timestamp), and finally the departure record.
  • In other words:
    • I want to join the two spreadsheets so that every port visit has its start and end time along with all the activities that happened in between.

At this point, I’m stuck on the final part of my workflow. I’ve tried several approaches—some of which are still visible in my workflow—but I still haven’t achieved the desired result.

  • I attempted to lag three columns: Port Name, Date and Time, and Arrival. However, I discovered that the Arrival value was being taken from the previous trip rather than the current trip. As a result, the row would display an Arrival time from a different port than the corresponding Departure.
  • I realized that I should lag the Departure column instead of the Arrival, so that the Arrival appears first on the same row as its matching Departure. Unfortunately, I couldn’t figure out how to do that because the Rule-based Row Filter proved too complicated to use.
  • After using the Cross Joiner, I ended up with 200,000 rows. Although this was acceptable (since I planned to use the Rule-based Row Filter afterwards), I couldn’t get that filter to work properly, and I didn’t get the results I was looking for.

Knime Forum 1.knwf (815.8 KB)

The workflow you uploaded has three spreadsheets, not two. Could you explain what each contains?

Thank you for your interest in helping! The two bottom sections are essentially the same, which is why I joined them. If you’d like to assist, the key part to focus on is from the Sorter (Activity Spreadsheet) node and the column filter Arrival and Departure node.

I’ve made some improvements to the workflow and managed to get the Lag Column working correctly. However, I’m now running into an issue with the Cross Joiner—it produces too much data, multiplying every row several times, even after applying the Rule-based Row Filter.

Knime Forum 1.knwf (818.6 KB)

Right now I am trying to use a Rule Engine or Rule-base Row filter node and put it an expression similar as this one. $All Fast Local$ >= $Arrival Time$ AND $All Fast Local$ <= $Departure Time$ => TRUE
TRUE => FALSE
This is my goal with this expression:

  1. Checks if an activity (All Fast Local) falls inside a port visit time frame (Arrival Time to Departure Time).
  2. If true, it keeps the row (TRUE).
  3. If false, it removes the row (FALSE)

@Adrien_2 you can try and employ H2 local database to do joins based on time variables (BETWEEN)

Also @takbb has components that allow for individual join conditions you could explore:

2 Likes

thanks @mlauber71 …

@Adrien_2 , in addition to the “joiner” components mentioned by @mlauber71 , I also have a couple of components that allow you to simply treat the KNIME table directly as an H2 table (with some limitations regarding data types).

An example of joining using these is in this workflow. It allows you to simply access the data directly as H2 database tables with support for all the regular db nodes.

If you do try that and find they don’t work for your data, let me know and I’ll see if anything needs small modifications.

3 Likes