One with port call records that show when a vessel arrives and departs a port.
Another with activity records that have timestamps for events that occur while the vessel is at the port.
My goal:
For each port call (the time interval between an arrival and the following departure), I want to create a combined timeline.
This timeline should show the arrival record first, then all the activity records that occurred during that interval (based on their timestamp), and finally the departure record.
In other words:
I want to join the two spreadsheets so that every port visit has its start and end time along with all the activities that happened in between.
At this point, I’m stuck on the final part of my workflow. I’ve tried several approaches—some of which are still visible in my workflow—but I still haven’t achieved the desired result.
I attempted to lag three columns: Port Name, Date and Time, and Arrival. However, I discovered that the Arrival value was being taken from the previous trip rather than the current trip. As a result, the row would display an Arrival time from a different port than the corresponding Departure.
I realized that I should lag the Departure column instead of the Arrival, so that the Arrival appears first on the same row as its matching Departure. Unfortunately, I couldn’t figure out how to do that because the Rule-based Row Filter proved too complicated to use.
After using the Cross Joiner, I ended up with 200,000 rows. Although this was acceptable (since I planned to use the Rule-based Row Filter afterwards), I couldn’t get that filter to work properly, and I didn’t get the results I was looking for.
Thank you for your interest in helping! The two bottom sections are essentially the same, which is why I joined them. If you’d like to assist, the key part to focus on is from the Sorter (Activity Spreadsheet) node and the column filter Arrival and Departure node.
I’ve made some improvements to the workflow and managed to get the Lag Column working correctly. However, I’m now running into an issue with the Cross Joiner—it produces too much data, multiplying every row several times, even after applying the Rule-based Row Filter.
Right now I am trying to use a Rule Engine or Rule-base Row filter node and put it an expression similar as this one. $All Fast Local$ >= $Arrival Time$ AND $All Fast Local$ <= $Departure Time$ => TRUE
TRUE => FALSE
This is my goal with this expression:
Checks if an activity (All Fast Local) falls inside a port visit time frame (Arrival Time to Departure Time).
@Adrien_2 , in addition to the “joiner” components mentioned by @mlauber71 , I also have a couple of components that allow you to simply treat the KNIME table directly as an H2 table (with some limitations regarding data types).
An example of joining using these is in this workflow. It allows you to simply access the data directly as H2 database tables with support for all the regular db nodes.
If you do try that and find they don’t work for your data, let me know and I’ll see if anything needs small modifications.