Joiner within a specific time frame

Hello everyone,
I have two datasets that I’m going to join and do some data modification on it. I want to join them based on two factors;

  1. order id ( super easy)
  2. time interval ( this is where I’m struggling)
    I basically want the time stamp of results coming from table two to be after the time stamp of table 1. We have multiple entries with the same order ID within each data set that will get linked incorrectly if I don’t do this.
    Once I do this, I want to write a rule engine node to check and see if the transaction happened within two hours (is the time stamp of table 2 within two hours of the time stamp of table 1?). Is that even possible?
    FYI, I have a date/time column on each table as a string.

@amirmbhd welcome to the KNIME forum.

You might want to check out this thread and the sample workflow with ranges and the H2 database

Maybe you could provide us with a sample workflow that would demonstrate your whole pronblem so we might search for a genuine solution.

Concerning your date and time variables are in a string format you could check out this blood with examples how to convert them and work with them

Another thing you might try would be to convert your date/time variables into numbers and just see if they are in-between them. If your task is complicated you might have to revert to loops for each individual ID - but maybe a join setting with a BETWEEN statement might be enough.

3 Likes

Hey, @mlauber71 Thank you so much for your reply. It looks like the easiest way would be to convert my date/time variables into numbers and see if they are in between them ( H2 looks so complicated to me as a new KNIME user lol). I don’t see any node to convert date/time to number. I tried to convert string to number, but nothing comes out of it. String to date/time works, but I don’t know how to convert it to number? I would greatly appreciate it if you could please help me with that.

Hi @amirmbhd,

The simplest form of comparable “number” to work with when you have a date/time is to convert it to a string in the form
yyyyMMddHHmmss

This could then be converted to a number or simply left as a string. When converted this way they can be sorted and compared. If you already have date/time columns stored as Strings, it is possible that they are already in a format that can be compared “numerically” without any further action being needed.

Part of your problem involves calculating a 2 hour window with respect to an existing date/time so it sounds like for that you will actually want it as a date/time value (rather than a string or number) anyway and then using that as the basis for determining the 2 hour gaps.

If you have a simple workflow with some sample data (or if you are struggling on the workflow, just the sample data will do), I’m sure somebody here can assist with further pointers and suggestions.

Nothing in what you have written so far sounds unachievable with KNIME, but if you can upload example data it will be easier to fully understand what you need to do. :wink:

4 Likes

And also… How much data do you have (I mean how many rows?) as that could have an influence on a solution.

2 Likes

@takbb and @mlauber71 Thank you both so much for your replies. I tried converting it to number and then use the math formula node to define ranges, but it did not work!! Please find below two sample dummy tables that resemble my dataset. ( I expect at least 200 rows in each table)

Table 1
Row id Order_number transaction time Quantity
1A 397002841 04/15/2021 07:29:33 10
2A 397002841 04/15/2021 14:35:28 100
3A 397002841 04/15/2021 23:43:27 80
4A 396675251 04/15/2021 07:14:28 500
5A 396675251 04/15/2021 13:03:00 20
6A 397547485 04/15/2021 15:33:20 20
Table 2
row id ORDER_ID PERFORMED_DATE Quantity
1B 397002841 04-15-21 9:45 10
2B 397002841 04-15-21 14:38 100
3B 397002841 04-15-21 23:58 70
4B 396675251 04-15-21 8:37 500
5B 396675251 04-15-21 13:27 20
6B 397547485 4/15/21 15:38 20

As you can see, the only factor linking the two tables is order number/ID, but there are several transactions with the same order number/ID. I want to only join the rows that the time frame makes sense to it- my first point of confusion. for example, I want to only link 2A and 2B, but using the inner join on order number/ID would link 2A to 1A,2B, 3B.
Once I do this, I want to flag any transaction that is meeting one of these factors;

  1. quantity is not equal ( It’s simple with math formula node and rule engine node)
  2. performed time ( table two) should be within two hours of transaction time ( table one) - This is my second point of confusion

so my ideal result would look like this;
1A linked to 1B → flagged for time (more than 2 hours)
2A linked to 2B → not flagged
3A linked to 3B → flagged for quantity
4A linked to 4B → not flagged
5A linked to 5B → not flagged
6A linked to 6B → not flagged

I apologize in advance for my long reply. I would greatly appreciate any help!

2 Likes

Hi @amirmbhd,
based on your example following steps could do it

  1. Create a Date column for both tables
  2. Join table1 to table2 with keys order_id, date and quantity with left outer join setting. (Solves problem 1, because no match)
  3. Use date&time difference node to calculate the difference between transaction and performed timestamp.

BR

3 Likes

Hi @amirmbhd

That just what I was working on. The solution in text form @morpheus is the way to go.
Here is a workflow Joiner within a specific time frame.knwf (64.0 KB)
Btw I first thought it was a more complex question. That is why I introduced the cross joiner in the wf. With the Group Loop you can check all possible combinations within one order.

gr. Hans

2 Likes

Hi @amirmbhd, some alternatives that don’t use the left outer joining, but instead make some assumptions about the sequencing of your data, as there appeared to be no guarantee that we could match on date, and needed to have some idea of how data in one row compared with data in the other (other than just the order number)

So there were a couple of question marks I had over whether your data rows were always in the exact same sequence, so I did a variety of flows to demonstrate different approaches. The very first flow I did was assuming you actually wanted to filter out the rows that weren’t matching within the time window, and demonstrates what you might do in that situation, but the subsequent flows give back the data that you want, and provide ideas on what to do if the ordering of rows isn’t quite the same, in increasing order of complexity.

I also noted that your date/times weren’t always consistent. Not sure if that was just typos, but the flows handle that too

Joiner within a specific time frame.knwf (85.7 KB)

The one at the end purposely shuffles the input tables , then ranks them in ascending date order by order number before then comparing based on order number and rank. The other ideas are pretty much as @morpheus has proposed. I use a Date&Time shift instead of a date&time difference, but the principle of what it’s doing is the same.

(as you can see from the interest this has sparked, there are many varieties of solution from different people, and hopefully each gives some useful insights and tips that can assist you., Oh and don’t apologize for writing a long post… We’d always rather have clear requirements and a good explanation, as it helps us find a solution! :slight_smile: )

2 Likes

Hi @amirmbhd,
based on your example, you can run into identifying problem.
How do you differentiate records having eqaual quantity within same order_id? Which performed_date would then relate to which transaction time?

Does the row id in table a always represents the row_id in table b (the row value is the same and differs only by the extention e.g. row 1 vs, row extention A vs. B ) or does your records also contains a unique order position for each order_id which you doesn’t take into account?
Having a unique identification make things much easier!

BR

1 Like

Hey @morpheus @HansS @takbb Thank you all so much for your help! It’s working like a charm rn! I ended up calculating the duration and filtered it by positive values to get rid of any transaction that happened before the correlating order id! Then I used the rule engine node to flag anything outside the two-hour window that I was looking for!
My workflow still needs some modifications that I might end up posting another question about it, but I am so happy with the current outcome! I really appreciate your help and support to this KNIME newbie, lol. Thank you for making my day. It means a lot.

2 Likes

@morpheus btw that’s luckily not a problem in this workflow. All the quantities should be the same on both tables, and if they are not, it means someone is stealing or doing something wrong! so that was an easy fix! Thanks for your consideration.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.