Custom Date Binning

Hi All!

I have 2 tables: one contains an array of dates, the other contains events with timestamps. My goal is to create a new column next to the timestamps showing which of the dates in the array the timestamp is closest to.

Table 1 (Custom Date Bins)


Table 2 (Source Data)

10/2/18 12:00
10/3/18 01:10
10/8/18 14:00

~Desired Table~

+Time_Stamp+ +Date_Bin+
10/2/18 12:10 10/1/18
10/3/18 01:10 10/1/18
10/8/18 14:00 10/7/18

The contents in the reference table can change, so I want to find the date each timestamp is closest to to help with custom grouping later down the line. The table with the timestamps can be huge. Thoughts? Thanks again for your time!

I’m not sure this will run fast enough for you, given that your source data table is huge, but here’s one solution (and it might give you ideas for a more optimized solution):
Use a loop to process one row of the source table at a time (use a Chunk Loop Start node with chunk size = 1).
In the loop,
Cross join to the date bins table.
Calculate the (absolute) difference between the timestamp and date bin for each row in the cross joined table.
Sort the table on the absolute diff you just calculated.
Use the Row Filter node to keep just the first row in the sorted table (i.e. the one with the lowest difference).
End the loop with a Loop End node.

The workflow looks something like this:

I noticed the Cross Joiner took the most time to execute, so I tried to optimize by setting the Chunk size to be equal to the number of date bins.
On my laptop it took about 30 sec. to process 10K rows.

Hope this helps,
custom date binning.knwf (25.8 KB)

1 Like

Hi Don,

Thanks for the reply! I like your solution better than mine, but haven’t had a chance to battle test it yet. My short-term solution is pretty horrible, but involves using a loop to create rows for every date, not just the chunk starts. While it works, it is extremely bloated. I’ll work at implementing your solution shortly!