Distance between timestamps in same column but different ID

quintin1009 · December 23, 2024, 6:34pm

Hi all,

I am facing a challenge currently. I tried the forum below but I need to make one extra advanced step.

Scandata has:

one column with date&time with minutes and seconds
each row identifies as a scan of an object.
the object therefore is maybe >100+ scanned per day/week/etc.
actual data set is 1.000.000 rows of scans (is group loop than an option?)
If no new timestamp for object ID then current date&time (duration difference)

Example data:
Time distance same column different ID.knwf (72.1 KB)

→ My goal is to identify the time difference from the same object No. which need to be added as a seperate column and connected with the rest of the columns.

(Not possible when I have more timestamps in between)

Thank you!!

rfeigel · December 24, 2024, 2:25am

I’m not entirely clear about what you want. Take a look at this. It flags a scan when there was no previous one. The only tricky thing was your timestamps weren’t all the same. One was missing seconds so the String to DateTime node had to be adjusted accordingly. Also the timestamps aren’t sorted. I used UNIX timestamps to calculate the difference in seconds between scans.
Time distance same column different ID.knwf (118.0 KB)

Output

quintin1009 · December 24, 2024, 7:39am

Hi @rfeigel,

Thanks for noting the timestamp difference!

I think this is indeed the way forward, but with 20.000 unique ID’s it takes quite some time to be processed via the Group loop, do I need to split the data in multiple Group loops? & I want to change the ‘No previous scan’ with the difference between timestamp VS current day&time.

quintin1009 · December 24, 2024, 10:06am

Could it be simplified in column expressions without the group loop start?

Thought:

Sort ID + Sort date&time
Generate Time difference between previous ROW & ID
IF ID = same as ID (previous row) => time difference between row
AND IF ID = not same as ID (previous or next) row => “No previous scan”
Extra question: Can we define “no previous scan” as the time difference as of today. That would give me the estimation that it is not changed for a long time.

I think this will help for this enormous dataset.

quintin1009 · December 24, 2024, 2:26pm

Hi @rfeigel,
I made 4 tables
2x lag (previous) columns ID & Time
2x Lead (next) columns ID & Time

if(column(“object”) && column(“ID_Lead”)) {
0
}
else if (column(“object”) == column(“ID_Lag”)) {
column(“Time_Diff_Lag”)
}
else if (column(“object”) != column(“ID_Lead”)) {
column(“Time_Diff_Lead”)
} else {
column(“Current_Time”)
}

It is mentioning that the expression is incorrect because of the 0 as integer and can’t calculate the duration

rfeigel · December 25, 2024, 12:23am

I’m sorry but I have no clue about what you’re doing. If you want to share the entire workflow, I’ll take a look. In the meantime, try this. I think it does what you want but still uses a loop. I’ll give some thought about eliminating the loop although I don’t think its straightforward.
Time distance same column different ID rev 1.knwf (153.1 KB)
Here’s a workflow without the loop.
Time distance same column different ID rev 2.knwf (141.1 KB)

rfeigel · December 27, 2024, 4:57am

Did my last “loopless” workflow meet your needs?

quintin1009 · December 27, 2024, 9:02am

Hi @rfeigel ,

I made a combo with my own touch, start loop wasn’t an option with such a large dataset… Please see the end result in workflow attached.

In the coming days I will validate it bit further.
Time distance same column.knwf (119.7 KB)

quintin1009 · December 27, 2024, 9:45am

I made two adjustments

‘Duration from time now’ in date&time difference node was not set correctly. Needed to be base column (date) and previous column (current date).
Therefore no negative numbers in time difference results, and deleted the unused nodes at the end.