Distance between timestamps in same column but different ID

Hi all,

I am facing a challenge currently. I tried the forum below but I need to make one extra advanced step.

Scandata has:

  1. one column with date&time with minutes and seconds
  2. each row identifies as a scan of an object.
  3. the object therefore is maybe >100+ scanned per day/week/etc.
  4. actual data set is 1.000.000 rows of scans (is group loop than an option?)
  5. If no new timestamp for object ID then current date&time (duration difference)

Example data:
Time distance same column different ID.knwf (72.1 KB)

→ My goal is to identify the time difference from the same object No. which need to be added as a seperate column and connected with the rest of the columns.

(Not possible when I have more timestamps in between)

Thank you!!

I’m not entirely clear about what you want. Take a look at this. It flags a scan when there was no previous one. The only tricky thing was your timestamps weren’t all the same. One was missing seconds so the String to DateTime node had to be adjusted accordingly. Also the timestamps aren’t sorted. I used UNIX timestamps to calculate the difference in seconds between scans.
Time distance same column different ID.knwf (118.0 KB)



Output

1 Like

Hi @rfeigel,

Thanks for noting the timestamp difference!

I think this is indeed the way forward, but with 20.000 unique ID’s it takes quite some time to be processed via the Group loop, do I need to split the data in multiple Group loops? & I want to change the ‘No previous scan’ with the difference between timestamp VS current day&time.

Could it be simplified in column expressions without the group loop start?

Thought:

  1. Sort ID + Sort date&time
  2. Generate Time difference between previous ROW & ID
  3. IF ID = same as ID (previous row) => time difference between row
    AND IF ID = not same as ID (previous or next) row => “No previous scan”
  4. Extra question: Can we define “no previous scan” as the time difference as of today. That would give me the estimation that it is not changed for a long time.

I think this will help for this enormous dataset.

Hi @rfeigel,
I made 4 tables
2x lag (previous) columns ID & Time
2x Lead (next) columns ID & Time

if(column(“object”) && column(“ID_Lead”)) {
0
}
else if (column(“object”) == column(“ID_Lag”)) {
column(“Time_Diff_Lag”)
}
else if (column(“object”) != column(“ID_Lead”)) {
column(“Time_Diff_Lead”)
} else {
column(“Current_Time”)
}

It is mentioning that the expression is incorrect because of the 0 as integer and can’t calculate the duration

I’m sorry but I have no clue about what you’re doing. If you want to share the entire workflow, I’ll take a look. In the meantime, try this. I think it does what you want but still uses a loop. I’ll give some thought about eliminating the loop although I don’t think its straightforward.
Time distance same column different ID rev 1.knwf (153.1 KB)
Here’s a workflow without the loop.
Time distance same column different ID rev 2.knwf (141.1 KB)

2 Likes

Did my last “loopless” workflow meet your needs?

1 Like

Hi @rfeigel ,

I made a combo with my own touch, start loop wasn’t an option with such a large dataset… Please see the end result in workflow attached.

In the coming days I will validate it bit further.
Time distance same column.knwf (119.7 KB)

I made two adjustments

  1. ‘Duration from time now’ in date&time difference node was not set correctly. Needed to be base column (date) and previous column (current date).
  2. Therefore no negative numbers in time difference results, and deleted the unused nodes at the end.