Adjusting H times to HH times

I have a data set where I am given the times in order but in 12 hour format how do I convert these 24 hour format when normal string to date&time node reads them as 24hr

55127 31/07/2020 11:32
55126 01/08/2020 02:12
55125 01/08/2020 02:48
55124 01/08/2020 06:24
55123 01/08/2020 06:45
55122 01/08/2020 08:46
55121 01/08/2020 08:53
55120 01/08/2020 08:56
55119 01/08/2020 09:22
55118 01/08/2020 11:02
55117 01/08/2020 11:06
55116 01/08/2020 11:49
55115 01/08/2020 12:49
55114 01/08/2020 12:50
55113 01/08/2020 12:57
55112 01/08/2020 01:10
55111 01/08/2020 01:15
55110 01/08/2020 01:25
55109 01/08/2020 01:36

Hi @iankruger, so just to confirm, your times do not have am/pm on them and you can only determine if it’s pm, by it being the same date but an earlier time to a prior row? Is my understanding correct?

If that is the case, can we assume that there will always be at least one time given that is in the morning for a particular day?

Also, what does the number in the first column represent, as I notice that is descending, but your dates/times are in ascending order? Is that correct?

2 Likes

Hi @takbb yes you are correct, also the first column is the record number with the latest record being the most recent event

Hi @iankruger, interesting challenge! :wink:

My thoughts on this were that we needed to find the transitions points between am and pm, and mark those as requiring either 0 or 12 hours to be added to the retrieved datetime.

In between those transition points, all the 0 values then need to be filled-down, and likewise all the 12 values.

After that, we then shift each row by the required number of hours.

That was the plan! So, first off, we just treat everything as “morning” data. Using a Column Aggregator (or we could have used String Manipulation for example) to combine the Date and Time columns into a Date and Time string as "01/08/2020 02:12". A String to Date&Time using a date format mask of d/M/yyyy k:mm. Note the use of “k” here rather than “h”. Not obvious from the documentation the difference between them, but “k” works here, and “h” doesn’t… I think h requires additional am/pm information but it might be something else- I clearly need to go read up!

We use a Lag Column to note the derived “morning” Date&Time for the previous row, then for each row use Date&Time Difference (in minutes) to see if time has gone backwards from the previous record. Where time goes backwards… this is the “pm” boundary. So use a Rule engine to mark that boundary as requiring as 12 hour forward timeshift.

Next, use another Lag Column to see the DATE of the previous record. Where the Date changes, this is the “am” boundary. So use a Rule engine to mark that boundary as requiring a 0 hour forward timeshift.

(I use the term “am” boundary rather loosely here. The am/pm boundary is actually at 12:00 noon, but for the purpose of this, we can treat the boundary as at 1pm, because that is the point at which we need to adjust the time from 01:00 to 13:00)

Anyway, at this point we now have data something like the following:


This has correctly identified the timeshift required for Row1 and Row15, but Row0 is problematic as none of these rules successfully identify that 11:32 is in the evening instead of the morning which is the default position.

Unfortunately I haven’t time right now to do the workflow for determining the hour of the first record.

Ignoring the first record, a Missing Values node can then “fill down” the missing timeshifts for the rows in between the boundaries…

and these timeshifts can then be applied.

As I say, what I have at the moment isn’t quite complete. I thought though, that I’d give you what I’ve got so far, and either I’ll come back later, or somebody else can chip in (or suggest a totally different way).

My thoughts for determining if first row is am or pm is that you filter off first row, and all the other rows separately as a “side flow”. You look at the date of the first row, and filter the “other” rows to include only rows with the same date. You then find the minimum time for those other rows with the same date, and if there is a time earlier than the time on the first row, then the first row is “am”. If there isn’t (or if there aren’t any other rows with the same date as the first row) then the first row is “pm”…

Anyway, this is my story so far…
12 hour to 24 hour times.knwf (30.7 KB)

As you can see, this currently determines the correct time of day for all but the first row.

image

I’d also need to do further tests and maybe some refinement to see if it gets this correct under a variety of circumstances, but hopefully this is a starting point!

5 Likes

This is so straightforward and fast. I did not in the end need the first day and filtered those out. Many thanks

2 Likes

@iankruger, That’s good. Glad it works for you. As long as the first record you are interested in is in the morning, I think it should be OK as it stands. If for some reason you find it doesn’t quite work for you then post back with details and I’ll look into it further.

1 Like

one slight complication which became apparent once applying your solution. The time does not set to 00:00 on start of new day but carries on to 12:59…

54551 06/08/2020 10:14
54550 07/08/2020 12:01
54549 07/08/2020 12:27
54548 07/08/2020 01:14
54547 07/08/2020 02:58

Hi @iankruger , wow that was a bit of an oversight on my part and given me something of a headache!

You do realise that there will come a time when you’ll wish you could fix the data at source so it doesn’t contain information loss don’t you?! :wink:

Anyway, so we need to infer some additional meaning in the data. Attached contains quite a bit of additional complexity.

So far, the rules were this:
if the datetime shifts backwards, then it means we have gone from morning to afternoon and it added 12 to the “01:00” etc to make it “13;00” and so on, but that was clearly wrong as I didn’t allow for it shifting from 12 midnight to 1am, which should become “00:00” and “01:00”

So it needs to first determine if the the 12 is a midnight or a noon 12.

To do this, it looks at all the other rows for the same date and sees if there is an earlier hour (i.e. 01 to 11) for the same date on an earlier row. If there is, then it infers that this is 12 noon, since that doesn’t happen for a midnight 12. For the records with a 12 hour that are NOT noon, we need to shift the hour by -12. Next, if the time appears to go backwards (e.g. a shift from 12 to 1, it then has to look to see if the 12 was a noon 12, in which case we add 12 hours to to 01:00. If it wasn’t we leave the 01:00 etc as a timeshift of 0. We then fill down missing values as before.

I think this works. Its not bullet proof as if your data is missing large chunks of hours it can still infer incorrectly.

e.g. in my sample test data I now have:

image

There is no way to infer the meaning of the 10:14 on 06/08/2020. For that matter neither can I be sure that the times on the 07/08/2020 rows are morning rather than afternoon, but it is assumed that the hours be relatively contiguous.

12 hour to 24 hour times V2.knwf (54.6 KB)

This is the results from my test data:
image

I hope this works, and helps, but my strong recommendation here would be to fix your data at source unless that is absolutely out of the question. Unless this is just a one off this is storing up trouble for the future. Inferring meaning from missing data like this is never going to guarantee to be correct in all circumstances.

4 Likes

I did a spot check and it is working great many thanks. I was working on an approach using a Recursive loop which will be much slower no doubt. I will post it if I can get it to work. But now I can get on with the analysis in time for the deadline yesterday. This is what you get with a vendor that builds a data source without consulting the analyst… they will have to rebuild that.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.