However, the counting rule is not just a straight count of the number of rows. I am required to count the number of spills based on the 12/24 counting method, the rules of which are described below:
Start counting when the first discharge occurs.
Any discharge (or discharges) in the first 12-hour block are counted as one spill.
Any discharge (or discharges) in the next, and subsequent 24-hour blocks, are each counted as one additional spill per block.
Continue counting until there’s a 24-hour block with no discharge.
For the next discharge after the 24-hour block with no discharge, you begin again with the 12-hour and 24-hour block spill counting sequence.
For example, the following discharges would give a count of 3 (as the first 2 discharges are within 12 hours of each other and the next two are within the next 24 hour block. There is a gap of more than 24 hours between the 16/10/24 and 06/12/24 discharges so the spill count would start again for the 06/12/24 discharge)
Hi @fostc80857 , to make it easy for people to help you, make sure the table format is similar between the one you posted here and the one you uploaded. Here, it’s assumed that the time difference is calculated between rows, whilst in the workflow, you gave two columns instead, possibly suggesting the difference to be calculated between columns.
Secondly, maybe you might want to state how precise you want the time difference to be, since the data has minutes, whilst your rule is by hourly count. Let’s say your starting point is January 20th, 2000 at 9:01AM, and ending point is January 20th, 2000 at 21:00 PM, which is just 1 minute short to 12 hours, would that be considered a spill?
Lastly, I assume you’re using the Date&Time Difference Node, perhaps give your potential helpers an insight as to the timezone format that you use.
The basis of the query is the same in that I am trying to count spills based on the 12/24 method. However, there is a difference in that the sample data in the previous post detailed each data point (i.e. each 15 minute monitor reading), whereas for this post, I have already identified the start and end dates of each of the discharges. So I now only have to look at distinct start and end times rather than having to go through many more records.
Re your first point - when I posted the table of 5 records, I was trying to give readers an example of how the 12/24 principal would be applied to a section of the uploaded data. So, apologies if it has caused any confusion. On further thought, as I am only after a count of discharges and not a duration, the Discharge End column may not be needed? There would just need to be a calculation of the time difference between the start dates (i.e between rows) to see if the next discharges starts within the 12 hour or subsequent 24 hour window.
This is where I am struggling because I do not know how the looping would work to compare the start dates to see if they are within the 21 or 24 time windows and then reset the count after a full 24 hours without a discharge. I included the Discharge End dates just incase a suggested solution required them.
Re your second point - the time differences would need to be calculated to the second.
Expanding on your example of January 20th, 2000 say a spill occurred at 9:01:43AM the count would start (1 spill). The 12 hour window would then end at 21:01:43PM and if another spill occurred in the next 24 hour window after that it would be counted as a new spill (total 2 spills). Each spill in subsequent 24 hour windows would count as another spill until there was a complete 24 hour window without a spill. At this point the 12/24 cycle for counting would start again. NOTE: it is only the 124/24 cycle process that would start again, the spill count does not go back to zero.
Re your third point - The time is formatted to dd/MM/yyyy HH:mm:ss
I got all 17 discharge counts, each count is labelled by the means in which they are detected. Does this table make sense to you?
If it does not, kindly explain why so I can make modifications.
But if it does, kindly share a more extended table (or a number of different tables) so we can test the logic on a larger amount of samples to oversee potential issues, before I can share the solution.
Thanks @fostc80857 , while I look at this, can you please attach a second column to annotate which row should be counted? (preferably do it for both test data)
I have re-uploaded the two files and have added a flag column. Please bear in mind that I am trying to re-calculate someone else’s work (as I am an internal auditor) so I have added the flags where I think they should be.
As per your updated tables, even though our tables matched perfectly for the first dataset, I could not replicate your second dataset flag column using the same logic that I did previously. This means that my solution won’t work for you.
I might be able to solve it if I have more time to dig deeper. But it will eat quite some time off me, and I’m currently preoccupied with my own projects. Sorry could not help you further, but I’m sure you’ll find help from others here.
Update:
So… following what you stated in all of the above, here’s how I summarize things up according to my understanding.
Only column to refer to is the starting timestamp. The ending column is ignored.
Calculations need to be done in the granularity of seconds, meaning 12 hours = 43,200 sec while 24 hours = 86,400.
All rows are divided into 24 hour blocks. Within each block, the first 12 hours are counted as one. Every row after the 12 hour threshold but within the same block is counted as one each.
If this is true, the table count should look like this:
Very different to your flag column, but it holds true to your rules. Unless I’m misunderstanding this, this should be the correct representation.
I think you might want to recheck your counting system, and how your rules should be visually represented as a table. Without clarity in visualizing your rules in a tabular form, I’m afraid it’s gonna be a wild goose chase.
Based on the example, I think my initial position of only using the start date and time is incorrect. As you can see from the example diagram if a single spill event straddles the 12 hour block (or even a 24 hour block for that matter), then it should be counted as 2 spills, therefore I think the end date and time needs to be considered as well.
To answer your points above:
Only column to refer to is the starting timestamp. The ending column is ignored. - No, based on the example diagram the end timestamp will need to be considered.
Calculations need to be done in the granularity of seconds, meaning 12 hours = 43,200 sec while 24 hours = 86,400. Yes, this granularity will allow you to identify the 12 and 24 hour blocks.
All rows are divided into 24 hour blocks. Within each block, the first 12 hours are counted as one. Every row after the 12 hour threshold but within the same block is counted as one each. No - The 1st block after the 1st discharge is a 12 hour block, each block after that would be a 24 hour block until there is a full 24 hour block without a spill. The next spill after a full 24 hour block without a spill will start your 12 and 24 hour block sequence again. So in the diagram above, if there was another 24 hour block to the right which did not have a spill, the 12/24 hour block process would start again.
I appreciate it is a very convoluted way of counting the spills but regulation requires we count them this way. Apologies if I have caused any confusion.