Sewage Spill Counter


I have a high level project idea, that I suspect is possible to do in Knime, however, I am not sure how it could be accomplished.

For reporting purposes we are required to count the number of sewage spills that are recorded through flow monitors recording sewage flows at various sites. Counting needs to be done using two separate methods. One is a straight count of spills the other is using a 12/24 method.

1. Straight count of spills
The workflow would need to go down the rows for each SITE column and count each block of data when the value goes over 100 (e.g. the first time it goes over 100 and then drops back to 100 or below would count as 1. The second time it did this would count as 2 and so on until the last row of the column). Missing data values would need to be included in the count block as we could not prove the spill level dropped back to 100 or below.

Once the spills for one column have been counted, it would loop to the next column and repeat the process.

2. 12/24 count method
Spill counts also need to be reported using the following method:

  1. Start counting when the first spill occurs (> 100 as above).
  2. Any spill (or spills) in the first 12-hour block are counted as one spill (so could go above and below 100 several times within the 12 hours but would only be counted as 1).
  3. Any spill (or spills) in the next, and subsequent 24-hour blocks, are each counted as one additional spill per block.
  4. Continue counting until there’s a 24-hour block with no discharge.

For the next discharge after the 24-hour block with no discharge, you begin again with the 12-hour and 24-hour block spill counting sequence.

Once the spills for one column have been counted, it would loop to the next column and repeat the process.

Output file
The output file would need to be a data table in the following format.

Site1 x a
Site2 y b
Site3 z c

Example Spill Data.xlsx (1.0 MB)
Would this be possible? Looking for some possible solution examples or a high level proposal of how it could be done.

As an added extra, it would also be good if the spill duration could be calculated using the date & time column (i.e. sum the time duration while the level is above 100 for each spill at each site). :slight_smile:

Thanks in advance.

potential useful nodes i can think of is Groupby, Rule Engine, Extract Datetime, Lag, Moving Aggregation, Column Expressions. Maybe they can help you begin developping your flow


hi @fostc80857
this is my approach to ‘spill counting’ using knime nodes based on your “straight count of spills methodology”. i’m not sure whether the output is correct, so could you please verify it with your previous report.
KNIME_spill count-MM-2.knwf (1.5 MB)
note: adding rules for last row if spills occurs

on 2nd part, i’m a bit confused about the 12/24 method, so I hope others can provide their optimal solution to that.

happy knimining.


Hi @marzukim

Many thanks for this workflow.

Apologies for the slightly delayed response but I have been carrying out various tests on it. I initially increased the number of sites to ten and all was ok. So I then tested it on all of our 1,761 monitored sites and I can confirm that the workflow has returned the correct count of spills for ALL of the sites, so thank you very much for taking the time to produce this solution for me, it is very much appreciated.

I agree the 12/24 count method is complex, so hopefully someone else might be able to have a go at resolving it but in the meantime I will continue to try to resolve myself :slight_smile: To try to simplify the way the counting works under this method:

  1. Start counting when the first spill happens.
  2. In the first 12 hours, consider all spills as one spill.
  3. In the following 24-hour periods, each spill counts as one additional spill.
  4. Keep counting until there’s a 24-hour period without any spills.
  5. When the next spill occurs after a 24-hour block without a spill, start the counting sequence again

Hopefully this is a more simple explanation of the 12/24 counting method.


hi @fostc80857
i’m glad to hear that you were able to use it and thank you for verifying the results.
for the second option, the 12/24 method, i found this graphical representation online to give me an understanding of the method more,

could you please verify whether my attached workflow result is correct?
KNIME_spills-count-MM-3.knwf (166.4 KB)
using a lot of iteration and it’s a bit slow on my old computer. the 12/24 workflow results are a combination of the 12 and 24 blocks counting.

sorry, If this is not what you were expecting.


Thanks @marzukim

I will test this over the next few days and verify whether your workflow gives the expected results.


Hi @marzukim

Unfortunately KNIME_spills-count-MM-3.knwf (166.4 KB) is not matching up with the results of an Excel based macro calculation. Example results are as follows:

SITE1: KNIME = 40 (Excel = 41)
SITE4: KNIME = 24 (Excel = 24)
07D10DELVESCSO07 SewerLevel: KNIME = 55 (Excel = 58)

I know you used the chart / graph in the chat above but I think the second triangle from the left is incorrect and should not be a spill count. The level dropped below the threshold and did not go back above it until the third triangle from the left.

Not sure if that impacts your workflow or not?


hi fostc80857
possible that i’m employing an incorrect approach or method for the calculation here.
the division of the rows block in my recursive calculation may be causing an inaccurate counting of the spill.

i’ve attached the revised workflow with dummy data for your review, which includes minor amendments.
KNIME_1224-MM.knwf (117.8 KB)
Spill Data3.xlsx (27.9 KB)

regarding the site results output, are the discrepancies occurring across all sites or just specific sites?
hopefully, other members will contribute their diiff ideas and approach for the 12/24 spill calculation strategy.

linux knime 5.1.x

Hi @marzukim

Thanks for the updated workflow. I will re-test and compare to the Excel macro outputs.

In terms of the discrepancies, of the 4/5 that I tested some matched and some did not - I could not see anything specific about the sites that did not match.

I will run the workflow on a larger dataset to see if any themes can be identified.


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.