Flagging events in a time series based on a logical condition

Hi everyone,

I have the following task at hand.

I have a dataset consisting of the following columns: obs_index, datetime, date, time, value.

What I want to achieve: I need to create an ‘event flag’ (see the GOAL column in the screenshots below) based on value and duration conditions, using only KNIME nodes (no R or Python snippets) if possible.

The logic to implement is as follows: start flagging the event [if value < 70 continuously for >= 120 mins], stop flagging [if value >= 70 continuously for >= 15 mins], otherwise keep flagging.
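
For illustration only (the goal here is a KNIME-only solution), the rule can be sketched in Python as a small state machine over runs of consecutive low and high readings. This is a sketch under assumptions the post does not spell out: readings arrive at a fixed interval (`step_min`, hypothetical), each row counts as one full interval, and the flag is switched on or off from the first row of the qualifying run. The function name and parameters are made up for this example.

```python
from itertools import groupby


def flag_events(values, step_min=5, threshold=70, start_min=120, stop_min=15):
    """Return one boolean flag per reading.

    Assumptions (not stated in the post): fixed sampling interval of
    `step_min` minutes, and flags change at the first row of a qualifying run.
    """
    flags = []
    flagging = False
    # Split the series into runs of consecutive low (< threshold) or high readings.
    for is_low, run in groupby(values, key=lambda v: v < threshold):
        n = len(list(run))
        duration = n * step_min
        if is_low and duration >= start_min:
            flagging = True    # long enough low run: event starts here
        elif not is_low and duration >= stop_min:
            flagging = False   # long enough recovery: event ends here
        # Runs that are too short keep whatever state was active before them.
        flags.extend([flagging] * n)
    return flags


# Hypothetical readings: 3 high, 24 low, 2 high, 2 low, 3 high, 5 low.
example = [80] * 3 + [60] * 24 + [75] * 2 + [65] * 2 + [90] * 3 + [60] * 5
flags = flag_events(example)
```

In this example the short 10-minute recovery inside the event does not end the flag, and the final 25-minute dip does not start a new one.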

I have prepared two scenarios for your information. Please see the below screenshots.

Scenario 1:

Scenario 2:

Tried method: I tried to solve the problem using the Lag Column, Rule Engine, and Moving Aggregation nodes (window = 3, mean(value), backward window type). I then flagged the “end of event” row(s) wherever the value rose to >= 70 AND, at the same time, the mean(value) of the last three observations was >= 70. However, this didn’t work for all of the possible >= 120 cases. The workflow that was unsuccessful for me:
[image: screenshot of the workflow]
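
One possible contributing factor, offered as a guess since the workflow screenshot is not available: a backward-window mean of >= 70 is a weaker condition than all three readings being >= 70. The short Python sketch below uses made-up readings to show the difference; the helper names are hypothetical.

```python
THRESHOLD = 70


def mean_at_least(window, threshold=THRESHOLD):
    """True if the average of the window reaches the threshold."""
    return sum(window) / len(window) >= threshold


def all_at_least(window, threshold=THRESHOLD):
    """True only if every reading in the window reaches the threshold."""
    return all(v >= threshold for v in window)


# Hypothetical last three readings: one low value masked by a high one.
window = [40, 100, 72]
mean_check = mean_at_least(window)  # average is about 70.7
all_check = all_at_least(window)    # 40 is below the threshold
```

Here the mean-based check passes while the stricter per-row check fails, so a mean condition can end an event earlier than "continuously >= 70 for 15 minutes" would.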

Let me know if you have any questions or need any clarification.
Your help is greatly appreciated! :slight_smile:

Thank you!
Odko

Hi @Odko -

Assuming the data is not confidential, can you provide actual data samples instead of screenshots? You are much more likely to get quick assistance that way :slight_smile:

(Otherwise, thanks for the detailed post including both your desired outcome and what you’ve tried so far. Not everyone includes all of that!)

Thank you for your message @ScottF

Here is the dataset.
dataset.csv (102.2 KB)

Cheers,
Odko

Hi @Odko

Please see below for a possible solution.

I first removed the duplicates in the timestamp column so that the index works correctly (the index column did not contain unique values, and there were a few duplicated observations with no wait time in between).

Then I noticed that each row corresponds to a measurement interval of roughly 5 minutes, with two exceptions: there were two measurement breaks of a few hours, at index 433 and 1779.
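
As a rough illustration of these two cleaning checks outside KNIME, here is a Python sketch that drops repeated timestamps and lists gaps larger than the expected interval. Keeping the first occurrence of a duplicate and allowing a one-minute tolerance are assumptions made for this example, as are the function and parameter names.

```python
from datetime import datetime, timedelta


def dedupe_and_find_gaps(rows, expected=timedelta(minutes=5),
                         tolerance=timedelta(minutes=1)):
    """rows: list of (timestamp, value) tuples sorted by timestamp.

    Returns the rows with duplicate timestamps removed (first one kept,
    which is an assumption) and a list of (row_position, gap) pairs where
    the step from the previous row exceeds expected + tolerance.
    """
    seen = set()
    unique = []
    for ts, value in rows:
        if ts not in seen:
            seen.add(ts)
            unique.append((ts, value))
    gaps = []
    for i in range(1, len(unique)):
        delta = unique[i][0] - unique[i - 1][0]
        if delta > expected + tolerance:
            gaps.append((i, delta))
    return unique, gaps


# Hypothetical sample: one duplicated timestamp and one three-hour break.
t0 = datetime(2021, 1, 1, 8, 0)
sample = [
    (t0, 80),
    (t0, 80),
    (t0 + timedelta(minutes=5), 78),
    (t0 + timedelta(minutes=10), 75),
    (t0 + timedelta(minutes=190), 90),
]
unique_rows, gaps = dedupe_and_find_gaps(sample)
```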

Once the dataset was clean and the duration corresponding to one row was rounded to 5 minutes, I could set up the event flags.
According to your requirements, for each row we’d need to consider ±23 rows (corresponding to the 120-minute interval) and ±2 rows (corresponding to the 15-minute interval).
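
The row counts follow from the 5-minute spacing: 120 / 5 = 24 rows and 15 / 5 = 3 rows, each including the current row, which leaves 23 and 2 neighbouring rows. A tiny Python sketch of that arithmetic, with hypothetical helper names:

```python
import math


def rows_for(duration_min, step_min=5):
    """Number of rows needed to cover a duration, counting the current row."""
    return math.ceil(duration_min / step_min)


def max_offset(duration_min, step_min=5):
    """Largest lag or lead offset needed, since the current row is offset 0."""
    return rows_for(duration_min, step_min) - 1


long_window = (rows_for(120), max_offset(120))
short_window = (rows_for(15), max_offset(15))
```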

(The formulas in the Rule Engine nodes are quite long. Creating them could likely be automated using column list loops, string manipulation and flow variables, but I wanted to keep the workflow as simple as possible.)

That way, I could compute the correct “bounds” for the observations, where 1s are seen in rows where the next 24 rows (inclusive of the current row) observe glucose levels below 70. 1s will also be seen in rows where the last 24 rows (incl. current row) had glucose levels below 70.

That way, we have 1s where the flagged event begins and 1s where the last “bad glucose level” was observed.
I then also set the criteria for the 15-minute intervals: we get 0s where the current row and the last 2 rows were >= 70, or where the current row and the next 2 rows were >= 70.

This results in the event rows being “bound” between 1s, where the 120-minute event starts, and 0s, where the flagging stops. You’ll have missing values in between.
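
The marking step can be sketched in Python as follows. This is only an approximation of what the lag/lead columns and Rule Engine formulas compute, written under the assumption of evenly spaced rows; the function names and defaults are invented for the example, and windows that run past either end of the data are treated as not matching.

```python
def all_in_window(values, start, size, predicate):
    """True if rows start..start+size-1 all exist and satisfy the predicate."""
    if start < 0 or start + size > len(values):
        return False
    return all(predicate(v) for v in values[start:start + size])


def mark_bounds(values, threshold=70, low_rows=24, high_rows=3):
    """Return 1, 0 or None (missing) per row, mimicking the bound markers."""
    def low(v):
        return v < threshold

    def high(v):
        return v >= threshold

    marks = []
    for i in range(len(values)):
        if (all_in_window(values, i, low_rows, low)
                or all_in_window(values, i - low_rows + 1, low_rows, low)):
            marks.append(1)     # start or end of a long low stretch
        elif (all_in_window(values, i, high_rows, high)
                or all_in_window(values, i - high_rows + 1, high_rows, high)):
            marks.append(0)     # start or end of a long enough recovery
        else:
            marks.append(None)  # undecided, to be filled in later
    return marks


# Hypothetical series: 3 high rows, 24 low rows, 3 high rows.
marks = mark_bounds([80] * 3 + [60] * 24 + [75] * 3)
```

For this series only the first and last row of each stretch are marked; everything in between stays missing.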


Finally, we can now simply fill in the missing values using the “previous row” approach.
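
For readers unfamiliar with the “previous row” fill, a minimal Python sketch of the idea is shown below; `None` stands in for a missing value, and the function name is hypothetical.

```python
def fill_previous(marks):
    """Replace each missing value (None) with the last non-missing value.

    Leading missing values stay missing because nothing precedes them.
    """
    filled = []
    last = None
    for m in marks:
        if m is not None:
            last = m
        filled.append(last)
    return filled


# Hypothetical marker column with gaps between the 1s and 0s.
filled = fill_previous([None, 0, None, 1, None, None, 0, None])
```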

I’m attaching my workflow so you can examine it in detail. Before you re-execute it, though, make sure the timestamp column in the CSV Reader is formatted as local date time (I had issues re-executing when the workflow was imported on a different machine).


glucose measurement.knwf (73.5 KB)

Hope this helps!
Any flaws in the logic laid out above please let me know!
Adam

Hi @Add94

This is really amazing. Thank you so much for your time and thorough work!
Your solution absolutely solved my problem! :+1: :white_check_mark:

I’ve been using KNIME for quite a long period of time for data processing and I must admit that your help was instrumental in rediscovering some functionalities of the nodes, e.g. ‘repeat previous value’ in missing value node, and the existence of node “Lead” :slight_smile:

Thank you heaps!
Odko

Nice solution @add94! Also good to see a critical component from @RiznyLafi being put to good use. Kudos all around! :+1:
