Lag Column - based on two or more conditions

Barajas · February 25, 2021, 2:12am

KNIME people,

I have a dataset (image above) where I need to be able to determine if there is an overlapping of events based on multiple conditions. What I currently do to obtain my results is manually selecting all items of only one “company”, then selecting all occurrences of one same “short description” and after applying those two filters, sorting by “sys_created_on” from oldest to newest. With this first part, I have everything set up to start marking those lines that overlap (see the example marked in green).

Correct exercise: (marked in green) since the company is always “Public Storage” AND Short description is always “Host::PSA026647_PM_0015::Down” the second occurrence started at 02:21 while the first occurrence ended at 03:08, there is overlapping. This is what I need to mark automatically.

Incorrect exercise: (marked in yellow) since the second occurrence happened at 00:39 while the previous occurrence was at 04:36 but “company” are different and “short description” are different from each other as well.

Summary: I need to be able to mark overlapping of events taking into consideration 3 or more variables (“company” & “short description” and “created on”. I am attaching my current Knime Example.knwf (45.4 KB) where I am able to mark overlapping using Lag Column node, but so far I can only make it work based on just “created on” and not taking into consideration just the same company and the same short description. (Incorrect exercise marked in yellow)

One way I have it working is manually applying row filter nodes to filter out just one company and one short description at a time, but those companies and short description will increase over time, and then it doesn’t seem to be a practical way to automate this.

Thank you in advance for any guidance.

mlauber71 · February 25, 2021, 6:40am

I would have to have a closer look at your example but this could be a job for an SQL window function. You could explore this thread:

The KNIME implementation of H2 database does not support that yet (@Iris - window functions would really improved the local H2 implementation. So maybe there is a chance for an update).

ipazin · February 25, 2021, 1:16pm

Hello @Barajas,

isn’t this a job for Group Loop Start node?

Br,
Ivan

Barajas · February 26, 2021, 6:09pm

Hi @ipazin and thank you for your suggestion. I am looking into that node (I have never used it before) and I see that I can awesomely divide the information into sections for the company, which is the first part of my manual process, but not sure how to implement the second segregation (short description column), any specific ideas are truly appreciated, meanwhile I will continue my research on that node.

ipazin · February 26, 2021, 9:26pm

Hello @Barajas,

seems like same problem discussed here?

Br,
Ivan

ipazin · March 1, 2021, 10:32am

Duplicate