Consecutive rows of a minimum size

Greetings all, help me with this please. I need to keep only the rows where Column 2 says “marked” at least 4 times in a row. Each group needs to be processed separately, so rows from different groups are not counted together.

Column 1 Column 2 Column 3
Group 0 marked 1.0
Group 1 marked 2.0
Group 1 marked 3.0
Group 1 non-marked 4.0
Group 1 marked 5.0
Group 1 non-marked 6.0
Group 1 marked 7.0
Group 1 non-marked 8.0
Group 1 non-marked 9.0
Group 2 non-marked 10.0
Group 2 marked 11.0
Group 2 non-marked 12.0
Group 2 marked 13.0
Group 2 non-marked 14.0
Group 2 non-marked 15.0
Group 2 non-marked 16.0
Group 2 marked 17.0
Group 2 non-marked 18.0
Group 2 marked 19.0
Group 2 non-marked 20.0
Group 2 marked 21.0
Group 2 marked 22.0
Group 2 marked 23.0
Group 2 marked 24.0
Group 2 non-marked 25.0

I am expecting my output to be like this:

Column 1 Column 3
Group 2 21.0
Group 2 22.0
Group 2 23.0
Group 2 24.0

I am hoping to get a solution that does not involve looping, but I welcome all attempts regardless.

Thank you :ok_hand:

Hi @badger101

Here is a possible solution, … but with a loop (sorry :slight_smile: ).
consecutive_rows_of_minimum_size.knwf (104.3 KB)

gr. Hans

2 Likes

Thanks for attempting @HansS , at the moment your workflow treats everything as one group.

I could get away with that issue by appending a nested loop,

I’ll wait at least 24 hours to see if anyone can find me a non-looping alternative.

Thanks though!

1 Like

Hi @badger101

Here is a possible approach with only 1 loop :slight_smile: .
consecutive_rows_of_minimum_size_no_loop.knwf (128.2 KB)

Instead of a Window Loop it uses the Lag Column node to create some dummy variables to help identifying a sequence of 4 rows “marked”. The relevant rows (id-s) are stored in a list. After unlisting the id-s, the Reference Row Filter fiilters the rows with a sequence of 4x “marked”.

gr. Hans

2 Likes

It is fairly straightforward to implement in SQL with no loops required ( e.g. using SQLite node + in memory connection). This will involve using the lead function for all 3 columns, then checking if the values have 4 consecutive marks + if they belong to the same group. Then you can return the index of the next 4 (or more) rows meeting those conditions. Happy to send that over later tomorrow when I have my laptop with me :slight_smile: if you want a truly no code solution with no loops - it may work with the lag nodes but all 3 columns would need to be lagged/ led and cross checked.

1 Like

Thanks @HansS I have tested it, and it works.

Thanks @Add94 , I won’t be able to verify a SQL-based solution, but feel free to share one if you have it available later, might be useful for those who understand SQL.

1 Like

@badger101 - No worries! I managed to implement the SQL-based solution using KNIME-native nodes too. Please see the workflow containing both :slight_smile: (SQL-based is a lot cleaner in my view!)

Consecutive rows of a minimum size.knwf (142.6 KB)

2 Likes

Thanks @Add94 , I have subjected your no-SQL route to the same tests as I did to @HansS solution (which passed it). Although yours worked for the dataset provided, it currently does not work when

  • a group contains two subgroups of consecutive rows that fit the rule. For example, if Group X has this sequence: marked-marked-marked-marked-nonmarked-nonmarked-marked-marked-marked-marked, then the workflow should be able to produce two results for that group.
  • the dataset has more than one group that contains consecutive rows that fit the criteria. For example, if you alter the dataset provided such that the first 4 rows of Group 1 to be marked-marked-marked-marked and leave the rest of the dataset intact, then the workflow should produce two results, one for Group 1 and another for Group 2.

I really appreciate the effort though @Add94 – thanks so much!

I have spent some time yesterday to work on my approach on this which passes the tests, just like @HansS solution passed them; the only difference is that I have managed to avoid loopings altogether.

If anyone reading this from the future & is looking for alternatives, here it is :arrow_heading_down:

And here’s the workflow :arrow_heading_down:
Selecting consecutive row of certain size without loop by badger101.knwf (93.7 KB)

Any improvement suggestion is always welcomed! :smiling_face:

2 Likes

Ah thanks a lot for letting me know! I suspected there may be additional cavaets / special considerations that otherwise would be hard to infer based on the sample data alone! Great to know you finally found the “ultimate” solution after iterating over the community provided workflows!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.