Consecutive rows of a minimum size

badger101 · February 13, 2025, 8:30pm

Greetings all, help me with this please. I need to keep only the rows where Column 2 says “marked” at least 4 times in a row. Each group needs to be processed separately, so rows from different groups are not counted together.

Column 1	Column 2	Column 3
Group 0	marked	1.0
Group 1	marked	2.0
Group 1	marked	3.0
Group 1	non-marked	4.0
Group 1	marked	5.0
Group 1	non-marked	6.0
Group 1	marked	7.0
Group 1	non-marked	8.0
Group 1	non-marked	9.0
Group 2	non-marked	10.0
Group 2	marked	11.0
Group 2	non-marked	12.0
Group 2	marked	13.0
Group 2	non-marked	14.0
Group 2	non-marked	15.0
Group 2	non-marked	16.0
Group 2	marked	17.0
Group 2	non-marked	18.0
Group 2	marked	19.0
Group 2	non-marked	20.0
Group 2	marked	21.0
Group 2	marked	22.0
Group 2	marked	23.0
Group 2	marked	24.0
Group 2	non-marked	25.0

I am expecting my output to be like this:

Column 1	Column 3
Group 2	21.0
Group 2	22.0
Group 2	23.0
Group 2	24.0

I am hoping to get a solution that does not involve looping, but I welcome all attempts regardless.

Thank you

HansS · February 13, 2025, 9:23pm

Hi @badger101

Here is a possible solution, … but with a loop (sorry ).
consecutive_rows_of_minimum_size.knwf (104.3 KB)

gr. Hans

badger101 · February 13, 2025, 9:38pm

Thanks for attempting @HansS , at the moment your workflow treats everything as one group.

I could get away with that issue by appending a nested loop,

I’ll wait at least 24 hours to see if anyone can find me a non-looping alternative.

Thanks though!

HansS · February 14, 2025, 7:16am

Hi @badger101

Here is a possible approach with only 1 loop .
consecutive_rows_of_minimum_size_no_loop.knwf (128.2 KB)

Instead of a Window Loop it uses the Lag Column node to create some dummy variables to help identifying a sequence of 4 rows “marked”. The relevant rows (id-s) are stored in a list. After unlisting the id-s, the Reference Row Filter fiilters the rows with a sequence of 4x “marked”.

gr. Hans

Add94 · February 14, 2025, 3:29pm

It is fairly straightforward to implement in SQL with no loops required ( e.g. using SQLite node + in memory connection). This will involve using the lead function for all 3 columns, then checking if the values have 4 consecutive marks + if they belong to the same group. Then you can return the index of the next 4 (or more) rows meeting those conditions. Happy to send that over later tomorrow when I have my laptop with me if you want a truly no code solution with no loops - it may work with the lag nodes but all 3 columns would need to be lagged/ led and cross checked.

badger101 · February 14, 2025, 6:58pm

Thanks @HansS I have tested it, and it works.

Thanks @Add94 , I won’t be able to verify a SQL-based solution, but feel free to share one if you have it available later, might be useful for those who understand SQL.

Add94 · February 15, 2025, 4:37pm

@badger101 - No worries! I managed to implement the SQL-based solution using KNIME-native nodes too. Please see the workflow containing both (SQL-based is a lot cleaner in my view!)

Consecutive rows of a minimum size.knwf (142.6 KB)

badger101 · February 15, 2025, 5:23pm

Thanks @Add94 , I have subjected your no-SQL route to the same tests as I did to @HansS solution (which passed it). Although yours worked for the dataset provided, it currently does not work when

a group contains two subgroups of consecutive rows that fit the rule. For example, if Group X has this sequence: marked-marked-marked-marked-nonmarked-nonmarked-marked-marked-marked-marked, then the workflow should be able to produce two results for that group.
the dataset has more than one group that contains consecutive rows that fit the criteria. For example, if you alter the dataset provided such that the first 4 rows of Group 1 to be marked-marked-marked-marked and leave the rest of the dataset intact, then the workflow should produce two results, one for Group 1 and another for Group 2.

I really appreciate the effort though @Add94 – thanks so much!

I have spent some time yesterday to work on my approach on this which passes the tests, just like @HansS solution passed them; the only difference is that I have managed to avoid loopings altogether.

If anyone reading this from the future & is looking for alternatives, here it is

And here’s the workflow
Selecting consecutive row of certain size without loop by badger101.knwf (93.7 KB)

Any improvement suggestion is always welcomed!

Add94 · February 15, 2025, 5:28pm

Ah thanks a lot for letting me know! I suspected there may be additional cavaets / special considerations that otherwise would be hard to infer based on the sample data alone! Great to know you finally found the “ultimate” solution after iterating over the community provided workflows!

system · February 22, 2025, 5:28pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.