Feature suggestion: "grouped" option for the Row Filter node

Dear Knimers,

Suppose I have a table with 100 million rows. It contains a column “Coleoptera” (beetles). The first few rows have the string “Coccinellidae” (ladybirds) in this column. I don’t know exactly how many rows, but I know from the way the table was constructed/sorted that this beetle family does not occur anywhere else in the table.

I can now use the Row Filter with the pattern “Coccinellidae” to extract the ladybird rows. Hmmmm… it works, but it is very slow! The Row Filter node goes through all 100 million rows, even though I know that all the ladybirds are at the beginning of the table.

To make this more efficient, I suggest a “Grouped” option for the Row Filter. When checked, the Row Filter would stop as soon as a non-matching row follows a matching row, saving KNIME the time and computational effort of going through all the remaining rows.
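
To illustrate the behaviour I have in mind, here is a minimal Python sketch. It is not KNIME code and not how the Row Filter is implemented; the function name `grouped_filter`, the `stats` counter, and the sample family “Carabidae” are made up for the example. It assumes the matching rows form one contiguous block.

```python
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def grouped_filter(rows: Iterable[T], matches: Callable[[T], bool]) -> Iterator[T]:
    """Yield the first contiguous run of matching rows, then stop reading.

    Hypothetical helper: assumes all matching rows are adjacent, as in the
    sorted table described above.
    """
    in_group = False
    for row in rows:
        if matches(row):
            in_group = True
            yield row
        elif in_group:
            # A non-matching row follows a matching row: the group has ended,
            # so the remaining rows are never read.
            return
        # Otherwise the group has not started yet; keep scanning.


# Made-up sample data: 3 ladybird rows followed by 1,000 other rows.
stats: Dict[str, int] = {"rows_read": 0}


def table() -> Iterator[str]:
    families: List[str] = ["Coccinellidae"] * 3 + ["Carabidae"] * 1_000
    for family in families:
        stats["rows_read"] += 1  # count how many rows the filter pulls
        yield family


ladybirds = list(grouped_filter(table(), lambda f: f == "Coccinellidae"))
print(len(ladybirds), stats["rows_read"])  # prints "3 4"
```

In this sketch only four rows are read: the three matches plus the first non-matching row that signals the end of the group. If no row matches at all, the whole table is still scanned.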

Best
Aswin

Similar to the “Input is already sorted by group column(s) [execution fails if not correctly sorted]” checkbox in the Group Loop Start? I like it.

@Aswin, this is a great idea. This would definitely be more efficient.

@Thyme, I actually never noticed this option in the Group Loop Start. Thanks for bringing this up!! And I agree, something like this would cover what @Aswin is suggesting.

Thanks both of you.