Row filter (Labs) for columns present in a collection

Hi everyone,

I’m trying to find a way to filter out rows if any of a given selection of columns contains a specific value.
The Row Filter (Labs) node is very close, but I don’t see a way to choose the columns to filter by programmatically. I thought about working with column expressions, but I’m not sure whether I can drop rows that way, or how to handle variable column names if I were to use a loop inside the expression. Alternatively, I might loop over the rows and use a math expression, but again I can’t see how to make that work for a collection of column names.

I’m certain this is a common problem and I just haven’t found the right node combination yet.

Hi @Ellison, the Rule-based Row Filter is probably your best bet for this type of use case.

Edit: I’ve just re-read what you said… I may have misunderstood what you are asking. Do you mean that you actually have a dynamically created “collection” of column names and you want to somehow include them all in the filter, e.g. in a String List?


Yes, the collection flow variable has the columns to be checked against fixed values.

Edit: I guess I could loop over the collection and use the Rule-based Row Filter once per column, but that doesn’t seem appealing performance-wise.
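If the loop is the concern, one alternative might be to generate a single rule covering every column in the collection and feed it to one Rule-based Row Filter. This is a sketch under assumptions: `build_exclusion_rule` is a hypothetical helper, the comparison value is numeric, column names contain no `$` characters, and the node is set to exclude TRUE matches. Whether the rule text can be supplied through a flow variable, and how the node parses negative literals, should be checked in your KNIME version.

```python
def build_exclusion_rule(columns, value):
    """Build one KNIME-style rule that is TRUE when any listed column equals value.

    Hypothetical helper. Assumes column names contain no '$' and that value is
    numeric; string values would need quoting inside the rule.
    """
    if not columns:
        raise ValueError("at least one column name is required")
    # One equality test per column, e.g. "$a$ = -9999999"
    conditions = [f"${col}$ = {value}" for col in columns]
    # OR the tests together so a single match in any column triggers the rule
    return " OR ".join(conditions) + " => TRUE"


if __name__ == "__main__":
    print(build_exclusion_rule(["a", "b"], -9999999))
```

For `["a", "b"]` and `-9999999` this yields `$a$ = -9999999 OR $b$ = -9999999 => TRUE`, so the table is scanned once instead of once per column.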

I don’t have an obvious simple solution using the standard nodes, but I do have a component that can assist…

Here is a possible way forward:

The idea I had was to use Create Collection Column to build a column containing the values from your specified list of columns.

It then looks to see if any of a list of values (contained in a comma-delimited string) is present in the collection.
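As a rough analogue of that idea outside KNIME, the pandas sketch below gathers the selected cells of each row into a list (standing in for the collection column) and keeps the row only if none of the comma-separated values appears in it. It is not the component itself: `rows_without_values` is a hypothetical name, and cells are compared as strings, so a float cell `2.0` would not match the value `2`.

```python
import pandas as pd


def rows_without_values(df, columns, values_csv):
    """Keep rows whose selected cells contain none of the comma-separated values.

    Cells and values are compared as strings, a simplifying assumption.
    """
    # Parse the comma-delimited value string, ignoring blanks and stray spaces
    targets = {v.strip() for v in values_csv.split(",") if v.strip()}
    # One list per row, analogous to a collection column built from `columns`
    collection = df[columns].astype(str).values.tolist()
    # A row survives only if its collection shares no element with the targets
    keep = [targets.isdisjoint(cells) for cells in collection]
    return df[keep]


if __name__ == "__main__":
    data = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"], "c": [0, 0, 0]})
    print(rows_without_values(data, ["a", "b"], "2, z"))
```

Here only the first row survives: the second contains `2` in column `a` and the third contains `z` in column `b`, while column `c` is not checked at all.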

The main component that does that part of the work is this one:


Not sure if I fully understood your approach, but this is my interpretation of it using a Python Script node:

```python
import knime.scripting.io as knio

# Columns to check, passed in as a collection flow variable
columns = list(knio.flow_variables["required_columns"])
df = knio.input_tables[0].to_pandas()

if columns:
    # True for rows where any selected column holds the sentinel value
    has_sentinel = df[columns].isin([-9999999]).any(axis=1)
    df = df[~has_sentinel]

# Always assign an output table, even if no columns were selected
knio.output_tables[0] = knio.Table.from_pandas(df)
```


That’s pretty much it, I think. Maybe you should wrap it up as a “python row filter” :wink:

I’m giving it a try. What is the easiest way to allow input of a list? String configuration + splitting?
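String configuration plus splitting should work. A minimal parsing sketch, assuming a comma separator and column names that contain no commas (`parse_list_setting` is a hypothetical helper name):

```python
def parse_list_setting(text, sep=","):
    """Split a string configuration such as "col a, col b" into a clean list.

    Surrounding whitespace is stripped and empty entries (e.g. from a
    trailing comma) are dropped.
    """
    return [item.strip() for item in text.split(sep) if item.strip()]


if __name__ == "__main__":
    print(parse_list_setting("Col A, Col B,, Col C ,"))
```

Stripping whitespace means a column name with deliberate leading or trailing spaces would no longer match; if names can contain commas, a different separator (passed via `sep`) would be safer.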