Multiple Column Reference Row Filter

Hi all,

In my opinion reference row filter with multiple column selection would be useful.

What do you think?

Roland

1 Like

I’ve never used the Reference Row Filter, but it seems logical to have that feature.

A sequence of several Reference Row Filters provide the same functionality, so is it worth the effort? According to NodePit, 5% of the predecessors and 7% of the successors are RRFs, ranking 4th both thymes. Seems like a yes to me.

Things to consider:

  • Reference columns will have different length (row number), so we need a dynamic number of inputs or, prettier, ignore missing values. But then we can’t use missing values as reference value …
  • Bonus: Check reference value independently or row-wise.

Reference Row Filter — NodePit

1 Like

I use successive Reference Row Filters sometimes. The annoying thing with this construction is that every successive Reference Row Filter needs to iterate over the table again, which is time consuming. It also makes the workflow look messy. Sometimes I use a Joiner to get a multi-RRF effect without chaining. Or, I make a concatenated string out of the columns that I want to match. None of these alternatives are ideal.

@Thyme what you have in mind is, I think, to use multiple tables as a reference, while I was thinking of multiple columns in a single reference table, and I have the feeling that that is also what @rolandnemeth meant…?

Best
Aswin

@Aswin I was actually thinking about both ways. I’ll elaborate:

For every condition, we need an input vector of values to include/exclude. In general, those vectors will not be of equal length. The easy way would be to use one extra input table for each condition, select a column, done. The WF won’t be tidy with 7 incoming data connections.

Alternatively, we could use one input table and then select different columns for each condition. But since the columns will generally not have the same number of elements, but the same row count, there will be lots of missing values. We can ignore those missing values, but what if we want to use a missing value as filter criterion? So we probably need to take care of those missing values via a Missing Value Node, with “Last Row” to work around this.

The “Bonus”:
Since we have multiple conditions, we have the possibility to link them in different ways. One way would be to filter the table like sequential Reference Row Filters would: For each condition, the cell value has to be found in the reference column. If all conditions are met, the row goes through.
A more stricter filtering would be this (for each row separately):

  • Value of Column1 from input table was found in row11 of the reference column.
    → value of other filter columns have to match the value of their respective reference column in row11.

A checkbox could switch between those behaviours.