Row Filter with Multiple Criteria Question

I would like to use multiple criteria (some with wildcards) to filter a large table (Table A) with multiple columns.  My current strategy is to use one Row Filter Node for each filter criterion.  With 10 filter criteria, this means 10 Row Filter Nodes.

 

I would prefer to store the filter criteria (column, filter value, etc.) in a file/database.  The filter criteria would then be applied to each row of Table A as it is processed.  I have tried the following implementation:

 

XLS Reader (contains filter criteria) -> TableRow to Variable Loop Start -> Row Filter (data input is from Table A, Variable Inport to define filter criteria) -> Loop End

 

Issues:

  1. Table A data is iterated over multiple times.  Is there a way to process it only once?
  2. The output concatenates all of the iterations.  For example, if the input data table has 20 rows and there are 10 filter criteria, I believe the output will have 10*20-(#rows filtered) rows.  The desired output would have 20-(#rows filtered) rows.

Any help/recommendations would be appreciated.  Thanks in advance.

 

Andrew

I think you may need the Delegating Loop Start and End nodes, so that the filtered output from the first iteration is passed back to the start for the second iteration.

That may be complex though, as you will still need your TableRow to Variable Loop Start loop as well.
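For anyone who finds it easier to reason about this in code, here is a rough Python sketch of the "pass the output back to the start" idea. It is only an illustration of the logic, not a KNIME API: the function name `filter_recursively`, the sample table, and the use of Python's `fnmatch` wildcards are all my own assumptions.

```python
from fnmatch import fnmatchcase


def filter_recursively(rows, criteria):
    """Apply each (column, pattern) exclusion to the survivors of the previous pass."""
    current = list(rows)
    for column, pattern in criteria:
        # Each iteration receives the previous iteration's output,
        # not the original table, so nothing gets concatenated.
        current = [
            row for row in current
            if not fnmatchcase(str(row[column]), pattern)
        ]
    return current


# Hypothetical stand-in for Table A.
table_a = [
    {"id": 1, "name": "test_run"},
    {"id": 2, "name": "Cluster 1"},
    {"id": 3, "name": "bla"},
    {"id": 4, "name": "Cluster 2"},
]

# Hypothetical stand-in for the criteria table (column, wildcard pattern).
criteria = [("name", "test*"), ("name", "bla")]

remaining = filter_recursively(table_a, criteria)
print([row["id"] for row in remaining])
```

The result holds at most as many rows as the input, because every criterion only ever shrinks the same table.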

 

An alternative, but inefficient, way is to configure the Row Filter node to output only the filtered-out rows, take the concatenated output of the loop, and then apply a Reference Row Filter against the initial table to exclude those rows.
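In rough Python terms, that alternative could look like the sketch below. Again, this only mimics the logic; the helper names `rows_matching` and `exclude_by_reference`, the `id` key column, and the sample data are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def rows_matching(rows, column, pattern):
    """Return the rows whose value in `column` matches the wildcard pattern."""
    return [row for row in rows if fnmatchcase(str(row[column]), pattern)]


def exclude_by_reference(rows, reference_rows, key):
    """Drop every row whose `key` value appears in the reference table."""
    excluded_keys = {row[key] for row in reference_rows}
    return [row for row in rows if row[key] not in excluded_keys]


# Hypothetical stand-in for Table A.
table_a = [
    {"id": 1, "name": "test_run"},
    {"id": 2, "name": "Cluster 1"},
    {"id": 3, "name": "bla"},
    {"id": 4, "name": "dry_run"},
]

# Two overlapping criteria: row 1 matches both of them.
criteria = [("name", "test*"), ("name", "*run")]

# The loop concatenates the rejected rows of every iteration (duplicates included).
rejected = []
for column, pattern in criteria:
    rejected.extend(rows_matching(table_a, column, pattern))

# A single reference filter against the original table removes them all.
kept = exclude_by_reference(table_a, rejected, key="id")
print([row["id"] for row in kept])
```

Duplicates in the concatenated table do no harm here, since the reference step only asks whether a key is present.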

Simon.

 

 

Hi Simon,

 

Thanks for the response.  My attempt to use the Delegating Loop Start Node was not successful.  Screenshots of the example workflows are shown in the attached figures:

knime_workflow1.png - The Delegating Loop Start Node is excluded from the workflow

knime_workflow2.png - Delegating Loop Start Node is included in the workflow, halting the workflow and creating the following console message:

 

WARN      WorkflowManager     Unable to merge flow object stacks: Conflicting FlowObjects: <Loop Context (Head 4:42, Tail unassigned)> - iteration 0 vs. <Loop Context (Head 4:45, Tail unassigned)> - iteration 0 (loops not properly nested?)

 

I searched the forum and I believe the problem is that the loops are not nested correctly.  I am not clear how to properly nest the loops to avoid this problem.  Any tips/suggestions would be appreciated.

 

Thanks,

 

Andrew

Hi Andrew,

 

I am attaching a workflow which filters a string multiple times based on the configuration table.

 

Hope this helps, Iris

Very nicely done Iris!

Simon.

Hi Iris and Simon,

 

The solution Iris created works superbly and was very easy to adapt to my workflow.  I really appreciate your rapid response to my inquiry. 

 

Thank you,

 

Andrew

Hello everyone,

I know this is an old thread, but I have a closely related question: how do I include only those rows matching one of multiple criteria? I'm interested in a solution similar to Iris's, but instead of filtering out the rows containing "test" or "bla", I'd like to include those. For some reason I cannot make it work. I guess I need a break.

 

Any help is much appreciated.

 

Best regards

If you can find the rows to exclude, you can use that table in conjunction with the Reference Row Filter to exclude those rows.

Cheers, gabor

Hey aborg,

thanks for your quick response. The next challenge would then be to determine the rows where column x contains the values "a" or "b" or "c".

My inquiry is based on the workflow Iris posted in this thread. It works fine for me too, but it filters out all rows containing "a" or "b" or "c". What I'd like to have is the opposite: all rows containing either "a" or "b" or "c".

It must be pretty simple, but I couldn't make it work. Shame on me.

Best regards
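To pin down what is being asked for, the inclusion logic can be sketched in Python as follows. This is just a statement of the desired behaviour under my own assumptions (the function name `keep_rows_containing_any`, the column name `x`, and the sample rows are made up), not a description of any KNIME node.

```python
def keep_rows_containing_any(rows, column, terms):
    """Keep a row if its value in `column` contains at least one of the terms."""
    return [
        row for row in rows
        if any(term in str(row[column]) for term in terms)
    ]


# Hypothetical sample table.
sample_rows = [
    {"id": 1, "x": "unit test"},
    {"id": 2, "x": "Cluster 1"},
    {"id": 3, "x": "bla bla"},
    {"id": 4, "x": "other"},
]

included = keep_rows_containing_any(sample_rows, "x", ["test", "bla"])
print([row["id"] for row in included])
```

Note that this is an OR across the terms. Chaining include filters one after another would instead give an AND, which is probably why simply inverting each filter in a sequential loop does not produce the expected table.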

I may have misunderstood something, but would a workflow like the attached one be suitable for you?

Cheers, gabor

Hey aborg,

 

Now I understand. Yes, that works fine, but it does not allow the use of wildcards, does it? I'd like to filter for "Cluster*", which should return the entire table, but it creates an empty table, since the asterisk is not interpreted as a wildcard.

 

Best regards

 

That is true. In that case you should create the input table (the one where I used Table Creator) with a workflow similar to the one provided by Iris (using regular Row Filters). It might be worth using a GroupBy node on it to keep the table small by removing the duplicates. That way you will have all possible matching values you want to (reference) filter for.
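As a rough illustration of that two-step idea (expand the wildcards into the distinct matching values, then reference-filter on those values), here is a small Python sketch. The names `matching_values` and `include_by_reference`, the `label` column, and the data are assumptions for the example, and Python's `fnmatch` stands in for the wildcard matching.

```python
from fnmatch import fnmatchcase


def matching_values(rows, column, patterns):
    """Distinct values of `column` that match at least one wildcard pattern."""
    return {
        str(row[column])
        for row in rows
        if any(fnmatchcase(str(row[column]), pattern) for pattern in patterns)
    }


def include_by_reference(rows, column, allowed_values):
    """Keep only the rows whose value in `column` is in the reference set."""
    return [row for row in rows if str(row[column]) in allowed_values]


# Hypothetical sample table.
data = [
    {"id": 1, "label": "Cluster 1"},
    {"id": 2, "label": "Cluster 2"},
    {"id": 3, "label": "Cluster 1"},
    {"id": 4, "label": "Noise"},
]

# Step 1: wildcard matching, collapsed to distinct values (the de-duplication step).
allowed = matching_values(data, "label", ["Cluster*"])

# Step 2: exact-match reference filter on the expanded values.
kept_rows = include_by_reference(data, "label", allowed)
print(sorted(allowed), [row["id"] for row in kept_rows])
```

Using a set for the reference values plays the role of the de-duplication: "Cluster 1" appears twice in the data but only once in the reference.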

Hey aborg,

 

yes, that'll work hopefully. Thanks for your help.

 

Best

Thanks Iris!

I've been struggling to implement a recursive workflow with unnested (cross-nested?) loops using the Delegating Loop nodes, and your workflow was exactly what I needed.

A very clever solution!

(the other) Simon

I downloaded your multiple criteria workflow, but I suspect that because it has a .knime extension rather than .knwf, KNIME won’t import it. Is there a way to convert it so that I can see it?
Thank you,
Steve

Hi @Selster

The workflow can still be imported by using File -> Import Workflow. Then you can just select the zip file.

That said, I made the workflow 7 years ago, and there are quite a few new features in KNIME which make this easier. So if you want to start a new thread and explain what you want to achieve, we are happy to make you a more recent example.
