Problem using flow variables to specify included columns with a column filter

I have a table with a number of different columns and I would like to use a variable loop to choose a set of columns to filter from a dataset.

I passed a list of target columns to a flow variable. I then want to use the flow variables to set the included columns of the column filter node.

 

What I've tried so far:

The "included_names" option under flow variables only offers as many slots (0...n) as there are columns already included. I worked around this by "seeding" the include list with some columns.

Although I am able to set the included names with the flow variables, the node output does not include the target columns.

Is there something I'm missing?

 

Hi,

Presumably you want to filter individual columns only? In that case, configuring the column filter inside the loop to use wildcard/regex search and using the column name as the filter pattern should do the trick. For more than one column at once, it is better to use a reference column filter.
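To illustrate the regex idea outside KNIME, here is a minimal Python sketch. It is not KNIME's API: the function names are made up for this example, and it assumes the pattern has to match the whole column name. The point it shows is that a column name used as a pattern should be escaped, otherwise characters such as "." act as wildcards.

```python
import re

def pattern_for_names(names):
    # Escape each name so characters like "." or "(" are matched literally,
    # then join the names into one alternation.
    return "|".join(re.escape(name) for name in names)

def regex_column_filter(columns, pattern):
    # Assumption: the pattern must match the whole column name.
    rx = re.compile(pattern)
    return [col for col in columns if rx.fullmatch(col)]

columns = ["id", "conc.A", "concXA", "weight"]
print(regex_column_filter(columns, pattern_for_names(["conc.A"])))  # ['conc.A']
```

Without the escaping step, the unescaped pattern "conc.A" would also match "concXA".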

Cheers
E

Make sure you have force inclusion selected in the node. What you are trying to achieve should then work.

Simon.

Simon,

That's interesting - I never quite knew what this was good for. At least it solves issues. ;-)

Cheers
E

An additional tip: while you are configuring it, put a temporary column filter in front of it and remove all columns that should be filtered out. After that, connect it back to the original table. If you set up the loop this way, it will not complain about columns appearing in both the include and exclude lists.

During configuration:

original source -> temporary column filter -> column filter configured by flow variables

After "column filter configured by flow variables" is configured, the result should look like this again:

original source ---------------------------> column filter configured by flow variables

Cheers, gabor

Force inclusion and force exclusion are really powerful options whose purpose most people don't know.

Besides the tip I mentioned, they also have a bigger purpose.

Force inclusion means that when you put columns into the include box, if you later rerun the workflow and there are new columns, they automatically go into the exclusion box.

Force exclusion means that when you put columns into the exclude box, if you later rerun the workflow and there are new columns, they automatically go into the inclusion box.
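The two modes can be sketched in plain Python. This is only an illustration of the behaviour described above, not KNIME code; the function name and the "enforce" parameter are made up for the example.

```python
def apply_column_filter(available, selected, enforce="inclusion"):
    """Return (included, excluded) for the columns currently in the table.

    enforce="inclusion": 'selected' is the include list; anything else,
    including columns that appear later, ends up excluded.
    enforce="exclusion": 'selected' is the exclude list; anything else,
    including columns that appear later, ends up included.
    """
    chosen = set(selected)
    if enforce == "inclusion":
        included = [c for c in available if c in chosen]
    elif enforce == "exclusion":
        included = [c for c in available if c not in chosen]
    else:
        raise ValueError("enforce must be 'inclusion' or 'exclusion'")
    excluded = [c for c in available if c not in included]
    return included, excluded

# The table gained a column "new" since the node was configured.
table = ["a", "b", "c", "new"]
print(apply_column_filter(table, ["a", "b"], "inclusion"))  # (['a', 'b'], ['c', 'new'])
print(apply_column_filter(table, ["c"], "exclusion"))       # (['a', 'b', 'new'], ['c'])
```

This is also why force inclusion helps when the include list is set from flow variables: only the listed names pass through, whatever else the input table contains.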

simon.

Ah-hah! Many thanks indeed, good to know.

-E

That's the theory, anyway... More often than not Nodes are grabbing every new column, even if exclusion is forced. Haven't found the pattern yet, though. The only thing I do know is that Joiners are among the few that do it right.

My typical solution is to regex filter columns, but that only works in one direction, so I'm often left filtering with Splitters. Additional benefit of this solution: FlowVars and regex selection work great together. On the other hand, many Nodes that need a list of columns don't support selection by regex, which can be a pain in the butt. One particular offender is the Column List Loop Start, which can really be annoying. (Even more so as loops iterating over columns are so much faster than loops iterating over rows, for whatever reason, even including up to four Transposes to simulate row iteration.) Also, unadjustable and inconsistent automatic column naming gets in the way all the time. But then my Workflows are littered with cosmetic Nodes anyway, because that's just how Knime rolls.

Good news: "Column List Loop Start" is on the list to become RegEx-enabled, probably in a point release shortly after 2.11.0. They said so in a (more extensive) feature request post of mine.

Concur on the awkwardness of automatic naming, though - worthy of end-to-end redesign.

Cheers
E