Removing columns based on some function

lucatoldo · August 28, 2009, 6:28pm

Dear KNIMErs,
I need to remove columns based on certain criteria, for example if the maximum value of the column is below a certain threshold.
I’ve found ways to remove columns based on

their name (Column Filter),
their variance (Low Variance Filter),
but nothing more.

Can you please advise me on how to achieve this?
Looking forward to your reply!

lucatoldo · August 28, 2009, 7:19pm

I just found a nasty workaround … concatenating the values from the various columns and then running a regexp on the result … there must be a cleaner way…

Jay · August 28, 2009, 9:07pm

Perhaps a neat node would be a kind of marriage of one of those and (for numeric columns) the math expression node. Actually perhaps this is kind of like row filter for columns…

This would allow one to write an expression such as max(col)<1, select exclude, and have the node remove all numeric columns whose maximum is less than 1.
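
To make the idea concrete outside of KNIME, here is a minimal plain-Python sketch of such a "row filter for columns". It is not an existing node or KNIME API: the function name drop_columns, the dict-of-lists table layout, and the sample data are all assumptions made up for illustration.

```python
from numbers import Number


def drop_columns(table, should_drop):
    """Return a copy of `table` (column name -> list of values) without the
    numeric columns for which should_drop(values) is true.

    Hypothetical helper for illustration; non-numeric and empty columns are kept.
    """
    kept = {}
    for name, values in table.items():
        is_numeric = bool(values) and all(isinstance(v, Number) for v in values)
        if is_numeric and should_drop(values):
            continue  # the expression matched, so exclude this column
        kept[name] = list(values)
    return kept


# Made-up sample table: one string column and two numeric columns.
table = {
    "id": ["a", "b", "c"],
    "low": [0.1, 0.4, 0.9],
    "high": [0.2, 3.5, 1.1],
}

# The expression from the post: exclude columns where max(col) < 1.
filtered = drop_columns(table, lambda col: max(col) < 1)
print(list(filtered))  # ['id', 'high']
```

Passing the condition as a function keeps the sketch general: any per-column expression (min, mean, range) could be swapped in for max(col) < 1.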

Definitely useful. One could in theory also use that to automate simple univariate outlier removal methodologies.

aborg · August 28, 2009, 10:33pm

Sounds like an ideal use case for the Transpose node.
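
The reasoning, as far as I can tell, is that once the table is transposed each original column becomes a row, so ordinary row filtering applies, and a second transpose restores the layout. A rough plain-Python sketch of that round trip follows; the data, the threshold, and the helper name transpose are invented for illustration and are not KNIME code.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(r) for r in zip(*rows)]


# Made-up numeric table: three columns, three rows.
header = ["low", "high", "mid"]
rows = [
    [0.1, 0.2, 0.5],
    [0.4, 3.5, 1.0],
    [0.9, 1.1, 0.3],
]
threshold = 1  # assumed cut-off, matching the max(col) < 1 example in the thread

# Step 1: transpose, so each original column is now one row, labelled by name.
as_rows = list(zip(header, transpose(rows)))

# Step 2: plain row filtering, keeping rows whose maximum reaches the threshold.
kept = [(name, values) for name, values in as_rows if max(values) >= threshold]

# Step 3: transpose back to the original orientation.
kept_header = [name for name, _ in kept]
kept_rows = transpose([values for _, values in kept])

print(kept_header)  # ['high', 'mid']
print(kept_rows)    # [[0.2, 0.5], [3.5, 1.0], [1.1, 0.3]]
```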

lucatoldo · August 31, 2009, 7:54am

Dear Aborg,
thank you for your comment. Indeed, in the workflow I’ve built I do use the Transpose node, and it is a very useful one.
I wish you a nice week.
luca