I need to remove columns based on certain criteria, for example if the max value of the column is less than a certain threshold.
I’ve found ways to remove columns based on
- their name (Column Filter),
- their variance (Low Variance Filter)
but nothing else.
Could you please advise me on how to achieve this?
Looking forward to your reply!
For now I have a nasty workaround: concatenating the values from the various columns and then running a regexp on the result. There must be a cleaner way…
Perhaps a neat node would be a kind of marriage of one of those and (for numeric columns) the Math Expression node. Actually, it would be something like a Row Filter for columns…
This would allow one to write an expression such as max(col)<1, select "exclude", and have the node remove all numeric columns whose maximum is less than 1.
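To make the proposed behaviour concrete, here is a minimal Python sketch of such an expression-driven column filter. It is not a KNIME node or API; `filter_columns` and `is_numeric` are hypothetical names, and the table is assumed to be a plain dict mapping column names to lists of values. Non-numeric columns pass through untouched, mirroring the "for numeric columns" idea above.

```python
from typing import Callable, Dict, List, Sequence, Union

Value = Union[int, float, str]
Table = Dict[str, List[Value]]  # hypothetical table: column name -> cell values


def is_numeric(values: Sequence[Value]) -> bool:
    """True if every cell is an int or float (bools are not counted as numbers)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def filter_columns(
    table: Table,
    predicate: Callable[[Sequence[float]], bool],
    mode: str = "exclude",
) -> Table:
    """Hypothetical 'row filter for columns'.

    mode="exclude" drops numeric columns for which predicate(column) is True;
    mode="include" keeps only those numeric columns. Non-numeric columns are
    always kept. Assumes numeric columns are non-empty (max() fails on []).
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    result: Table = {}
    for name, values in table.items():
        if not is_numeric(values):
            result[name] = values  # pass string columns through unchanged
            continue
        matches = predicate(values)
        # exclude: keep non-matching columns; include: keep matching ones
        if (mode == "exclude") != matches:
            result[name] = values
    return result


table = {
    "id": ["a", "b", "c"],
    "x": [0.2, 0.5, 0.9],
    "y": [0.1, 3.0, 2.0],
    "z": [5, 7, 1],
}
filtered = filter_columns(table, lambda col: max(col) < 1, mode="exclude")
print(list(filtered))  # ['id', 'y', 'z']
```

Passing the condition as a callable keeps the filter generic: swapping the lambda for, say, a mean or standard-deviation test would give other criteria without changing the filtering logic.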
Definitely useful. One could in theory also use that to automate simple univariate outlier removal methodologies.
Sounds like an ideal use case for the
Thank you for your comment. Indeed, in the workflow I've built I do use the Transpose node, and it is a very useful one.
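The transpose trick works because it turns columns into rows, so an ordinary row filter can act on them: transpose, keep the rows whose maximum reaches the threshold, then transpose back. The sketch below illustrates that idea in plain Python rather than KNIME nodes. `drop_columns_by_max` is a hypothetical helper, and it assumes every column is numeric and the table has at least one row.

```python
from typing import List, Sequence, Tuple


def transpose(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap rows and columns (zip(*rows) yields the columns as tuples)."""
    return [list(r) for r in zip(*rows)]


def drop_columns_by_max(
    header: List[str], rows: List[List[float]], threshold: float
) -> Tuple[List[str], List[List[float]]]:
    """Hypothetical helper: drop columns whose max is below `threshold`.

    Assumes all columns are numeric and there is at least one row.
    If every column is dropped, the result has no rows at all.
    """
    # Step 1: transpose so each original column becomes one row.
    col_rows = transpose(rows)
    # Step 2: "row filter" on the transposed table, keeping only
    # rows (= original columns) whose maximum is >= threshold.
    kept = [(name, vals) for name, vals in zip(header, col_rows) if max(vals) >= threshold]
    new_header = [name for name, _ in kept]
    # Step 3: transpose back to the original orientation.
    new_rows = transpose([vals for _, vals in kept])
    return new_header, new_rows


header = ["x", "y", "z"]
rows = [[0.2, 0.1, 5], [0.5, 3.0, 7], [0.9, 2.0, 1]]
new_header, new_rows = drop_columns_by_max(header, rows, 1)
print(new_header)  # ['y', 'z']
print(new_rows)    # [[0.1, 5], [3.0, 7], [2.0, 1]]
```

On large tables the double transpose copies all the data twice; computing the per-column maxima directly avoids that, but the transpose route has the advantage of reusing an existing row-filtering step unchanged.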
I wish you a nice week.