Remove empty columns

Angus · May 20, 2014, 9:29pm

Hi,

I'm sure there must be a simple solution to this, but I can't think of it.

I have a file with multiple columns (>250), a lot of these columns have no data in them for any of the rows. Is there a workflow/node that will examine each column, inspect it for any data in any of the rows and discard/filter the column if all the rows are empty to leave only columns with data left.

Thanks,
Angus

richards99 · May 20, 2014, 10:04pm

Why not use the groupby node. In the node, don't group on anything, but in the options tab, aggregate on all columns using Sum. This now condenses everything to one row, showing if any data is present.

then use the transpose node, and use the row filter to only leave those rows with data present. So choose to exclude those with missing data.

then transpose back.

Finally you now have the column names which contain data, use this with a reference column filter node on the original table.

hope that helps,

simon.

Angus · May 24, 2014, 7:40am

Thanks Simon, that worked.

Angus

Macca · May 27, 2014, 9:18am

The XLS reader has the option Skip empty columns. Maybe that could help.

Stephen

Aaron_Hart · June 2, 2014, 4:55pm

There is also a Low Variance Filter node which can be used for this purpose as well.

richards99 · June 2, 2014, 7:30pm

Thanks Aaron, such an underused node on my part. I forget about this one.

simon.