Performing the same columns removal on multiple nodes ?

Startide · April 5, 2024, 11:44am

Hi there,

Not sure if I phrased this correctly, but basically I am doing a simple workflow with filters on data. I need to output those different nodes in different excel sheets.

But now I need to remove some columns from the output (GDPR), I know I can filter columns, but I don’t want to do it on every single of my output nodes. Is there a way to link them up to some sort of component of some kind and then still output them separetly ?

See illustration : everything needs to drop the same columns before being sent to excel.

Thanks for the assistance !

ArjenEX · April 5, 2024, 12:17pm

Hi @Startide

That sounds like a job for the Table Validator (Reference) node to me.

In port 0, enter your raw data. In port 1, add the desired output table definition. Here, I define that only “column1” is allowed to be passed through.

As such, “column2” is filtered out by the node.

This way you can ensure that every stream outputs the same table structure before it makes it to the Excel Writer.

Startide · April 5, 2024, 12:24pm

it does allow me to filter out columns, like columns filter does basically. But I can’t use that node with multiple input nodes.
I was trying to avoid changing 70 different flows having to add a node to each… and possibly having to change all 70 later on should the columns that need removing be altered.

ArjenEX · April 5, 2024, 12:38pm

Whichever method you implement, I think that’s unavoidable.

If there is, this would require the same amount of effort, right?

This is the main benefit of going the validator route, you only have to update it once.

Startide · April 5, 2024, 1:23pm

You do ? You imply that the validation node can carry the same rules across multiple nodes ?

I have a bunch of columns and the interface started lagging pretty badly adding all the ones I wanted… so I opted for the column filter node, since I gotta copy / paste it a bunch of time anyway.

But I am all for a better solution

Daniel_Weikert · April 5, 2024, 3:25pm

Do you have 70 different datasources and what kind of? You might be able to filter it out write the file into the same location and update all workflows by doing no changes
br

Startide · April 5, 2024, 3:58pm

I have one source file only, but I separate data per country, and also in different bunch of reports.

I could remove those columns earlier before I generate too many sub branches obviously, expect I perform tasks on those columns (before having to remove them). So I think I am kinda stuck.

takbb · April 5, 2024, 4:14pm

Hi @Startide

I’m not sure if this is a “better” solution… but here goes…

(Edit - i previously inadvertently referred to column renaming instead of filtering so I have corrected my earlier wording. This applies equally to filtering and renaming of columns)

So as you have said the Column Filter node cannot work on more than one table, and Table Validator Reference needs to be placed for every node being filtered, even though if it is using a reference table, at least it doesn’t have to be configured each time, but I take your point that you do have to drop in a Table Validator (Reference) node for each filter that is required.

So what other options are there?
Well, there is a node that can rename the columns on multiple tables at once as well as removing columns: the Table Manipulator node. The downside is that it only has one output port, and what it does is concatenate every table.

You can however tell it to include a table sequence number in the row ID, but you still face the issue of how then to split out the data into the original outputs. You could manipulate the new Row IDs in a new column

and then perform a Group Loop, and place your Excel Writer inside the loop, but that is likely to incur a performance penalty and may not be desirable. Unfortunately there isn’t a node that can easily split out your data into the original (arbitrary number of) tables.

But… I do have a component which can assist here. It can deal with 10 outputs, but for more outputs you can chain them…it isn’t pretty, and for 70 different outputs this means you need 7 of them chained together.

It would look like this - albeit I didn’t use 70 inputs!! (This has to be the most “fun looking” workflow I’ve ever written )

You can find the origin of the Regex Multi Row Splitter here

I have uploaded the pictured workflow up onto the hub.

I’m not sure if I would recommend this as a solution or not (maybe it would depend on the size of the data as it could end up being a huge concatenation), but it was an interesting experiment!

izaychik63 · April 5, 2024, 5:02pm

I would also try to place

before Excel Writer. If you place filter just on one input source and use Concatenate with “Use intersection of columns” setting, everything will be done automatically.

system · July 4, 2024, 5:03pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.