Data Cleansing Loop

Happy Monday Everyone,

I have been trying to figure something out without much luck. I’ve seen this and a couple of others about this concept but none of them quite get at what I need to do.

I essentially want to build a data cleansing node that will let me process many columns of data at once with something like the strip() function. I often work with dirty string data and don’t want to clean each column one at a time, so I want to build a loop that iterates through each column. The problem is that when I follow some of the examples, the variable (CurrentColumnName) holds only the column name, not the column’s data.

How can I get it to treat the column as a variable and then apply the function to the entire column? Here is a screenshot of what I’m thinking. Variables and loops are still a bit funky to me…

Hi there,

In the Column Expressions node, one option for accessing a column is the following syntax:
column("column_name")

So when the flow variable’s value is the column name, you should use this syntax:
strip(column(variable("currentColumnName")))
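
For anyone who thinks in code, the difference between a column's name and the column itself can be sketched in plain Python. This is only an analogy and not KNIME code: the dict `table`, the helper `strip_column`, and the sample values are made up for illustration.

```python
# Toy table: column name -> list of cell values (a stand-in for a KNIME table).
table = {
    "name": ["  Alice ", "Bob  "],
    "city": [" Berlin", "Paris "],
}


def strip_column(table, column_name):
    """Look up a column by its name, then strip every cell in it."""
    # The lookup plays the role of column(variable("currentColumnName")).
    values = table[column_name]
    # The per-cell strip plays the role of strip(...).
    return [value.strip() for value in values]


# The loop variable holds only the name; the lookup turns it into data.
for current_column_name in list(table):
    table[current_column_name] = strip_column(table, current_column_name)
```

The loop variable on its own is just a string. Only the lookup by name gives access to the values, which is what wrapping the flow variable in column(...) does in the expression above.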

Why two loops?

Happy Monday to you as well :smiley:
Ivan

@ipazin,

Hey Ivan,

I fixed the workflow, and now I am getting new columns for each iteration, so my column count goes from 17 to 89. Is there a way to keep my column count the same instead of creating new columns on each iteration?

Hi,

Yes there is :slight_smile:
Remove the columns you are not cleaning from your table before the Column List Loop Start node and append them again after the Loop End node.

There is also a Replace Column option in the Column Expressions node which you can use. Since the node is in a loop, you need a flow variable with the value true to control this option.
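
As a rough illustration of why replacing keeps the column count stable while appending grows it, here is a small plain-Python sketch. It is an analogy only, not how the KNIME nodes are implemented: the function names, the " (cleaned)" suffix, and the sample table are all hypothetical.

```python
def clean_appending(table, columns):
    """Write each cleaned column under a NEW name, so the table grows."""
    out = dict(table)
    for name in columns:
        # Hypothetical naming scheme for the appended column.
        out[name + " (cleaned)"] = [value.strip() for value in table[name]]
    return out


def clean_replacing(table, columns):
    """Write each cleaned column back under its OWN name, so the count is unchanged."""
    out = dict(table)
    for name in columns:
        out[name] = [value.strip() for value in table[name]]
    return out


table = {
    "id": ["1", "2"],
    "name": [" Alice ", "Bob "],
    "city": [" Berlin", "Paris "],
}
dirty_columns = ["name", "city"]

appended = clean_appending(table, dirty_columns)
replaced = clean_replacing(table, dirty_columns)

print(len(table), len(appended), len(replaced))  # 3 5 3
```

Here the untouched "id" column simply passes through, which is loosely what removing the non-cleaned columns before the loop and appending them afterwards achieves.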

Br,
Ivan

Alternatively, you can always strip off the column header inside the loop and get a fixed column name this way (“Column 0” by default). The header can be injected afterwards. See 15370.knwf (31.5 KB)

Also note that a simple String Manipulation node works faster than Column Expressions, so it is better to use the former unless you need some advanced logic and cannot avoid Column Expressions. For example, here is the processing time from the workflow above:

Cheers,
Misha

Everyone, thank you for the feedback!

@lisovyi, thank you! Your example was great!

This is what I ended up using that worked for me
