New node for multi column String Replacer?

richards99 · July 12, 2018, 2:25pm

Hi,
After the much needed Multicolumn Maths node, I would greatly appreciate a Multicolumn String Replacer node so that I don’t have string many String Replacer noses together or set up a loop.

It was also asked for back in 2011 with other users commenting on wanting this feature.

Thanks,
Simon.

moritz.heine · July 13, 2018, 6:48am

Hi,

instead of multiple String Manipulation Nodes you could also use the new Column Expressions Node that provides all functions of the String Manipulation Node.

Edit: Sorry I misread the post. The thing you could do is to write your expression and copy it for all columns, which is somehow tedious but maybe a bit better than using several String Manipulation nodes.
What you can do to make it easier is to use your functions of a variable (say ‘col’) and in the first row assing ‘col’ the specified column. For example:

col = column(0)
capitalize(col)

Then you would have to change only the first line for each column, which pays of if you have a complex function reusing the same column over and over again.

Cheers,
Moritz

lorenz · July 13, 2018, 2:21pm

Hi,
For whole-table text operations, you can use R or Python snippets.
Nevertheless, a Multi Column String Replacer would be better.

Best,
Lorenz

justin.m · April 9, 2019, 9:03am

You may have solved this already, but this sounds like a job for Column List Loop Start and its sidekick Loop End (Column Append). You could run the loop over the columns you want to check and use the currentColumnName flow variable to tell the string replacer which column to modify.

richards99 · April 10, 2019, 5:52pm

I’ve been doing this for years already, but it’s a rather messy affair when you don’t want to do it on every column.

ipazin · April 11, 2019, 6:34am

Hi there!

To reduce the mess you can split your data on columns you want to loop over and the ones you do not want to loop over. After looping join back

Br,
Ivan

serendipitytech · July 7, 2019, 3:30pm

So, I’ve been struggling a bit with this myself.

I have several thousand fixed width text files that I’ve been trying to combine. I have them all combined in knime, and I want to add them to a MySQL database. The trouble I run into is that the files retain the extra spaces at the end of each field and it creates lines that are too long to import as varchar. I could try other text field options, but I’d rather remove the extra space here.

So, I setup a loop with Column List Loop Start, then String Manipulation with stripEnd($${ScurrentColumnName}$$), and ended with the Loop End (Column Append) node.

The results are not stripping any white space, it takes a long time to run through the multi million row table, so testing has been slow. I’m looking at setting up a subset of data to test with, but thought I’d check here to see what I might be missing in the process.

A single string manipulation node on a single column gets me the results expected, but the loop doesn’t seem to actually do the work.

serendipitytech · July 7, 2019, 6:27pm

Sooo, I was able to adapt the workflow I found here:
https://hub.knime.com/knime/space/Examples/06_Control_Structures/04_Loops/03_Looping_over_all_columns_and_manipulation_of_each

However, I’m wondering if this is really expected behavior? So it appears that the String Manipulation node, when inside a loop cannot use the column name variable for the “replace column” option, so in this case, I have to rename the column in the loop so that the string manipulation node is tied to the specifically named column?

I did a test with one of the files and it seems to be working, so I’m running it against the mass of files with fingers crossed. See you in several hours

lisovyi · July 9, 2019, 8:18am

Hi @serendipitytech,

you are right, one needs to rename the column in some way inside the loop iteration, if one uses String Manipulation. This can be done in the way as illustrated in the example that you shared. Or by stripping of the header and then adding it back.
Alternatively, you can use Column Expressions. Then you do not need to rename the column in the loop as you can access the column by index as suggested by Marten above.
Note, however, that Column Expressions are very slow, so I would advice to stick to the remaning scenario, unless you need advanced logic in string manipulations. As an example of performance comparison, here is a basic benchmark (the values in the table are execution time in ms):
2019-07-09_10h08_59
The corresponding workflow is available here:

Cheers,
Misha

ikesh · August 2, 2019, 7:06am

Hey. How can I loop over all columns and replacing all asterisk with null or empty values. Thanks in advance.

ipazin · August 2, 2019, 10:08am

Hi @ikesh,

You can modify some of above’s examples. Use Column List Loop Start and Column Expressions or String Manipulation node.

Is only asterisk in cell and you want missing value instead of it or asterisk is only one character within cell you would like to replace?

Br,
Ivan

Aswin · August 2, 2019, 10:46am

A loopfree alternative is to Unpivot the table so that all strings are in a single column, perform the operation of that and then Pivot again. (yes I love the Unpivot/Pivot nodes )

ipazin · September 7, 2020, 11:05am

Not anymore. See here: Column Expression Speedup with 4.2.1