How to use regex and append the result to a new column?

Hello, I’m new to KNIME.

I’ve been using the Column Expressions node with the regexMatcher() function. It appends a new column; however, that column always contains true/false. I want it to contain the matched value itself.

regexMatcher(column("Name"), "[A-Za-z0-9]+")

When I use the Rule-Based Row Filter, it does return the values, but it doesn’t append them to a new column; instead, it overwrites my original data.

$Name$ MATCHES "[A-Za-z0-9]+" => TRUE

Is there a node that lets me combine the columns, appending the new one next to the original so they don’t overwrite each other?

Thanks.

Hi @HeroVax, and welcome to the KNIME Community.

As its description says, the regexMatcher() function returns a boolean, that is, true or false, depending on whether the expression matches.
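As a rough analogy outside KNIME, here is a minimal Python sketch of the difference between a match test and the matched value. It assumes regexMatcher() behaves like a whole-string test (modeled with Python’s `re.fullmatch`); the helpers `matches` and `matched_value` are hypothetical and are not KNIME functions.

```python
import re

PATTERN = r"[A-Za-z0-9]+"

def matches(value: str) -> bool:
    """Boolean test, analogous to what regexMatcher() returns (assumed whole-string match)."""
    return re.fullmatch(PATTERN, value) is not None

def matched_value(value: str):
    """Return the value itself when it matches, otherwise None (standing in for a missing cell)."""
    return value if matches(value) else None

for name in ["ПÑ�ковÐ", "Motorcycle12", "Car"]:
    print(name, matches(name), matched_value(name))
```

The first function is what you currently get in the new column; the second is what you are after.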

Most nodes that generate something give you the option to either replace an existing column or append a new one.

A Rule-Based Row Filter does not generate anything. It only filters rows (hence the name) and is not meant to modify any values, so I’m not sure how it’s overwriting your original data. At most, it returns a subset of your original data: the rows where the column Name matches the expression "[A-Za-z0-9]+".
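To make the distinction concrete, here is a minimal Python sketch (not KNIME code) of what a row filter does: it drops rows and leaves the remaining values untouched, so the table gets shorter but no column is rewritten. The table is modeled as a list of dicts, `row_filter` is a hypothetical helper, and whole-string matching is assumed for MATCHES.

```python
import re

rows = [{"Name": "ПÑ�ковÐ"}, {"Name": "Motorcycle12"}, {"Name": "Car"}]

def row_filter(table, column, pattern):
    """Keep only rows whose cell fully matches the pattern; cell values are not modified."""
    # Assumption: MATCHES behaves like re.fullmatch (the whole value must match).
    return [row for row in table if re.fullmatch(pattern, row[column])]

filtered = row_filter(rows, "Name", r"[A-Za-z0-9]+")
print(filtered)
```

The output has fewer rows but the same single "Name" column, which is why a filter alone cannot give you a second column.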

It would help if you explained what you are trying to do.

If you are trying to extract whatever matches the expression, you can use the Regex Extractor node from the Palladian extension.
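The idea of extracting the matched text into a separate column can be sketched in Python as follows. This is a generic illustration using `re.search`, not the Palladian node’s implementation; it keeps only the first match, and the helper `extract_first` and the column name “Name (2)” are assumptions for the example.

```python
import re

def extract_first(value: str, pattern: str):
    """Return the first substring matching pattern, or None if nothing matches."""
    m = re.search(pattern, value)
    return m.group(0) if m else None

rows = [{"Name": "ПÑ�ковÐ"}, {"Name": "Motorcycle12"}, {"Name": "Car"}]
for row in rows:
    # Append the extracted text as a new column; the original "Name" cell is kept.
    row["Name (2)"] = extract_first(row["Name"], r"[A-Za-z0-9]+")
print(rows)
```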

Otherwise, you can use the regexReplace() function in the String Manipulation or Column Expressions node to replace everything that does not match with nothing (an empty string), keeping only what you need.
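Here is a minimal Python sketch of the “replace everything else with nothing” approach, using `re.sub` with a negated character class. The pattern `[^A-Za-z0-9]` is an assumption chosen to mirror the expression in the question, and `keep_allowed` is a hypothetical helper; in KNIME, a similar negated class would presumably be passed to regexReplace(). Note that a value with no allowed characters ends up as an empty string, not a missing value.

```python
import re

def keep_allowed(value: str) -> str:
    """Delete every character that is not an English letter or digit."""
    # The negated class [^A-Za-z0-9] matches exactly the characters we want to drop.
    return re.sub(r"[^A-Za-z0-9]", "", value)

for name in ["ПÑ�ковÐ", "Motorcycle 12!", "Car"]:
    print(repr(name), "->", repr(keep_allowed(name)))
```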


Hello, @bruno29a
This is what I’m trying to do:

  1. Only keep values that start with an English letter or a digit.

  2. Put the new data into a new column. Do not overwrite the original data.

That is, if column “Name” holds the original data, I want to add a new, modified column named “Name (2)”.

At the end, I should see two columns: “Name” and “Name (2)”.

Suppose there are three strings:
ПÑ�ковÐ
Motorcycle12
Car

The output for “Name”:
ПÑ�ковÐ
Motorcycle12
Car

The output for “Name (2)”:
Motorcycle12
Car

Hi @HeroVax, OK, then you can simply use a Rule Engine node.

This is the expression:
$Name$ MATCHES "[A-Za-z0-9]+" => $Name$

Make sure you choose the “Append Column” option.
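The Rule Engine setup can be emulated in Python to show why both columns survive: the rule only fills a new column, and rows where no rule fires get an empty cell. This sketch assumes MATCHES requires the whole value to match (modeled with `re.fullmatch`) and uses None to stand in for KNIME’s missing value; `rule_engine_append` is a hypothetical helper, not part of KNIME.

```python
import re

def rule_engine_append(table, source, target, pattern):
    """Mimic '$source$ MATCHES pattern => $source$' with the 'Append Column' option.

    Each row gets a new `target` cell: the original value when it fully matches,
    otherwise None (standing in for a missing value). The `source` column is untouched.
    """
    result = []
    for row in table:
        value = row[source]
        new_row = dict(row)  # copy so the input table is not modified
        new_row[target] = value if re.fullmatch(pattern, value) else None
        result.append(new_row)
    return result

rows = [{"Name": "ПÑ�ковÐ"}, {"Name": "Motorcycle12"}, {"Name": "Car"}]
out = rule_engine_append(rows, "Name", "Name (2)", r"[A-Za-z0-9]+")
for row in out:
    print(row["Name"], "|", row["Name (2)"])
```

Because the result goes into a new column, the original “Name” values stay exactly as they were, matching the two-column layout described in the question.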

Results: [screenshot not included]


Absolute legend! Thank you so much.

Hi @HeroVax ,

BTW, I don’t think your expression does what you want (only start with English alphabet or numerical numbers). It tests which characters appear in the value, not what the value starts with.

If you really want to match values that start with an English letter or a digit, this will probably work better:
^[A-Za-z0-9].*

So, using this expression:
$Name$ MATCHES "^[A-Za-z0-9].*" => $Name$
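To see where the two expressions differ, here is a small Python comparison. It assumes MATCHES checks the whole value (modeled with `re.fullmatch`), and the extra sample strings such as "Motorcycle 12" and "_Car" are made up for illustration. Under that assumption, a value with a space or punctuation after a valid first character is rejected by the first pattern but kept by the second.

```python
import re

WHOLE_VALUE = r"[A-Za-z0-9]+"    # every character must be an English letter or digit
STARTS_WITH = r"^[A-Za-z0-9].*"  # only the first character is checked

def check(value: str):
    """Return (matches WHOLE_VALUE, matches STARTS_WITH).

    Assumption: MATCHES behaves like re.fullmatch (the whole value must match).
    """
    return (bool(re.fullmatch(WHOLE_VALUE, value)),
            bool(re.fullmatch(STARTS_WITH, value)))

samples = ["ПÑ�ковÐ", "Motorcycle12", "Car", "Motorcycle 12", "12-Wheeler", "_Car"]
for s in samples:
    whole, starts = check(s)
    print(f"{s!r:18} whole={whole} starts={starts}")
```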

I get this: [screenshot not included]

That matches the rule you described.

