Replace Letters from string based in a condition

Hi Team,

I am looking for a solution to clean an item in a database. I need to clean from a column all the values that have only letters, because in the same column there are valid values with alphanumeric and numeric characters. Here’s the current situation and expected one:

image

I was trying to fix it, using the “column expressions” node. I need to replace the values with only letters for a missing value.

Thanks a lot

Hi @odiaz2309

A way to achieve this is via the Column Expression:

if (regexMatcher(column("ID"),"\\b[A-Za-z]+\\b") == true ) {
    null 
   } else {
    column("ID")
   }

Which translates to: match any string that only contains the letters A-Z in both lower and upper case.

Hope this helps!

2 Likes

Thanks a lot Arjen,

Just two questions:

  • What does “\b+\b” mean in the function?
  • How could I introduce space characters in the regexMatcher function?
1 Like

Sure @odiaz2309

The \b’s are for boundaries. If you do not disclose those than the Regex will also start to match some of the other records which includes a letter but does not only consist of letters, like the CE1019010.

image

I would recommend using an online Regex engine to play around with the Regex and you can see the resulting behavior “live”, like I have done above.

One example:

To allow spaces, you can just add it to the code between the brackets. So [A-Za-z] will
become [A-Za-z ]

3 Likes

Hi @odiaz2309 ,

Since you want to do this in the database, is there any reason why you are doing this through Knime as opposed to directly in the database?

You would have to download/extract the db table to a knime table, apply the change, and then reload the knime table into your db.

What db system are you using?

I would go with rule engine’s “matches” and search for a digit. If one is included then return value else return nothing. And I agree with @bruno29a helpful advice.
br

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.