Column Value Special Character validation

Dear Knimers,

Please help to find a node or workflow to identif these special characters in the column values

[^a-z0-9’()&#*./@+,!?~=:;"_%{}$\ -[]ëé¢á| -] [^a-z0-9&#./',()_:;%+~@"?= -]
[^a-z0-9./&()#‘’,:%;?~+=*! -] [^a-z0-9./‘,#().&@!+ -]
^[A-Za-z0-9._%-()’]+@[A-Za-z0-9.-]+[.][A-Za-z]+$

Thanks in advance!

Hi @Pragadeshgp

If you just want to test if the column values matches the Regex, you can use the RegexMatcher function. For example in a Column Expression node.

I took one Regex as example with some dummy date:

The output of this is either true or false

WF:
Column Value Special Character validation.knwf (12.2 KB)

Hopefully this provide some inspiration!

1 Like

Thank you @ArjenEX

if there is a special character and text mixed unable to get the correct value, please see the screenshot below in Row5

image

@ArjenEX I am using csv file as input

Hi @Pragadeshgp,

This was just for illustration purposes. As mentioned, I only used one of your Regex codes (#2). You would need to implement them all first. You can do that separately, loop them, nest them etc.

1 Like

yes @ArjenEX

I am trying to implement these valaidation code “[^a-z0-9’()&#*./@+,!?~=:`;”_%{}$\ -ëé¢á| -]"

used this expression:
regexMatcher(column(“AccountName”),“.[^a-z0-9’()&#/@+,!?~=:`;_%{}$\-ëé¢á|-].” )

Its not working, All the below account name is coming as True

LAFAYETTE GRDE @
22 MERRILL
3 - D’S PARTY STORE
7-ELEVEN #13434-
3 B PARTY STORE
7-ELEVEN #13440-C
RAND’S ON THE LAKE (UNIQUE FOOD MANAGEMENT INC.)
BLUSTAGE CLUB INC
SX AND PARK PARTY SHOP
SX LKES PARTY STORE
CVS #028
GARDEN CITY PARTY STORE

after quotes started " ." and ended "." based on a example but not sure of the meaning.

This one is working:

regexMatcher(column(“column1”), “.*[^a-z0-9’()&#/@+,!?~=:`;_%{}$\-ëé¢á|-].*”)

The KNIME forum probably converts your wildcard as the asterisk is also a text command for italic.

So just to be sure: the wildcard is .* instead of . It now checks for not allowed character in the entire sentence. That the lines you listed as example all come out as true is in fact right. They all contain spaces and your Regex does not include that as allowed character.

See example with additional lines that do not contain a space:

1 Like

Thank you @ArjenEX for your help!
if add a space in a Expression Editor then Row0 output should change to ‘false’ right but its not happening
regexMatcher(column(“column1”), “.[^a-z0-9’() &#/@+,!?~=:`;_%{}$-ëé¢á|-].”)

That is explained again by your Regex, you have to be very precise with it :wink: You are limiting your letters from a-z which is lowercase only. Your text is in uppercase, represented by A-Z. If you want to allow both, use A-Za-z

@ArjenEX

I am using below expression
image

Getting below results, yellow highlighted should be false correct? but it is showing as true
image

Sorry for keep coming back on this, I want to make sure this regexmatcher is clear to me

Thanks in advance

No, KNIME is correct. You’re checking a wildcard (so any set character for indefinitely length), then at least one letter, then again a wildcard allowing anything… So @ ABC @ meets that requirement.

Try working with a regex engine tester first before going to KNIME. That will help you get the right syntax and help you understand what it is doing.

@ArjenEX Thank You!

I will check on regex engine tester

Hi @Pragadeshgp , please use the preformatted text option when pasting your code. The forum will then display them as is. To achieve this, follow these steps:

  1. Write or paste your expressions
  2. Highlight them
  3. Press on this icon image at the top of the box where you are writing

That way we will see your expressions correctly.

To accomplish what you want to do to get the true and false, it’s probably easiest to use the Rule Engine node:

$column1$ MATCHES Your_regex_1 => true
$column1$ MATCHES Your_regex_2 => true
$column1$ MATCHES Your_regex_3 => true
TRUE => false

With the Rule Engine, you do not need to bother how to write if elseif else statements

EDIT: It will also help if you explain what you are trying to do. I mean, since it seems like your expressions are not doing what you expected them to do, we can’t rely on these given expressions to help you find the right expressions. You would need to explain what rule you want to apply.

EDIT 2:
If you want to check if a row has special characters, I’m assuming that means if it has anything other than the A-Z alphabet (upper case and lower case), and numbers from 0-9, correct?

If that is the case, here’s a trick: Just replace [a-zA-Z0-9] with “” and simply check if anything is left.

Here’s how it works:
image

Input:
image

Results:
image

Here’s the expression I used:


regexMatcher(regexMatcher(strip(regexReplace($column1$, "[a-zA-Z0-9]", "")), ""), "False")

Note: The double regexMatcher() is to simply invert True and False (just trying to do everything in the same node)

2 Likes

@bruno29a thank you a lot for the detailed explanation