I want to check if in a column of strings I have one of these categories: only numbers, a mix of numbers and characters, or only characters

I want to check if in a column of strings I have one of these categories: only numbers, a mix of numbers and characters, or only characters. Could anyone guide me on how to accomplish this?

Hi @B074534, can you confirm: when you say “characters”, do you mean letters, non-numerics, or other symbols? I’m guessing you mean letters…
BR

Hi @B074534,

I’d use the String Manipulation node with the “regexMatcher” function to determine whether the string is only numbers, only non-numbers (or the appropriate regex for the set of characters you are interested in, see @takbb’s question), and then use the Rule engine node to determine the mixed cases. Like so:

The workflow:

Hope that helps :slight_smile:
Best, Lukas
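
For anyone who wants to try the logic outside KNIME, the approach described above (two whole-string regex checks, then a rule for whatever is left) can be sketched in Python. This is only an illustration, not the workflow itself: `classify` and its labels are made-up names, and `re.fullmatch` is assumed to behave like regexMatcher in that the whole string has to match.

```python
import re

def classify(value: str) -> str:
    """Label a string as only numbers, only non-numbers, or mixed."""
    if value == "":
        # Both patterns below use *, so each would also match an empty string.
        return "empty"
    if re.fullmatch(r"[0-9]*", value):
        return "only numbers"
    if re.fullmatch(r"[^0-9]*", value):
        return "only non-numbers"
    # Neither whole-string check passed, so digits and non-digits are mixed.
    return "mixed"

for sample in ["12345", "abc", "ab(12)/c"]:
    print(sample, "->", classify(sample))
```

The explicit empty-string branch is a design choice of this sketch; without it an empty value would be labelled "only numbers" simply because that check runs first.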

Hi @takbb,
Yes, I meant letters instead of characters, but now that you mention symbols, I think there are some parentheses and slashes in some of the strings, so I will need a solution that is able to cope with both scenarios.

Hi @B074534, thanks for clarifying. Hopefully the workflow demo provided by @LukasS can be adapted to your needs. A series of String Manipulation nodes using regexMatcher can be given appropriate patterns to return the info.

As per @LukasS's examples, regex patterns could (for example) be:

"[0-9]*" - returns true if string contains only numerics
"[^0-9]*" - returns true if string contains only non-numerics

Additionally
"[A-Z]*" - returns true if string contains only Capital letters
"[a-z]*" - returns true if string contains only lower case letters
"[A-Za-z]*" - returns true if string contains only letters

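"[A-z]*" - looks like an alternative to "[A-Za-z]*", but it is not quite the same: the range A-z also covers the characters [ \ ] ^ _ and ` that sit between Z and a, so strings containing those will match too :thinking: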

"[0-9A-Za-z]*" - returns true if string contains only alphanumerics
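
To see how the letter and alphanumeric classes behave, here is a small Python sketch. The dictionary keys and the `matches` helper are hypothetical names, and `re.fullmatch` is assumed to mirror regexMatcher's whole-string matching. It also lists the characters that fall between Z and a, which is why the A-z range accepts more than letters.

```python
import re

# Patterns from the list above; the keys are illustrative labels only.
PATTERNS = {
    "upper": r"[A-Z]*",
    "lower": r"[a-z]*",
    "letters": r"[A-Za-z]*",
    "alphanumeric": r"[0-9A-Za-z]*",
    "A-z range": r"[A-z]*",
}

def matches(name: str, value: str) -> bool:
    """True if the whole string matches the named pattern."""
    return re.fullmatch(PATTERNS[name], value) is not None

# Characters whose code points lie strictly between 'Z' and 'a'.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print("Extra characters covered by A-z:", between)

print(matches("letters", "a_b"))    # underscore is not a letter
print(matches("A-z range", "a_b"))  # but it falls inside the A-z range
```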

"[0-9A-Za-z()\\\\ /\\-{}\\[\\]]*" - returns true if string contains only alphanumerics plus a few other characters:
( ) \ / - { } [ ] and space.

Note the use of \\ to “escape” various characters that are otherwise interpreted by regex. Every single \ that you want in the regex has to be entered twice, because the pattern is first interpreted by the client parser (which strips out one \) and only then passed on to regex. This means that for every \\ that appears in the pattern you type, a single \ will be passed to regex itself.

In other words, the pattern
[0-9A-Za-z()\\\\ /\\-{}\\[\\]]*

will be passed to regex as
[0-9A-Za-z()\\ /\-{}\[\]]*

The - is escaped because, inside square brackets, an unescaped /-{ would be read as a range running from / to {, which also lets through characters such as : ; < = > ? and @.

You might find some “special” characters appear not to need escaping, such as the { } in my example, so there may be some trial and error, or you might find it easier to escape all regex special characters. That’ll possibly work too.
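
The two layers of escaping can be imitated in Python. This is a rough sketch under one assumption, namely that the first parser simply turns every \\ into a single \; the helper names are made up. It also compares an escaped hyphen with an unescaped one placed between / and {, where the regex engine reads it as a range.

```python
import re

def strip_one_escape_layer(typed: str) -> str:
    # Assumption: the first parser just turns each \\ into a single \.
    return typed.replace("\\\\", "\\")

def allowed(pattern: str, value: str) -> bool:
    """True if every character of value is in the pattern's class."""
    return re.fullmatch(pattern, value) is not None

# What would be typed into the node (between the quotes), hyphen escaped.
typed = r"[0-9A-Za-z()\\\\ /\\-{}\\[\\]]*"
passed = strip_one_escape_layer(typed)
print(passed)

# Same class with the hyphen left unescaped: /-{ becomes a range.
unescaped_hyphen = r"[0-9A-Za-z()\\ /-{}\[\]]*"

print(allowed(passed, "a:b"))            # colon is not in the class
print(allowed(unescaped_hyphen, "a:b"))  # colon sits inside the / to { range
```

Moving the hyphen to the very end of the class is another common way to keep it literal.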

I hope that helps rather than confuses! :wink:

Hi both,

Thanks for your help. I am new to regex, and for some reason the workflow that @LukasS shared didn’t classify the “only not numbers” category the way I needed with the pattern:
"[^0-9]*" - returns true if string contains only non-numerics
I replaced it with:
"[A-Za-z]*" - returns true if string contains only letters
from @takbb’s post, to create an “Only letters” category, which in theory should do the same, and this did the trick!
Thanks again for your prompt help.
Highly appreciated.

Here is the amended workflow that I used, in case anyone finds it useful in the future. I added an extra node at the beginning to remove any spaces (at the start, middle or end of the string) and a filter node at the end to keep only one column called “columntype”.

columntype_classification_with_Regex.knwf (14.5 KB)
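
For readers who cannot open the attachment, the steps described above might look roughly like this in Python. It is a guess at the logic rather than a copy of the workflow: the function name and the category labels are assumptions, and only the space removal and the “only letters” pattern come from the posts above.

```python
import re

def column_type(value: str) -> str:
    """Remove spaces, then label the remaining text."""
    cleaned = value.replace(" ", "")  # drops leading, inner and trailing spaces
    if cleaned == "":
        return "empty"
    if re.fullmatch(r"[0-9]+", cleaned):
        return "only numbers"
    if re.fullmatch(r"[A-Za-z]+", cleaned):
        return "only letters"
    # Anything else: digits mixed with letters, or symbols such as ( ) /
    return "mixed"

for sample in [" 12 34 ", "ab c", "a1 (b)/2"]:
    print(repr(sample), "->", column_type(sample))
```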

Hi @B074534, you are welcome, and I see what you are saying. The pattern [^0-9]* returns True if the entire string is ONLY non-numerics, but will return False if there is at least one digit in it, so yes, “only not numbers” is potentially a misnomer.

I can’t think of a regex pattern that will itself return True if there is at least one non-digit (even if there are also digits). There are many on this forum better than me with regex, though, so if there is a way they will hopefully provide an example! Until then, there is a way to do this with String Manipulation, by negating the return from [0-9]* as follows:
regexMatcher($string-column$,"[0-9]*").equals("True")?"False":"True"

This uses the Java ternary operator, also known as a shorthand if statement, and basically says: if the return from the regexMatcher function is “True”, then return “False”, otherwise return “True”.

So it will return “False” if the string is entirely digits, and “True” otherwise.

You could obviously apply that same mechanism to any of the regex strings to reverse the True/False return.
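
The same negation can be tried out in Python. In this sketch `regex_matcher` is a hypothetical stand-in that returns the strings "True" or "False" for a whole-string match, which is assumed here to be how regexMatcher behaves; `negate` plays the role of the ternary.

```python
import re

def regex_matcher(value: str, pattern: str) -> str:
    """Hypothetical stand-in: "True" if the whole string matches, else "False"."""
    return "True" if re.fullmatch(pattern, value) else "False"

def negate(result: str) -> str:
    """Flip a "True"/"False" string, like the ternary expression above."""
    return "False" if result == "True" else "True"

def has_non_digit(value: str) -> str:
    return negate(regex_matcher(value, r"[0-9]*"))

for sample in ["123", "12a", "abc"]:
    print(sample, "->", has_non_digit(sample))
```

One thing to keep in mind: an empty string matches [0-9]*, so the negated result for it is "False".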

After some more thinking and tinkering, the following regex should return True for anything that contains at least one non-numeric:

regexMatcher($string-column$, ".*[^0-9]+.*")
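
A quick way to sanity-check that pattern is a Python sketch like the one below, again assuming that `re.fullmatch` mirrors the whole-string matching of regexMatcher. The function name is made up for illustration.

```python
import re

def contains_non_digit(value: str) -> bool:
    """True if at least one character of the string is not a digit."""
    return re.fullmatch(r".*[^0-9]+.*", value) is not None

for sample in ["123", "12a3", "abc", "(12)/3"]:
    print(sample, "->", contains_non_digit(sample))
```

The pattern needs at least one non-digit character to be present, so an empty string does not match it.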
