I want to check if in a column of strings I have one of these categories: only numbers, a mix of numbers and characters, or only characters. Could anyone guide me on how to accomplish this?
Hi @B074534, can you confirm: when you say “characters”, do you mean letters, non-numerics, or other symbols? I’m guessing you mean letters…
BR
Hi @B074534,
I’d use the String Manipulation node with the “regexMatcher” function to determine whether the string is only numbers, only non-numbers (or the appropriate regex for the set of characters you are interested in, see @takbb’s question), and then use the Rule engine node to determine the mixed cases. Like so:
The workflow:
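The attached workflow is not reproduced in this text, so here is a rough sketch of the same idea in plain Java rather than KNIME nodes. It assumes regexMatcher behaves like Java's String.matches (the whole string has to match the pattern). The class name, method name and category labels are made up for illustration. I have used + instead of * so that an empty string does not count as "only numbers".

```java
// Hypothetical helper, not part of KNIME: two whole-string regex checks
// followed by a Rule Engine style decision.
class ColumnTypeSketch {

    static String classify(String s) {
        // String.matches requires the WHOLE string to match the pattern.
        boolean onlyDigits = s.matches("[0-9]+");
        boolean onlyNonDigits = s.matches("[^0-9]+");

        if (onlyDigits) {
            return "only numbers";
        } else if (onlyNonDigits) {
            return "only non-numbers";
        } else if (s.isEmpty()) {
            // Assumption: empty strings get their own label.
            return "empty";
        }
        // Neither check matched, so digits and non-digits are both present.
        return "mixed";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345", "abc", "ab12", "(a/b)"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With these rules "(a/b)" lands in "only non-numbers", because symbols are non-digits too. If you want a stricter "only letters" category, the second pattern has to change.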
Hope that helps
Best, Lukas
Hi @takbb,
Yes, I meant letters rather than characters in general, but now that you mention symbols, I think there are some parentheses and slashes in some of the strings, so I will need a solution that can cope with both scenarios.
Hi @B074534, thanks for clarifying. Hopefully the workflow demo provided by @LukasS can be adapted to your needs. A series of String Manipulation nodes using regexMatcher can be given appropriate patterns to return the info.
As per @LukasS's examples, regex classes could (for example) be:
"[0-9]*"
- returns true if string contains only numerics
"[^0-9]*"
- returns true if string contains only non-numerics
Additionally
"[A-Z]*"
- returns true if string contains only Capital letters
"[a-z]*"
- returns true if string contains only lower case letters
"[A-Za-z]*"
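- returns true if string contains only letters
"[A-z]*"
- NOT a safe alternative to "[A-Za-z]*"
: the range A-z also covers the six characters that sit between Z and a in ASCII ( [ \ ] ^ _ ` ), so it also returns true for strings such as a_b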
"[0-9A-Za-z]*"
- returns true if string contains only alphanumerics
"[0-9A-Za-z()\\\\ /{}\\[\\]-]*"
- returns true if string contains only alphanumerics plus a few other characters:
( ) \ / - { } [ ] and space.
Note the use of \\
to “escape” various characters that are otherwise interpreted by regex. For every single \ that you want to use in the regex, you need to enter it twice, because the pattern is interpreted first by the client parser (which strips out one \) and is only then passed on to regex. This means that for every \\
that appears in the pattern you type, a single \ will be passed to regex itself.
In other words, the pattern
[0-9A-Za-z()\\\\ /{}\\[\\]-]*
will be passed to regex as
[0-9A-Za-z()\\ /{}\[\]-]*
Note also that the hyphen goes last inside the square brackets. Between two other characters it defines a range, so /-{ would mean “every character from / to {”, which lets through characters such as : ; < = > ? @ ^ _ and does not match the hyphen itself.
You might find some “special” characters appear not to need escaping, such as the { } in my example, so there may be some trial and error, or you might find it easier to escape all regex special characters. That’ll possibly work too.
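If you want to check the double escaping outside KNIME, a small Java sketch like the one below might help. It assumes the expression is parsed like a Java string literal, which is what the doubled backslashes suggest. The class and constant names are made up. It also shows two range pitfalls inside square brackets: a hyphen placed between two characters, and the A-z shorthand.

```java
// Illustrative only: class and constant names are made up for this sketch.
class EscapingSketch {

    // As typed in the expression: every regex backslash is doubled.
    // The hyphen sits last in the class, so it is a literal hyphen.
    static final String TYPED = "[0-9A-Za-z()\\\\ /{}\\[\\]-]*";

    // Same class with the hyphen between '/' and '{': the regex engine
    // reads this as the range '/' (0x2F) to '{' (0x7B).
    static final String HYPHEN_AS_RANGE = "[0-9A-Za-z()\\\\ /-{}\\[\\]]*";

    public static void main(String[] args) {
        // Shows the pattern as the regex engine receives it.
        System.out.println(TYPED);

        // The Java literal "a\\b [1]" is the text a\b [1]
        System.out.println("a\\b [1]".matches(TYPED));        // true
        System.out.println("a-b".matches(TYPED));             // true
        System.out.println("a=b".matches(TYPED));             // false

        // '=' falls inside the accidental range, '-' does not.
        System.out.println("a=b".matches(HYPHEN_AS_RANGE));   // true
        System.out.println("a-b".matches(HYPHEN_AS_RANGE));   // false

        // A-z includes the characters between 'Z' and 'a', such as '_'.
        System.out.println("a_b".matches("[A-z]*"));          // true
        System.out.println("a_b".matches("[A-Za-z]*"));       // false
    }
}
```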
I hope that helps rather than confuses!
Hi both,
Thanks for your help. I am new to regex, and for some reason the workflow that @LukasS shared didn’t classify the “only not numbers” category the way I needed with the pattern:
"[^0-9]*"
- returns true if string contains only non-numerics
which I replaced with the:
"[A-Za-z]*"
- returns true if string contains only letters
from @takbb’s post, to create an “Only letters” category, which in theory should do the same, and this did the trick!
Thanks again for your prompt help.
Highly appreciated.
Here is the amended workflow that I used, in case anyone finds it useful in the future. I added an extra node at the beginning to remove any spaces (at the start, in the middle or at the end of the string) and a filter node at the end to keep only one column, called “columntype”.
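For anyone reading without KNIME to hand, the logic of the amended workflow might look roughly like this in plain Java. This is only a sketch: the class and method names, the use of \s for "spaces" and the "empty" label are my own assumptions, not taken from the workflow itself.

```java
// Hypothetical sketch of the amended workflow's logic; names are illustrative.
class AmendedWorkflowSketch {

    // Removes every whitespace character, wherever it occurs in the string.
    static String stripAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    static String columnType(String raw) {
        String s = stripAllWhitespace(raw);
        if (s.isEmpty()) {
            // Assumption: nothing left after stripping gets its own label.
            return "empty";
        }
        if (s.matches("[0-9]+")) {
            return "only numbers";
        }
        if (s.matches("[A-Za-z]+")) {
            return "only letters";
        }
        // Everything else, including strings with symbols such as ( ) or /.
        return "mixed";
    }

    public static void main(String[] args) {
        for (String s : new String[] {" 12 34 ", "ab c", "AB (12)", "a/b"}) {
            System.out.println("[" + s + "] -> " + columnType(s));
        }
    }
}
```

Note that with an “only letters” check, a string such as a/b falls into the mixed bucket, whereas with the "[^0-9]*" check it would count as only non-numerics.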
Hi @B074534, you are welcome, and I see what you are saying. The pattern [^0-9]*
returns True if the entire string is ONLY non-numerics but will return False if there is at least one digit in it, so yes “only not numbers” is potentially a misnomer.
I can’t think of a regex pattern that will itself return True if there is at least one non-digit (even if there are also digits). There are many on this forum better than me with regex, though, so if there is a way they will hopefully provide an example! Until then, there is a way to do this with String Manipulation, by negating the return from [0-9]*
as follows:
regexMatcher($string-column$,"[0-9]*").equals("True")?"False":"True"
This uses the Java ternary operator, also known as a shorthand if statement, and basically says: if the return from the regexMatcher function is “True”, then return “False”, otherwise return “True”.
So it will return “False” if the string is entirely digits, and “True” otherwise.
You could obviously apply that same mechanism to any of the regex strings to reverse the True/False return.
After some more thinking and tinkering, the following regex should return True for anything that contains at least one non-numeric:
regexMatcher($string-column$, ".*[^0-9]+.*")
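As a quick way to compare the two approaches outside KNIME, here is a minimal Java sketch. It assumes regexMatcher does a whole-string match like Java's String.matches. The class and method names are made up for illustration.

```java
// Illustrative comparison of the two approaches; names are hypothetical.
class NonDigitSketch {

    // Negates the "only digits" check, like the ternary expression.
    static String notOnlyDigits(String s) {
        return s.matches("[0-9]*") ? "False" : "True";
    }

    // Whole-string match that needs at least one non-digit somewhere.
    static boolean hasNonDigit(String s) {
        return s.matches(".*[^0-9]+.*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "12a", "abc", "(1/2)"}) {
            System.out.println(s + " -> negated: " + notOnlyDigits(s)
                    + ", pattern: " + hasNonDigit(s));
        }
    }
}
```

Both give the same answer for ordinary single-line strings. One thing to keep in mind is that an empty string matches "[0-9]*", so both versions treat it as having no non-digit.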