I am currently trying to automate some text stuff.
I wonder if there is a way to determine whether the last word in a string column is just one character.
Let me explain:
- this word starts with a → label this as “yes” or 1 because the rightmost word is just one character
- this word is wonderful → label as “no” or 0
- this word is truly fantastic → label as “no” or 0
- here is another word that just starts with a → label this as “yes” or 1
I thought of working with Regex but I am not 100% sure how to apply it.
Here’s my (very simple) example workflow.
THANK YOU in advance!
Here is logic without regex that extracts the last word:
substr( $sentence$, lastIndexOfChar( $sentence$, ' ') +1)
And here is a regex that extracts the last word (it doesn’t match one-word sentences, so you are safe there):
regexReplace( $sentence$, ".*\\s(\\w+)$" , "$1")
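For anyone who wants to try both approaches outside KNIME, here is a rough Python sketch of what they do. The function names are made up for illustration, and Python's `rfind` and `re.sub` are only stand-ins for `lastIndexOfChar` and `regexReplace`, so treat it as an approximation rather than what the node does internally.

```python
import re


def last_word_substr(sentence: str) -> str:
    # Stand-in for substr(s, lastIndexOfChar(s, ' ') + 1).
    # rfind returns -1 when there is no space, so a one-word
    # sentence comes back whole.
    return sentence[sentence.rfind(" ") + 1:]


def last_word_regex(sentence: str) -> str:
    # Stand-in for regexReplace(s, ".*\\s(\\w+)$", "$1").
    # When the pattern does not match, the input is returned unchanged.
    return re.sub(r".*\s(\w+)$", r"\1", sentence)


print(last_word_substr("this word starts with a"))  # prints: a
print(last_word_regex("this word is wonderful"))    # prints: wonderful
```

With a trailing space, as in "this word is wonderful ", the first function returns an empty string and the second returns the sentence unchanged, which is why the trailing spaces matter.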
- Rows 2, 3 and 4 from your example contain a space after the last word, which I guess shouldn’t be there, so I deleted it.
- To get the length, simply wrap the length() function around it.
- To have a one-node solution for the 1/0 or yes/no mark, you can either use Column Expressions together with the if() function (see here) or, even better, the Rule Engine node and its MATCHES operator (which is regular-expression based). For the latter, this is the regex:
$sentence$ MATCHES ".*\s.$" => 1
- If your final goal is to delete (filter) such rows, skip adding the mark and use the Rule-based Row Filter, which has the same MATCHES operator you can use with the regex above.
- If you can have one-letter sentences, the regex above won’t match them, so you’ll need to add another rule for it to work:
$sentence$ MATCHES "." => 1
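To see how the two MATCHES rules behave together, here is a small Python sketch. It assumes MATCHES requires the pattern to cover the whole string, which `re.fullmatch` imitates; the function name is hypothetical and is not part of any KNIME node.

```python
import re


def ends_with_single_char_word(sentence: str) -> int:
    # Rule 1: anything, then whitespace, then exactly one character.
    if re.fullmatch(r".*\s.$", sentence):
        return 1
    # Rule 2: the whole sentence is a single character.
    if re.fullmatch(r".", sentence):
        return 1
    # Otherwise no rule fires and the row is marked 0.
    return 0


for s in ["this word starts with a", "this word is wonderful", "a"]:
    print(s, "=>", ends_with_single_char_word(s))
```

Note that `.` matches any character, not only letters, so a sentence ending in a lone punctuation mark after a space would also be marked 1.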
Good luck automating “some text stuff”
Yes, indeed, I am trying to get rid of these rows, which bullet point #3 (Rule-based Row Filter) does perfectly.
Why does regexReplace need a double backslash for the space? “\\s” looks like escaping the space to me.
The “double backslash” is needed because a single backslash in \s would otherwise be treated as an escape within the string literal itself (in Java and many other languages). That would either cause a parsing problem or result in only “s” instead of “\s” being passed to the regex parser.
Ultimately, only the single-backslash version makes it to the regex parser: the “\\” in the string literal is first interpreted as an escaped backslash, so a single backslash is what gets passed on (along with the “s” that follows it).
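The two-stage interpretation can be checked in any language with backslash escapes in string literals. The sketch below uses Python as a stand-in for Java; it only uses the doubled form and a raw string, because the two languages differ in how they treat an unrecognised single-backslash escape.

```python
import re

# Two backslashes in the source code become ONE backslash in the string.
pattern = ".*\\s(\\w+)$"
print(pattern)       # prints: .*\s(\w+)$
print(len("\\s"))    # prints: 2  (a backslash followed by the letter s)

# A raw string skips the literal-level escaping and yields the same text.
same_pattern = r".*\s(\w+)$"
print(pattern == same_pattern)  # prints: True

# The regex engine therefore sees \s (whitespace), not a plain "s".
print(re.sub(pattern, r"\1", "this word starts with a"))  # prints: a
```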
Some explanations (better than mine! ) can be found here…