String manipulation: delete if last character is only one letter

kowisoft · April 13, 2021, 8:24am

Dear KNIMErs,

I am currently trying to automate some text stuff.

I wonder if there is a way to determine whether the last word in a string column is just one character.

Let me explain:

this word starts with a → label this as “yes” or 1 because the rightmost word is just one character
this word is wonderful → label as “no” or 0
this word is truly fantastic → label as “no” or 0
here is another word that just starts with a → label this as “yes” or 1

I thought of working with Regex but I am not 100% sure how to apply it.

Here’s my (very simple) example workflow.

THANK YOU in advance!

ipazin · April 13, 2021, 9:07am

Hello @kowisoft,

here is logic without regex that extracts last word:
substr( $sentence$, lastIndexOfChar( $sentence$, ' ') +1)

and here is regex that extracts last word (doesn’t match one word sentences so you are safe here):
regexReplace( $sentence$, ".*\\s(\\w+)$" , "$1")
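
For anyone who wants to try both ideas outside KNIME, here is a rough plain-Java sketch. It assumes that regexReplace behaves like Java's String.replaceAll; the class and method names are made up for illustration and are not KNIME functions.

```java
class LastWordSketch {
    // Mirrors substr($sentence$, lastIndexOfChar($sentence$, ' ') + 1).
    // lastIndexOf returns -1 when there is no space, so a one-word
    // sentence comes back unchanged.
    static String lastWordBySubstring(String sentence) {
        return sentence.substring(sentence.lastIndexOf(' ') + 1);
    }

    // Mirrors regexReplace($sentence$, ".*\\s(\\w+)$", "$1").
    // If the pattern does not match, the input is returned unchanged.
    static String lastWordByRegex(String sentence) {
        return sentence.replaceAll(".*\\s(\\w+)$", "$1");
    }

    public static void main(String[] args) {
        String first = "this word starts with a";
        String second = "this word is truly fantastic";
        System.out.println(lastWordBySubstring(first));  // a
        System.out.println(lastWordByRegex(second));     // fantastic
        // Wrapping the result in length() tells one-letter words apart.
        System.out.println(lastWordByRegex(first).length() == 1);  // true
    }
}
```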

Note:

Rows 2, 3 and 4 from your example contain a space after the last word, which I guess shouldn’t be there, so I deleted it.
To get the length, simply wrap the length() function around either expression.
For a one-node solution that gives a 1/0 or yes/no mark, you can use either Column Expressions together with the if() function (see here) or, even better, the Rule Engine node and its MATCHES operator (which is regular expression based). For the latter, this is the rule: $sentence$ MATCHES ".*\s.$" => 1
If your final goal is to delete (filter) such rows, skip the mark and use the Rule-based Row Filter, which has the same MATCHES operator and works with the regex above.
If you can have one-letter sentences, the regex above won’t match them, so you’ll need to add another rule for it to work ($sentence$ MATCHES "." => 1).
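
The two rules can be checked in plain Java as a quick sanity test. This is only a sketch: it assumes that MATCHES has to match the whole string, as Java's Pattern.matches does, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SingleLetterEndingSketch {
    // Same regex as the first rule; written with a doubled backslash
    // because this is a Java string literal.
    static final String ENDS_WITH_ONE_CHAR_WORD = ".*\\s.$";
    // Second rule: the whole sentence is a single character.
    static final String ONE_CHAR_SENTENCE = ".";

    // Returns 1 when the last word is a single character, otherwise 0.
    static int mark(String sentence) {
        boolean hit = Pattern.matches(ENDS_WITH_ONE_CHAR_WORD, sentence)
                || Pattern.matches(ONE_CHAR_SENTENCE, sentence);
        return hit ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(mark("this word starts with a"));  // 1
        System.out.println(mark("this word is wonderful"));   // 0
        System.out.println(mark("a"));                        // 1
    }
}
```

A trailing space after the last word gives 0 here too, which is why cleaning up those rows first matters.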

Good luck automating “some text stuff”

Br,
Ivan

kowisoft · April 13, 2021, 9:18am

Fantastic @ipazin

Yes, indeed, I am trying to get rid of these rows, which bullet point #3 (Rule-based Row Filter) does perfectly.

Daniel_Weikert · April 13, 2021, 5:18pm

Great solution.
Why does regexReplace need a double backslash for the space?
“\\s” looks like escaping the space to me.

takbb · April 13, 2021, 11:30pm

The “double backslash” is needed because otherwise the single backslash in \s would be treated as an escape within the string literal itself (in Java and many other languages). That would either cause problems in parsing or else result in only “s”, instead of “\s”, being passed to the regex parser.

Ultimately it is just the single-backslash version of the literal that actually makes it to the regex parser. The \\ is first interpreted in the string literal as an escaped \, which means that a single backslash is what is then passed on (along with the “s” that follows it).
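
A small plain-Java sketch of those two stages, in case it helps to see it run. The class and method names are made up for illustration.

```java
class BackslashEscapeSketch {
    // In source code the literal is written with two backslashes,
    // but the string it produces holds only one, followed by 's'.
    static String whatTheRegexParserSees() {
        return "\\s";
    }

    public static void main(String[] args) {
        String pattern = whatTheRegexParserSees();
        System.out.println(pattern);           // \s
        System.out.println(pattern.length());  // 2
        // The regex parser receives \s and treats it as "any whitespace".
        System.out.println("a b".matches("a" + pattern + "b"));  // true
    }
}
```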

Some explanations (better than mine!) can be found here…
