Extract string based on Regex


I have a column with strings like below:

I would like to extract the string between AN~ and _MK like below:

I tried using this regex: AN~(\w+)_
But when I ran it with the Regex Split node, it kept giving me this error:
WARN Regex Split 2:8 5218 input string(s) did not match the pattern or contained more groups than expected

I'm not sure what has gone wrong. Some help would be appreciated.

Hi

I would use a String Manipulation node with:
substr($column1$, indexOf($column1$, "AN~") + 3, indexOf($column1$, "_MK") - 3)
gr. Hans
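
For anyone who wants to see the index arithmetic outside KNIME, here is a minimal Python sketch of the same idea. The sample value and the function name `extract_between` are made up for illustration, since the original sample data is not shown in the thread. One thing to keep in mind: if the third argument of substr is a length, as I understand it, the fixed "- 3" in the expression above lines up when AN~ sits at the very start of the string. The sketch computes the end position relative to wherever AN~ is found, so it does not rely on that.

```python
def extract_between(text: str, start_marker: str = "AN~", end_marker: str = "_MK") -> str:
    """Return the text between start_marker and end_marker, or "" if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return ""
    # Move past the start marker itself.
    start += len(start_marker)
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return ""
    return text[start:end]


# Hypothetical sample value, not taken from the original post.
print(extract_between("AN~ABC123_MK_2020"))  # ABC123
```

Returning an empty string for rows without both markers is just one possible choice; a missing value may suit your workflow better.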


@HansS gave the solution. You could also do this by cell splitter nodes and wildcards, but @HansS gave the better solution!

Thanks HansS! It works!

Hi there

In your regex for the Regex Split node, you are not matching the part after the underscore. The pattern has to match the whole string, not just a piece of it. This one should work: AN~(\w+)_.*

You can also use a String Manipulation node with the following expression:
regexReplace($column1$,"AN~(\\w+)_.*" ,"$1" )
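
To illustrate why the original pattern failed and what the added .* changes, here is a small Python sketch. Python's `re.fullmatch` is used as a stand-in for a whole-string match, the sample value is hypothetical (the real data is not shown in the thread), and the helper name `extract` is my own. Note that \w also matches underscores and \w+ is greedy, so if your strings contain another underscore after _MK, the capture can include "_MK". A lazy group anchored on _MK, shown as the third pattern, is one way around that.

```python
import re

# Hypothetical sample value, not taken from the original post.
SAMPLE = "AN~ABC123_MK_2020"


def extract(pattern: str, text: str):
    """Return group 1 if the pattern matches the whole string, else None."""
    match = re.fullmatch(pattern, text)
    return match.group(1) if match else None


# Original pattern: does not cover the text after the underscore, so no match.
print(extract(r"AN~(\w+)_", SAMPLE))       # None

# With .* the whole string matches, but greedy \w+ runs up to the last underscore.
print(extract(r"AN~(\w+)_.*", SAMPLE))     # ABC123_MK

# Lazy group anchored on _MK stops at the first "_MK".
print(extract(r"AN~(\w+?)_MK.*", SAMPLE))  # ABC123
```

If your values never contain an underscore after _MK, the second and third patterns give the same result.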


I highly recommend the regex extractor node that is part of the Palladian collection.

