Extract string based on Regex

Hi,

I have a column with strings like below:
AN~Brand1_MK~HK
AN~Brand2_MK~US

I would like to extract the string between AN~ and _MK like below:
Brand1
Brand2

I tried using this regex: AN~(\w+)_
But when i ran it with the Regex Split node, it kept giving me this error:
WARN Regex Split 2:8 5218 input string(s) did not match the pattern or contained more groups than expected

I’m not sure what has gone wrong. Some help would be appreciated :slight_smile:

Hi @sljw

I would use a String Mainpulation node with:
substr($column1$, indexOf($column1$,"AN~" ) + 3,indexOf($column1$,"_MK" ) - 3 )
gr. Hans

4 Likes

@HansS gave the solution. You could also do this by cell splitter nodes and wildcards, but @HansS gave the better solution!

1 Like

Thanks HansS! It works!

1 Like

Hi there @sljw,

in your regex in Regex Split node you are missing to match part after the underscore. This one should work: AN~(\w+)_.*

Also you can use String Manipulation with following regex:
regexReplace($column1$,"AN~(\\w+)_.*" ,"$1" )

Br,
Ivan

Highly recommend the regex extractor node that is part of the Palladian collection.

1 Like