Regex Help, New User

UnknownValue · April 8, 2022, 11:04pm

Hi,

Can someone please help me extract LWS1098, LWS1147, and LWS1648 from the text below?

TE.PH2022.Joanna.LWS1098.LWS1147.LWS1648.Campaign

UnknownValue · April 8, 2022, 11:37pm

I am trying to use the String Replacer node for this, but I cannot get the expression right…

elsamuel · April 9, 2022, 4:41am

What have you tried?
And what exactly is the output that you’re aiming for?

Some variation on LWS[0-9]{4} should work.
Depending on the desired output, you could use the Regex Extractor node.
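
To illustrate how that pattern behaves outside KNIME, here is a minimal Python sketch. It assumes every code is "LWS" followed by exactly four digits; the variable names are only for illustration and do not come from any KNIME node.

```python
import re

# Sample string from the original question
text = "TE.PH2022.Joanna.LWS1098.LWS1147.LWS1648.Campaign"

# "LWS" followed by exactly four digits, as suggested above
codes = re.findall(r"LWS[0-9]{4}", text)

# Join the matches into one comma-separated string
result = ",".join(codes)
print(result)  # LWS1098,LWS1147,LWS1648
```

Note that the pattern is not anchored, so it would also pick up the first four digits of a longer code such as LWS10987. If that can occur in your data, a stricter pattern is probably needed.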

UnknownValue · April 9, 2022, 10:04am

My attempts were variations of [a-zA-z]{4}.
I want to remove all text except the LWS numbers (e.g. LWS1234).
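
For anyone wondering why that attempt does not isolate the codes, here is a small Python sketch (not KNIME syntax, variable names are illustrative). The class [a-zA-z] matches letters only, so {4} grabs any run of four letters and never reaches the digits. The A-z range also appears to be a typo for A-Z, since it covers the punctuation that sits between Z and a in ASCII.

```python
import re

# Sample string from the original question
text = "TE.PH2022.Joanna.LWS1098.LWS1147.LWS1648.Campaign"

# Four consecutive "letters": matches parts of names, never the LWS codes,
# because "LWS" is only three letters long and digits are not in the class
attempt = re.findall(r"[a-zA-z]{4}", text)
print(attempt)  # ['Joan', 'Camp', 'aign']

# The A-z range also includes the six ASCII characters between Z and a
stray = re.findall(r"[A-z]", "[\\]^_`")
print(len(stray))  # 6
```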

duristef · April 9, 2022, 2:51pm

Hi @UnknownValue,
if the result you want is the comma-separated string LWS1098,LWS1147,LWS1648, and provided the substrings are separated by dots as in your example, you can use this code in a Column Expression node:

array_in = split(column("column1"),'.')
array_out = []

for (i=0, max_i=arrayLength(array_in); i<max_i; i++) {
    el = strip(array_in[i])
    if (regexMatcher(el,"^LWS\\d+$")) {
        array_out.push(el)
    }
}
joinSep(",",array_out)

duristef · April 9, 2022, 3:24pm

If you use a Python node, you can output both a comma-separated string and a list:

import re

output_table_1 = input_table_1.copy()
# Raw string so that \d reaches the regex engine unchanged
regex = re.compile(r'^LWS\d+$')
# Keep only the dot-separated pieces that match the pattern
output_table_1["list_result"] = output_table_1["column1"].apply(lambda x: list(filter(regex.match, x.split("."))))
output_table_1["str_result"] = output_table_1["column1"].apply(lambda x: ",".join(filter(regex.match, x.split("."))))

A bit more readable:

import re

output_table_1 = input_table_1.copy()
# Raw string so that \d reaches the regex engine unchanged
regex = re.compile(r'^LWS\d+$')

def getSubstrings(x, r):
    # Split on dots and keep only the pieces that match the pattern
    return list(filter(r.match, x.split(".")))

output_table_1["list_result"] = output_table_1["column1"].apply(lambda x: getSubstrings(x, regex))
output_table_1["str_result"] = output_table_1["column1"].apply(lambda x: ",".join(getSubstrings(x, regex)))
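
If you want to check the split-and-filter logic without a KNIME workflow, a plain-Python sketch like the one below may help. It uses no pandas and no input_table_1. The get_substrings helper and the second and third sample strings are hypothetical, made up only to show how rows without codes and near-miss tokens are handled.

```python
import re

# Same anchored pattern as above, written as a raw string
LWS_TOKEN = re.compile(r"^LWS\d+$")

def get_substrings(text, pattern=LWS_TOKEN, sep="."):
    """Return the sep-delimited pieces of text that match pattern."""
    return [piece for piece in text.split(sep) if pattern.match(piece)]

samples = [
    "TE.PH2022.Joanna.LWS1098.LWS1147.LWS1648.Campaign",  # from the question
    "TE.PH2022.Joanna.Campaign",         # hypothetical: no LWS codes at all
    "TE.XLWS1098.LWS12A.LWS7.Campaign",  # hypothetical: near-miss tokens
]

# One comma-separated string per sample
results = [",".join(get_substrings(s)) for s in samples]
for r in results:
    print(repr(r))
```

Because the pattern is anchored and uses \d+, a token such as XLWS1098 or LWS12A is rejected, while a code with any number of digits (LWS7) is accepted. A row without codes yields an empty string.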
