Question on String Manipulation - Pulling only text that matches

#1

Greetings,

I have a dataset of addresses that end with building numbers and unit numbers. I am trying to use the String Manipulation node with a regex, but instead of bringing over only the text that matches the regex, it brings over the entire row.

Here is my example:

(0000 is a random number)

Data Excerpt:
0000 BUENUE A UNIT: 208
0000 SOUTHGATE BLVD BLDG: 1 UNIT: 107
0000 MARVEY LN UNIT: 222
0000 W ## 1/2 ST BLDG: 1- MANSARD
0000 GOOSEVELT AVE UNIT: REAR
0000 BULL CREEK BLVD Bunit 150

In KNIME, I’m using the String Manipulation node with this expression:
regexReplace($FOLDERNAME$,"(.*(?:ST|AVE|DR|DRIVE|BLVD|BOULEVARD|RD|CIRCLE|AVENUE|COSTA|LANE|ROAD|LN|SPEEDWAY|STREET)\b)","")

It brings over:
Data Excerpt:
0000 BUENUE A UNIT: 208
0000 SOUTHGATE BLVD BLDG: 1 UNIT: 107
0000 MARVEY LN UNIT: 222
0000 W 00 1/2 ST BLDG: 1- MANSARD
0000 GOOSEVELT AVE UNIT: REAR
0000 BULL CREEK BLVD Bunit 150

But, I want it to bring over:
0000 BUENUE A
0000 SOUTHGATE BLVD
0000 MARVEY LN
0000 W ## 1/2 ST
0000 GOOSEVELT AVE
0000 BULL CREEK BLVD

Am I using the correct Node? Any help would be appreciated. Thank you.


#2

Dear @jhandatx,

My brain hurts trying to think of a regular expression that does what you want, so I leave that to you. But I suggest using the String Replacer node to extract the part of the string that you want:
[image: String Replacer node configuration]

This yields “0000 BUENUE A” for the first line of your data (but obviously does not work for the rest).

“$1” in the Replacement text box stands for everything matched by the part of the regular expression in the Pattern box that sits between the first pair of parentheses. “$2” would correspond to the second pair of parentheses, and so on.
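To illustrate how numbered groups behave, here is a minimal sketch in Python rather than KNIME. The pattern is made up for this example (it is not the pattern from the screenshot), `keep_group` is a hypothetical helper name, and Python writes the back-reference as `\g<1>` where the String Replacer uses `$1`.

```python
import re

# Hypothetical pattern for illustration only:
# group 1 = everything before " UNIT: ", group 2 = everything after it.
PATTERN = re.compile(r"^(.*?) UNIT: (.*)$")


def keep_group(text: str, group: int) -> str:
    """Replace the whole match with the text captured by one numbered group."""
    # \g<1> in Python plays the role of $1 in KNIME/Java replacements.
    return PATTERN.sub(rf"\g<{group}>", text)


print(keep_group("0000 BUENUE A UNIT: 208", 1))  # first pair of parentheses
print(keep_group("0000 BUENUE A UNIT: 208", 2))  # second pair of parentheses
# A row the pattern does not match is returned unchanged.
print(keep_group("0000 BULL CREEK BLVD Bunit 150", 1))
```

Rows that the pattern does not match come back untouched, which is the same "whole row comes over" behaviour described in the first post.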

Good luck,
Aswin


#3

Greetings Aswin,

Thank you. Here is what I did to solve my challenge:

image

I then repeated this for each of the other street suffixes from my original regex, using multiple String Replacer nodes.

Hopefully, this helps others :smile:

Signed,
John


#4

Hi @jhandatx,

You can use a single String Manipulation node with this expression:

regexReplace($FOLDERNAME$, "(?i)(.*?)\\s(bldg|bunit|unit).*", "$1")

:blush:
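For anyone who wants to check the expression outside KNIME, here is a rough Python port run against the sample rows from the first post. It is only a sketch: `strip_unit` is a hypothetical helper name, Python uses `\1` where KNIME uses `$1`, and the raw string means `\s` is written with a single backslash. The two regex engines should agree on a pattern this simple, but this has not been checked inside KNIME itself.

```python
import re

# Same pattern as the KNIME expression; a raw string needs only one backslash in \s.
PATTERN = re.compile(r"(?i)(.*?)\s(bldg|bunit|unit).*")

# Sample rows copied from the first post.
SAMPLES = [
    "0000 BUENUE A UNIT: 208",
    "0000 SOUTHGATE BLVD BLDG: 1 UNIT: 107",
    "0000 MARVEY LN UNIT: 222",
    "0000 W ## 1/2 ST BLDG: 1- MANSARD",
    "0000 GOOSEVELT AVE UNIT: REAR",
    "0000 BULL CREEK BLVD Bunit 150",
]


def strip_unit(address: str) -> str:
    """Keep only the text before the first BLDG/BUNIT/UNIT keyword."""
    # The lazy group 1 stops at the first whitespace followed by a keyword;
    # the trailing .* swallows the rest so the replacement discards it.
    return PATTERN.sub(r"\1", address)


for row in SAMPLES:
    print(strip_unit(row))
```

One caveat with this approach: the keywords are matched as prefixes, so a street name that starts with one of them (a made-up example would be `0000 UNITY ST UNIT: 5`) gets cut at that word. Adding a word boundary after the keyword group is one possible way to tighten it.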
