Regex to standardize phone formats

Hello community!

I’m looking to standardize phone numbers from US, we are having the following formats:

8882739371
603-435-3438
(978) 567-3544
9733166016 :517
972.204.9626
9164423613 :4655
(909) 793-2853:4024

The idea is to leave it as (909) 793-2853:4024, after the “:” we have the extension, this is present in only some of the phone numbers, if there is no extension the phone number should be like: (909) 793-2853

If someone from the community can help with this would be awesome, I think the best approach could be using regex in string manipulation node.

Thanks!

Hello @cat_data,

and welcome to KNIME Community!

You only want to remove double colon along with numbers following it and leave everything else as is?

Try out this regular expression:
regexReplace( $yourColumnName$, "(.*) ?:.*" , "$1" )

Br,
Ivan

1 Like

Hello @ipazin!

Thanks for your answer, but actually what I’m looking for is to standardize the phone numbers in this way:
current format->Desired format

8882739371 → (888) 273-9371
603-435-3438 → (603) 435-3438
(978) 567-3544 → (978) 567-3544
9733166016 :517 → (973) 316-6016:517
972.204.9626 → (972) 204-9626
9164423613 :4655 → (916) 442-3613:4655
(909) 793-2853:4024 → (909) 793-2853:4024

If phone numbers without extension always have 10 digits, then that’s no problem.

  1. remove non-digits, except ":"
    regexReplace($phone numbers$, "[^0-9|:]", "")
  2. inject formatting characters
    regexReplace($phone numbers$, "^(.{3})(.{3})(.*)$" , "($1) $2-$3")
    this step could also be done with normal string concatenation of substrings as well

edit: Thanks to Gordon I realised that the 2nd replacement can be cut down a little
nesting is obviously also possible, mine looks like this then:
regexReplace(regexReplace($phone numbers$, "[^0-9|:]", ""), "^(.{3})(.{3})(.*)$" , "($1) $2-$3")
RegEx to standardise phone formats
RegEx to standardise phone formats.knwf (16.7 KB)

4 Likes

Hello @cat_data and welcome to the KNIME forum.

Try this ‘String Manipulation’ node:

join("("
	, regexReplace(
		regexReplace($text$, "[^0-9:]+", "")
		, "^(.{3}).*", "$1")
	,") "
	, regexReplace(
		regexReplace($text$, "[^0-9:]+", "")
		, "^.{3}(.{3}).*", "$1")
	,"-"
	, regexReplace(
		regexReplace($text$, "[^0-9:]+", "")
		, "^.{6}(.*)", "$1")
)

BR

2 Likes

@Thyme 's solution in a single node :+1:t3:

regexReplace(
	regexReplace($text$, "[^0-9:]+", "")
		, "^(.{3})(.{3})(.*)", "($1) $2-$3")
3 Likes

Thank you guys!
@Thyme @gonhaddock

3 Likes

@cat_data
Please, give this solution to @Thyme :tophat:

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.