Regex to standardize LinkedIn URLs

Hi community!

I’m looking to standardize the LinkedIn URLs from my CRM. I have multiple formats that all need to end up in one of these forms:
https://www.linkedin.com/company/justanexample/
or
https://www.linkedin.com/in/justanexample/
or
https://www.linkedin.com/pub/justanexample/

The different formats I currently have are the following:

*http://www.linkedin.com/company/test
*linkedin.com/in/test
*https://linkedin.com/company/test
*https://www.linkedin.com/company/test
*https://linkedin/in/kladner
*https://www.ca.linkedin.com/in/test/
*https://ca.linkedin.com/in/test/
*ca.linkedin.com/in/kladner
*https://www.in.linkedin.com/in/test/

In the last 4 examples you can see that the URL includes a country code (“ca” or “in”). These country codes need to be removed: when you try to open a LinkedIn URL that has both a country code and www., it throws a 404 error.

Another very important point: after linkedin.com/ the value can be “pub”, “in” or “company”. This value must remain, otherwise we get a 404 error when opening the URL.

As always thanks for your valuable help!

Hello,

I would say the simplest way to solve this is to search for “linkedin” and then capture that word and everything after it:

[Screenshot: Screen Shot 2022-08-23 at 1.40.06 PM]

I don’t think you need the http or the www to get a valid hit (though that depends on the tools, browser, etc.), so a simple String Manipulation that joins “https://www.” with the Split Value 1 column is sufficient.
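For anyone who wants to try the same idea outside the workflow, here is a rough sketch in plain Python. It is not the setup from the screenshot, just one possible take on "find linkedin, keep what follows, prepend https://www.". The function name standardize_linkedin_url and the pattern are my own assumptions. I also assumed that a country code is always a two-letter subdomain, that a missing ".com" (as in https://linkedin/in/kladner) should be repaired, and that only the first path segment after “pub”, “in” or “company” is needed.

```python
import re

# Optional scheme, optional "www.", optional two-letter country subdomain
# (assumption), then "linkedin" with an optional ".com", followed by the
# path type ("company", "in" or "pub") and the first path segment after it.
LINKEDIN_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?(?:[a-z]{2}\.)?linkedin(?:\.com)?/"
    r"(company|in|pub)/([^/?#\s]+)",
    re.IGNORECASE,
)


def standardize_linkedin_url(raw):
    """Return the URL as https://www.linkedin.com/<type>/<name>/, or None."""
    match = LINKEDIN_RE.match(raw.strip())
    if match is None:
        return None  # not recognised; leave such rows for manual review
    kind, slug = match.groups()
    return f"https://www.linkedin.com/{kind.lower()}/{slug}/"
```

With this sketch, standardize_linkedin_url("https://www.ca.linkedin.com/in/test/") gives https://www.linkedin.com/in/test/, and anything that does not look like a LinkedIn company/in/pub URL comes back as None instead of being silently rewritten.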

Hope that helps.


Thanks Victor, this has been very helpful!
:smile:

