How to split Domain and subdomains?

Dears,

I need to separate subdomains from their domains. How can I do this with KNIME?
For example:


domain.com -> domain.com
sub.domain.com -> domain.com
www.domain.com -> domain.com
www.sub.sub.domain.com -> domain.com
domain.co.uk -> domain.co.uk
sub2.domain.co.uk -> domain.co.uk
www.domain.co.uk -> domain.co.uk
www.sub.sub.domain.co.uk -> domain.co.uk
domain1.photography -> domain.photography
www.domain22.photography -> domain.photography
www.sub12f.domain.photography -> domain.photography

Hi @natanzi,

At first I thought a fairly straightforward regex split would do the job. The catch is that the split needs to know something about domain names, and specifically about top-level domains, because the suffix varies from country to country, such as

something.fr
something.uk
something.co.uk

where you’d want the “something” to be included in the domain name.
I did find a forum post from 2015, but I couldn’t get it to work.
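
To illustrate why a plain split falls short, here is a minimal Python sketch (not part of the KNIME workflow). The helper name `naive_domain` is made up for this example; it simply keeps the last two dot-separated labels, which works for `.com` but not for multi-part suffixes such as `.co.uk`.

```python
def naive_domain(host: str) -> str:
    """Hypothetical helper: keep only the last two dot-separated labels."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


# Works when the suffix is a single label:
print(naive_domain("www.sub.sub.domain.com"))  # domain.com

# Breaks when the suffix has two labels: the result is the suffix itself,
# not the domain the question asks for (domain.co.uk).
print(naive_domain("sub2.domain.co.uk"))  # co.uk
```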

So I went in search of a list to provide the “knowledge” of top level and country-specific top level domains.

I settled on the “Public Suffix List”.

From this list we can extract the possible domain suffixes and use them to derive the other parts of the address you need (I think).
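
Outside KNIME, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the attached workflow does node by node: `SAMPLE_SUFFIXES` is a tiny hand-picked stand-in for the real Public Suffix List, `split_host` is a hypothetical helper name, and the wildcard (`*.`) and exception (`!`) rules of the real list are ignored.

```python
from typing import Optional, Tuple

# Assumption: a tiny hand-picked sample. The real Public Suffix List is far
# longer and also has wildcard ("*.") and exception ("!") rules, which this
# sketch does not handle.
SAMPLE_SUFFIXES = {"com", "fr", "uk", "co.uk", "photography"}


def split_host(host: str, suffixes=SAMPLE_SUFFIXES) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (subdomain, domain), or None if no match."""
    labels = host.lower().strip(".").split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            # The domain is the matched suffix plus one label to its left;
            # everything further left is the subdomain (possibly empty).
            return ".".join(labels[: i - 1]), ".".join(labels[i - 1 :])
    return None  # no known suffix found


print(split_host("www.sub.sub.domain.co.uk"))  # ('www.sub.sub', 'domain.co.uk')
print(split_host("domain.com"))                # ('', 'domain.com')
```

Checking the longest candidate first is what keeps `domain.co.uk` together instead of collapsing it to `co.uk`.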

Anyway, the attached workflow isn’t pretty and is a little convoluted, but it may serve as inspiration for you or others to shrink it down or find better alternatives.


I hope that helps! :slight_smile:
Split domain names.knwf (35.8 KB)


Nice one @takbb, and nice find on the Public Suffix List, which is the key ingredient.


Dear @bruno29a @takbb Much appreciated…

