URL Domain Extractor Doesn't Work With New TLDs

[Image: first8rows]

Greetings!

I was using the URL Domain Extractor node and found that the first 8 rows did not give the correct results. This might be because the node's TLD list has not been updated to include these new TLDs.


Hi badger101,

I assume you mean the node from the Palladian plugin, right?

Thanks for the feedback, I’ll see how we can improve the logic. AFAIR, the node does indeed have a built-in dictionary of known TLDs for proper subdomain handling, which we’d need to update for new TLDs.
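To illustrate why a built-in TLD dictionary matters here, below is a minimal Python sketch of suffix-based domain extraction. It is not Palladian's actual code; the function name `registrable_domain`, the tiny `KNOWN_SUFFIXES` set, and the placeholder TLD `newtld` are all assumptions made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix list; a real one would be far larger.
KNOWN_SUFFIXES = {"com", "org", "co.uk"}


def registrable_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the domain directly below the longest known suffix, or None."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    # No known suffix matched, e.g. a TLD that is missing from the list.
    return None


print(registrable_domain("http://abc.def.com"))          # def.com
print(registrable_domain("http://shop.example.newtld"))  # None
print(registrable_domain("http://shop.example.newtld",
                         KNOWN_SUFFIXES | {"newtld"}))   # example.newtld
```

In this sketch, a TLD that is missing from the list simply yields no result; what the node does in that situation may differ, but the fix is the same in spirit: the list has to know about the new TLD.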

Just for clarification: for the specific case highlighted above, you could simply strip the http:// prefix, but I assume that was just for illustration and that your “real” data contains full URLs with paths and/or you want to strip the subdomains, right?

I’ll keep this thread updated once there’s an update.

–Philipp


Hi @qqilihq, yes, that’s correct.

I discovered this by accident while experimenting with something else. My dataset doesn’t contain these new TLDs, but I’m sure that including them would help people working in crypto-related areas, as these TLDs are becoming more popular over time.

Anyway, I stumbled across this because I wanted to extract domains together with the protocol. For links that contain subdomains, if I tick the box for protocol but not the one for subdomains, the output does not include the protocol. For example, https://abc.def.com returns def.com instead of https://def.com (I can work around this with some changes further down the workflow, though).
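For reference, here is a small Python sketch of the behavior I would expect from that combination of options. It is only an illustration, not the node's implementation; the function `extract_domain`, its two flags, and the tiny `KNOWN_SUFFIXES` set are hypothetical names I made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical suffix list, for illustration only.
KNOWN_SUFFIXES = {"com", "org", "co.uk"}


def extract_domain(url, include_protocol=False, include_subdomains=False):
    """Extract the host of a URL, optionally with protocol and subdomains."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        return None
    if not include_subdomains:
        labels = host.split(".")
        # Keep only the label directly below the longest known suffix.
        # If no suffix matches, the host is left unchanged.
        for i in range(1, len(labels)):
            if ".".join(labels[i:]) in KNOWN_SUFFIXES:
                host = ".".join(labels[i - 1:])
                break
    # The protocol is added independently of the subdomain setting.
    if include_protocol and parts.scheme:
        return f"{parts.scheme}://{host}"
    return host


print(extract_domain("https://abc.def.com", include_protocol=True))
# https://def.com
```

The point of the sketch is that the two options are handled independently, so stripping the subdomain should not drop the protocol.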
