[Bug] URL Domain Extractor does not recognize TLD and breaks for Upper Case

Hi,

While doing some research I happened to notice that the URL Domain Extractor struggles to extract domains in the following cases:

  1. UPPER-CASE.COM
  2. domain.co.ke or domain.com.mm

PS: Hosts that are plain IP addresses are an edge case which I have not included, but they apparently do not work either.
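To make the expected behaviour concrete, here is a minimal Python sketch of what I would expect for the cases above. It is not how the node is implemented; `extract_domain` and `SAMPLE_SUFFIXES` are names I made up for illustration, and the suffix set is a tiny hand-picked sample, whereas a real implementation would presumably rely on the full Public Suffix List.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, hand-picked sample of suffixes for illustration only;
# a real implementation would load the full Public Suffix List.
SAMPLE_SUFFIXES = {"com", "ke", "co.ke", "mm", "com.mm"}


def extract_domain(url: str) -> Optional[str]:
    """Return the lower-cased registrable domain of `url`, or None."""
    # urlsplit only fills in the host when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname  # hostname is returned lower-cased
    if not host:
        return None
    # IP addresses have no suffix; return them unchanged.
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        pass
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.ke" wins over "ke".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in SAMPLE_SUFFIXES:
            return ".".join(labels[i - 1:])
    return None  # suffix not in the sample set


for example in ("UPPER-CASE.COM", "https://www.domain.co.ke/path",
                "domain.com.mm", "http://192.168.0.1/x"):
    print(example, "->", extract_domain(example))
```

Lower-casing the host before the lookup covers case 1, and matching the longest known suffix covers case 2.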

Best
Mike

Hi Mike,

Thanks for bringing that to my attention. I’m aware of some other issues with the node already, which we’ll address together with this one in an upcoming update.

–Philipp


This is addressed in the Palladian Nodes v2.6 update. More details here.

Thanks Philipp,

and apologies for my delayed response. My family got sick and I had some urgent work items to address. Comparing the results, they have significantly improved, but there also seem to be some minor regressions.

Simple Test shows fixes

Minor Regressions

Minor regressions in this CSV (gzip in my Google Drive due to upload restrictions).

Best
Mike

Thanks for the feedback. I’ll have a look at the regressions after the holidays.

In the meantime, get well :)

Best,
Philipp


Hi @mwiegand,

Hope you’ve recovered well!

Thanks again for the detailed list. I’m currently trying to figure out whether there are any issues to fix. Could you clarify the following cases:

  1. There are missing values in the last three columns, e.g. for .um (HTTPS://SUB-DOMAIN.DOMAIN.UM). Does this mean they are not extracted correctly for you with the updated version?

  2. There are rows with missing values in the first three columns, e.g. for .edu.al (https://domain.edu.al). Does this mean they were not extracted properly before, but now work with 2.6?

  3. Generally, the first three columns in the CSV represent the pre-2.6 results and the last three the 2.6 results. Is this correct?

Many thanks!
Philipp
