Extract URLs from text and store them in a new column

Hello everyone,

Hope you are all well and safe. I want to extract whole URLs from a text column and store them in a new column. As you know, URLs can come in different formats, such as:

www.sample.com
http(s)://www.sample.com
sample.com 
www.sample.com/xyz/xyz.php

Is there a node to do this? Or, if I use the String Manipulation node, for instance, what syntax can I use to extract the URLs from a text column?

Thank you in advance.

Regards,

Safa

Hi Safa,

you can try Palladian’s Regex Extractor, which comes with a pre-defined URL rule. It will match most of your examples and can easily be fine-tuned to be more permissive or strict.
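
For readers who want to see roughly what a URL rule of this kind does, here is a minimal Python sketch. It is not Palladian's actual rule; the pattern and the names `URL_PATTERN` and `extract_urls` are hypothetical and only meant to cover the four formats listed in the question. A pattern this permissive will also pick up things like file names (`report.txt`) or the domain part of an e-mail address, so it would need tightening for real data.

```python
import re

# Illustrative pattern (an assumption, not the Regex Extractor's built-in rule):
# optional http/https scheme, dot-separated host labels, an alphabetic TLD,
# and an optional path.
URL_PATTERN = re.compile(
    r"""
    \b
    (?:https?://)?        # optional scheme
    (?:[a-z0-9-]+\.)+     # one or more host labels, each followed by a dot
    [a-z]{2,}             # top-level domain
    (?:/[^\s<>]*)?        # optional path up to the next whitespace
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_urls(text):
    """Return all URL-like substrings, with trailing punctuation removed."""
    return [m.group(0).rstrip(".,;:!?)") for m in URL_PATTERN.finditer(text)]


# Apply the rule to every row of a text column and keep the result as a new column.
rows = ["See www.sample.com/xyz/xyz.php for details.", "no link in this row"]
url_column = [extract_urls(row) for row in rows]
```

Making the scheme mandatory, or restricting the top-level domain to a known list, are two simple ways to make the rule stricter.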

Sample workflow is on my NodePit Space:

HTH,
Philipp

Hello Philipp,

Thank you very much for the reply. This is exactly what I was looking for. Cheers.

Regards,

Safa

Hi again Philipp,

I have another question related to URLs. Do you have any idea how I can extract hyperlinks or hidden links that are not written out as URLs in the same text file?
Is there a node for that?

Thanks
Safa

Hi Safa,

could you give an example of how these look?

Thanks,
Philipp

Hi Philipp,

For example, “click here” or any other word that is not itself a URL but is a hyperlink: the link is embedded in the word, and clicking it redirects you to a specific web page.

Regards,
Safa

Hi Safa,

but what type of input content do you actually have? HTML or something else?

Best,
Philipp

Hi Philipp,

Sorry in advance if I didn’t fully understand you, as I am very new to the KNIME platform. I have attached a sample table; if you hover over its cells, you can see that they redirect to, for example, the Google website. My question is: how can I import that table into KNIME so that I can extract the web links without opening them? When I import the table as an Excel spreadsheet, the links don’t show up in KNIME.

Kind Regards,

Safa

hyperlink.xlsx (9.1 KB)

Got it! Regarding XLSX files, I am not sure if or how links can be read into KNIME. KNIME’s Excel Reader node only seems to get the text and not the link, and there is apparently no option to alter this behavior.

Maybe going via a Python script or similar could work, or someone else here has an idea!
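
As a rough sketch of the Python route: an .xlsx file is a ZIP archive, and a worksheet's external hyperlinks are stored as relationships next to the sheet XML. The function below reads them with the standard library only. It assumes the sheet of interest is stored as `xl/worksheets/sheet1.xml` (usual for the first sheet, but not guaranteed), and it does not cover links created with the `HYPERLINK()` formula. The name `extract_hyperlinks` is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_hyperlinks(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: target} for relationship-based hyperlinks on one sheet.

    `xlsx` is a file path or a binary file-like object. `sheet_part` is an
    assumption about where the sheet lives inside the archive.
    """
    with zipfile.ZipFile(xlsx) as zf:
        sheet = ET.fromstring(zf.read(sheet_part))
        folder, name = posixpath.split(sheet_part)
        rels_part = f"{folder}/_rels/{name}.rels"
        if rels_part not in zf.namelist():
            return {}  # the sheet has no relationships, hence no such links
        rels = ET.fromstring(zf.read(rels_part))

    # Map relationship ids (e.g. "rId1") to their hyperlink targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.findall(f"{{{NS_PKG}}}Relationship")
        if rel.get("Type", "").endswith("/hyperlink")
    }

    links = {}
    for hyperlink in sheet.iter(f"{{{NS_MAIN}}}hyperlink"):
        rel_id = hyperlink.get(f"{{{NS_REL}}}id")
        if rel_id in targets:
            # "ref" is a cell reference such as "A2" (it can also be a range).
            links[hyperlink.get("ref")] = targets[rel_id]
    return links
```

Calling `extract_hyperlinks("hyperlink.xlsx")` on the attached file should give a mapping from cell references to link targets, which could then be joined back to the table as an extra column.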

–Philipp

Thank you very much, @qqilihq. I will check whether a Python script can do this for me. If anyone else here has any idea about this, I would appreciate your comments.

Thanks

Safa

I would use a macro in Excel (Visual Basic) to create another column containing the hyperlink before importing the table into KNIME.

See, for example, the simple VB code below.

Hope it helps and good luck

Ludovico

Hi Ludovico,

Thank you very much for your help. I think this is exactly what I am looking for. I will give it a try and let you know whether I get the result.

Regards,

Safa
