Hope you are all well and safe. I want to extract whole URLs from a text column and store them in a new column. As you all know, URLs can come in different formats, such as:
You can try Palladian’s Regex Extractor, which comes with a pre-defined URL rule. It will match most of your examples and can easily be fine-tuned to be more permissive or strict.
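To illustrate what a regex-based URL rule does, here is a minimal Python sketch. The pattern is my own simplified assumption and not the rule that ships with Palladian; `URL_PATTERN` and `extract_urls` are hypothetical names used only for this example. It only catches URLs that start with `http://`, `https://`, or `www.`.

```python
import re

# Simplified, illustrative pattern (an assumption, NOT Palladian's built-in rule):
# a scheme or "www." prefix followed by anything that is not whitespace,
# an angle bracket, or a quote character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def extract_urls(text):
    """Return all URL-like substrings of text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        urls.append(match.rstrip(".,;:!?)"))
    return urls


print(extract_urls("See https://example.com/a?b=1, or www.knime.com."))
```

Making the rule stricter or more permissive then comes down to changing the prefix alternatives or the set of excluded characters.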
I have another question related to URLs. Do you have any idea how I can extract hyperlinks, or hidden links that are not written out as URLs, from the same text file?
Is there any node for that?
For example, “click here” or any other word that is not itself a URL but is a hyperlink: the link is embedded in the word, and clicking it redirects you to a specific web page.
Sorry in advance if I didn’t understand you 100%, as I am very new to the KNIME platform. I have attached a sample table; if you hover over its cells, you can see that they redirect you to, for example, the Google website. My question is: how can I import that table into KNIME so that I can extract the web link without opening it? When I import the table as an Excel spreadsheet, the links don’t show up in KNIME.
Got it! Regarding XLSX files, I am not sure if/how to read links into KNIME. KNIME’s Excel Reader node only seems to get the text and not the link, and there is apparently no way to alter this behavior.
Maybe going via a Python script or similar could work, or someone else here has an idea!
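For the Python route, a minimal sketch using only the standard library could look like the following. It relies on the fact that an XLSX file is a ZIP archive in which each sheet lists its hyperlinked cells, while the actual link targets sit in a separate relationships file. The function name `read_xlsx_hyperlinks` and the default sheet path are assumptions for illustration; a workbook may store its sheets under other names, and links created with the `HYPERLINK()` formula are not covered.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used inside .xlsx packages
NS_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS_PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_xlsx_hyperlinks(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell_reference: url} for the hyperlinks stored on one sheet.

    sheet_part is an assumption: the first sheet is commonly stored under
    this name, but that is not guaranteed for every workbook.
    """
    with zipfile.ZipFile(path) as archive:
        sheet_xml = ET.fromstring(archive.read(sheet_part))

        # The relationships file maps ids such as "rId1" to the link target.
        folder, name = posixpath.split(sheet_part)
        rels_part = f"{folder}/_rels/{name}.rels"
        if rels_part not in archive.namelist():
            return {}  # no relationships file, so no external hyperlinks
        rels_xml = ET.fromstring(archive.read(rels_part))

    # Keep only relationships of type ".../hyperlink".
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels_xml.iter(f"{{{NS_PKG}}}Relationship")
        if rel.get("Type", "").endswith("/hyperlink")
    }

    links = {}
    for link in sheet_xml.iter(f"{{{NS_MAIN}}}hyperlink"):
        rel_id = link.get(f"{{{NS_REL}}}id")
        if rel_id in targets:
            # "ref" is the cell (or cell range) carrying the link, e.g. "A2".
            links[link.get("ref")] = targets[rel_id]
    return links
```

Calling `read_xlsx_hyperlinks("sample.xlsx")` (with a hypothetical file name) would return a dictionary keyed by cell reference, which could then be joined back onto the table by row. If installing a package is an option, a library such as openpyxl exposes the same information per cell through `cell.hyperlink`, which is probably more convenient.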
Thank you very much, @qqilihq. I will check whether a Python script can do this for me. If anyone else here has a clue about this, I would appreciate it if they came back to me with their comments.