Hi, I’m using XPath to extract links from XML source code that I’ve scraped from web pages. Here are the variants of link markup that I’m trying to match:
<a target="_blank" rel="nofollow" href="https://www.example.com/">example.com</a>
<a title="example.com" href="https://www.example.com/" target="_blank">
<a href="https://www.example.com/" target=_blank>
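For well-formed XML input like the first two variants above, an XPath query such as `//a/@href` matches every link regardless of attribute order. A minimal sketch using Python's standard-library `ElementTree` (the snippet and variable names are illustrative; note that the third variant, with its unquoted `target=_blank`, is not well-formed XML and would need an HTML-aware parser such as lxml's `html` module):

```python
import xml.etree.ElementTree as ET

# Illustrative snippet covering the two well-formed link variants
snippet = """<div>
  <a target="_blank" rel="nofollow" href="https://www.example.com/">example.com</a>
  <a title="example.com" href="https://www.example.com/" target="_blank">second</a>
</div>"""

root = ET.fromstring(snippet)
# ".//a" finds every <a> element anywhere in the tree,
# independent of the order in which its attributes appear
links = [a.get("href") for a in root.findall(".//a")]
print(links)
```
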
Could you share an example link to one of these pages?
Here are some example pages:
Thanks for these. Unfortunately, I can only access the second URL here in Europe; the others are blocked.
PS: Will respond to your emails later
All of those pages link to areavibes.com. If I could pull out all of those links on those pages, that would be ideal.
Now I see. These links are placed in the image captions and only show when paging through the image carousel?
This gets complicated. You could do it with Selenium: cycle through the image sequence and extract the links after each click.
Alternatively, try extracting the links with a regex instead. XPath will not help here, since the links are not placed inside an <a> tag, so you’d need to process the plain HTML text. As a starting point, the Regex Extractor node has a regex template for extracting URLs, which might work here.
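To illustrate the regex approach, here is a minimal sketch in Python. The caption text and the URL pattern are both hypothetical stand-ins (the Regex Extractor node's built-in URL template may differ); the point is simply that a pattern over plain text finds the link even though no `<a>` tag surrounds it:

```python
import re

# Hypothetical caption text: the URL appears as plain text, not inside an <a> tag
caption = 'Photo credit: see https://www.areavibes.com/some-city/ for details.'

# A simple URL pattern: "http" or "https", then everything up to the next
# whitespace, quote, angle bracket, or closing parenthesis
url_pattern = re.compile(r'https?://[^\s"\'<>)]+')

links = url_pattern.findall(caption)
print(links)
```

The same pattern can be applied to the full page source, so no carousel clicking is needed if the caption text is already present in the HTML.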
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.