Extract Links on a Certain Web Page


I am totally new to KNIME and am trying to use it for some web analytics tasks. I want to extract all the articles that have appeared on Reddit about one specific topic, for example, Facebook.

Basically, what I tried to do is extract all the links (URLs) on the page "https://www.reddit.com/r/facebook/", including the ones on every following page you reach by clicking "next" until the end, and then use the content extractor to extract the content of each article. I found an example workflow to work from, but when I tried to execute the loop that fetches the pages, it didn't work properly. I am not really sure which part I should change for my needs. I have attached the workflow I have been working on.

Any help would be highly appreciated! Thank you!



Hi Sophie, 

You might want to check the XPath syntax. Also, please note that when you reference an element node in an XPath node, you have to prefix it with the namespace name specified in the Namespace tab of the configuration dialog ("namespace:element_node_name"). Without that prefix, the query will not match elements that belong to the namespace.
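To illustrate the namespace point outside of KNIME, here is a small Python sketch using only the standard library. The XHTML snippet, the link paths, and the prefix "dns" are made up for the example; in KNIME the prefix is whatever name appears in the Namespace tab.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML snippet; the xmlns attribute puts every element in a namespace.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a href="/r/facebook/comments/1">First article</a>
    <a href="/r/facebook/comments/2">Second article</a>
  </body>
</html>"""

root = ET.fromstring(XHTML)

# "dns" is an assumed prefix name; it only has to map to the document's namespace URI.
ns = {"dns": "http://www.w3.org/1999/xhtml"}

# Without the prefix the query finds nothing, because the elements are namespaced.
without_prefix = root.findall(".//a")

# With the prefix the same query matches both links.
with_prefix = root.findall(".//dns:a", ns)
links = [a.get("href") for a in with_prefix]

print(len(without_prefix), links)
```

The unprefixed query returning nothing is the typical symptom of a missing namespace prefix, so it is worth checking first when an XPath node gives you empty results.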

Please find attached a sample workflow where I extract all the links to the articles on the page as well as the link to the next page.
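
For readers without the attachment, the general idea (collect the article links on a page, then follow the "next" link until there is none) can be sketched in plain Python. This is not the attached workflow. The page markup, the "title" class, the rel="next" attribute, and the example.com URLs are all assumptions made for the illustration and do not reflect Reddit's actual HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects article links and the next-page link from one HTML page."""

    def __init__(self):
        super().__init__()
        self.article_links = []
        self.next_link = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Assumed markup: rel="next" marks pagination, class="title" marks articles.
        if "next" in (attrs.get("rel") or "").split():
            self.next_link = href
        elif "title" in (attrs.get("class") or "").split():
            self.article_links.append(href)


def crawl(start_url, fetch, max_pages=50):
    """Follow next-page links from start_url and return all article URLs."""
    links, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = LinkCollector()
        parser.feed(fetch(url))
        links.extend(urljoin(url, href) for href in parser.article_links)
        url = urljoin(url, parser.next_link) if parser.next_link else None
    return links


# In-memory stand-in for real HTTP requests, so the sketch runs offline.
FAKE_PAGES = {
    "https://example.com/r/topic/":
        '<a class="title" href="/a/1">A</a><a rel="next" href="?page=2">next</a>',
    "https://example.com/r/topic/?page=2":
        '<a class="title" href="/a/2">B</a>',
}

all_links = crawl("https://example.com/r/topic/", FAKE_PAGES.__getitem__)
print(all_links)
```

The loop stops when a page has no next link, when a URL repeats, or when max_pages is reached, which is the same kind of end condition a page-fetching loop in a workflow needs.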