Example - WebCrawler Workflow

kichenin · November 26, 2014, 1:44pm

Hi All,

I am a new user and am starting to learn KNIME. While trying out the examples, I could not find the "webcrawler workflow" mentioned in the "Knowledge Extraction from a Web Forum" white paper. Can anyone help me find the source for the workflow that crawls data from a forum? I am trying to build similar functionality as a learning activity.

Thanks & Regards

kichenin

kilian.thiel · November 27, 2014, 9:50am

Hi kichenin,

the workflows are available on the KNIME Example Server under 050_Applications/050007_ForumAnalysis, in the workflows folder. See https://www.knime.org/example-workflows for a description of how to connect to the example server. The crawling workflow requires the Palladian (http://tech.knime.org/community/palladian) and XML extensions.

Furthermore, please find a small crawling example workflow attached. The workflow loads the content of the science page of the New York Times website (http://www.nytimes.com/pages/science/index.html) and extracts the titles, links, authors, and summaries of all articles on that page. This workflow also requires the Palladian and XML extensions.
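For readers who want to see the extraction step outside KNIME, here is a minimal Python sketch of the same idea: fetch a page, then pull the title, link, author, and summary out of each article block. It is not the attached workflow (which uses the Palladian and XML nodes). The markup it expects (`<article>`, a first `<a>` for title and link, `<p class="byline">`, `<p class="summary">`) is a hypothetical assumption; the real New York Times page will differ, so the tag and class names must be adapted.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ArticleParser(HTMLParser):
    """Collects title, link, author and summary for each <article> block.

    Assumed (hypothetical) markup:
      <article><h2><a href="...">Title</a></h2>
        <p class="byline">By Author</p><p class="summary">Text</p></article>
    Class matching is exact; pages with several classes per tag need adapting.
    """

    def __init__(self):
        super().__init__()
        self.articles = []
        self._current = None  # dict for the article currently being parsed
        self._field = None    # field that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"title": "", "link": None, "author": "", "summary": ""}
        elif self._current is None:
            return  # ignore everything outside article blocks
        elif tag == "a" and self._current["link"] is None:
            # the first link inside an article is taken as the headline link
            self._current["link"] = attrs.get("href")
            self._field = "title"
        elif tag == "p" and attrs.get("class") == "byline":
            self._field = "author"
        elif tag == "p" and attrs.get("class") == "summary":
            self._field = "summary"

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None
        elif tag == "article" and self._current is not None:
            # strip surrounding whitespace from the collected text fields
            self.articles.append(
                {k: v.strip() if isinstance(v, str) else v
                 for k, v in self._current.items()}
            )
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data


def fetch(url):
    """Download a page as text (requires network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_articles(html):
    """Return a list of dicts with title, link, author and summary."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.articles
```

Usage would be `extract_articles(fetch(url))`. Separating fetching from parsing keeps the parser testable on saved HTML, similar to how the KNIME workflow separates loading the page from the XPath extraction.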

Cheers, Kilian

webdataextraction.zip

kichenin · November 27, 2014, 11:02am

Hi Kilian,

Thanks for the reply. Actually, I was looking for the KNIME forum crawler workflow referred to in the forum analysis white paper. The example workflow does not include that particular workflow, but I will use the Palladian example you provided as a basis and build my own crawler. Thanks again.
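As a starting point for a do-it-yourself forum crawler, here is a minimal breadth-first crawl sketch in Python. It is a generic technique, not the white paper's KNIME crawler (which the thread notes was not in the examples). It follows links only on the start URL's host, drops `#fragments`, and caps the number of pages. `crawl`, `fetch_page`, and `max_pages` are hypothetical names for illustration. A real crawler should also respect `robots.txt` and throttle requests; neither is implemented here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page as text (requires network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host.

    Returns {url: html}. `fetch_page` is injectable so the logic can be
    exercised without network access; pages that raise OSError are skipped.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:  # urllib's URLError/HTTPError subclass OSError
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # resolve relative links against the current page, drop #fragment
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Each fetched page could then be passed to a topic-specific extractor, mirroring the crawl-then-extract split of the Palladian example.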

Regards

Kichenin

system · June 2, 2023, 9:49pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.