Using KNIME to perform outbound link analysis

Hey guys and gals,


Just wondering if anyone has thoughts on whether, and how, KNIME can be used to perform outbound link analysis.


The only info I need is:


-total number of outbound links from a subdomain

-number (or %) of external links

-number (or %) of internal links

-a visualization of the data


I can do this process manually, but I have no idea how to go about automating it in KNIME.


Any thoughts would be greatly appreciated!

Does your subdomain have a sitemap? If so, you can use the MMI Sitemap reader node to get the URLs of all your pages, then use the Clean HTML Retriever node to scrape each page's HTML and turn it into XML. Next, use an XPath node to pull the URL out of every link ( //a/@href ). From there, use a rule-based filter to split the rows into two tables, internal links and external links, based on whether the URL points to your own domain. Run counts on each table, join the counts back together by page, and calculate your percentages with a math node.
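
If it helps to see the logic outside of KNIME, here's a rough Python sketch of what the workflow does for a single page: collect every link's href, split the links into internal and external, then count and compute a percentage. This is only an illustration, not a KNIME node or API. The names LinkCollector and link_stats, the sample HTML, and the blog.example.com host are all made up for the example, and it assumes "internal" means the link's host is your subdomain or something beneath it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (mirrors the XPath step)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def link_stats(html, page_url, internal_host):
    """Count internal vs. external links in one page's HTML.

    Assumption: a link is internal when its host equals internal_host
    or is a subdomain of it. Adjust this rule to your own definition.
    """
    collector = LinkCollector()
    collector.feed(html)

    internal = external = 0
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        host = parsed.hostname or ""
        if host == internal_host or host.endswith("." + internal_host):
            internal += 1
        else:
            external += 1

    total = internal + external
    pct_external = 100.0 * external / total if total else 0.0
    return {
        "total": total,
        "internal": internal,
        "external": external,
        "pct_external": pct_external,
    }


# Hypothetical sample page, used only to exercise the function.
sample_html = """
<a href="/about">About</a>
<a href="https://blog.example.com/post">Post</a>
<a href="https://other.org/page">Other</a>
<a href="mailto:me@example.com">Mail</a>
"""
stats = link_stats(sample_html, "https://blog.example.com/", "blog.example.com")
print(stats)
```

In KNIME the same split would be done by the rule-based filter and the percentage by the math node; the sketch just makes the rule explicit so you can decide what should count as internal.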

Let me know if that makes sense!