Guardian API scraper

This wrapped metanode downloads content from the Guardian website as JSON. The configuration dialog of the metanode lets you enter a search query, set the from and to dates, etc. Don’t forget to sign up and get your own API key (mine is accidentally included, thanks @RolandBurger!).

Graun_WEB_API.knwf (31.2 KB)
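
For anyone who wants to see roughly what kind of request the metanode sends, here is a minimal Python sketch of building a search URL for the Guardian content API. It is not taken from the workflow itself: the function name `build_search_url` and the example values are made up for illustration, and `YOUR_API_KEY` is a placeholder for the key you get when you sign up. The parameter names (`q`, `from-date`, `to-date`, `page`, `page-size`, `api-key`) are the ones I understand the search endpoint to accept, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Public search endpoint of the Guardian content API.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def build_search_url(query, from_date, to_date, api_key, page=1, page_size=10):
    """Build a search URL (hypothetical helper, not part of the workflow).

    Dates are ISO strings in the form YYYY-MM-DD.
    """
    params = {
        "q": query,
        "from-date": from_date,
        "to-date": to_date,
        "page": page,
        "page-size": page_size,
        "api-key": api_key,  # placeholder, use your own key
    }
    # urlencode escapes the values and joins them with "&".
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("brexit", "2019-01-01", "2019-01-31", api_key="YOUR_API_KEY")
print(url)
```

Fetching that URL with any HTTP client should return the JSON that the metanode parses. Keeping the URL construction in its own function makes it easy to loop over `page` for larger result sets.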


That’s really cool, thanks for sharing!


Disappointingly, the Guardian API is limited to items (articles) without content, just metadata. That makes the API pretty much useless for data analysis. You can ask for a commercial license, but I suspect that won’t come cheap. If only they had an ‘in between’ tier for research… The other UK newspapers seem to work the same way: headline and metadata only. The exception is the Daily Express, which doesn’t have an API at all, but then the Express is barely even a newspaper; it’s more like a demented drunk man’s ramblings printed on paper. If anyone can recommend a truly open news or media API of UK origin, please share it!

Hi @alkopop79,

Doesn’t the Guardian API also return the URL of each article as part of the metadata? You could use those URLs to crawl the articles individually. The Palladian for KNIME extension should have everything you need for that, particularly the ContentExtractor, which detects the actual article text.

Hope that helps!

Cheers,
Roland
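
To illustrate the first step of that idea outside KNIME, here is a rough Python sketch that pulls the article URLs out of an already parsed search response. It assumes the JSON has a top-level `response` object with a `results` list whose items carry a `webUrl` field; the helper name `extract_article_urls` and the sample data are invented for the example. It is not Palladian and does not extract any article text; it only collects the links you would then hand to a crawler or to the ContentExtractor.

```python
def extract_article_urls(payload):
    """Return the article URLs found in a parsed search response.

    Assumes the layout {"response": {"results": [{"webUrl": ...}, ...]}}.
    Items without a "webUrl" field are skipped.
    """
    results = payload.get("response", {}).get("results", [])
    return [item["webUrl"] for item in results if "webUrl" in item]


# Invented sample data in the assumed response layout.
sample = {
    "response": {
        "status": "ok",
        "results": [
            {"webTitle": "First headline", "webUrl": "https://www.theguardian.com/example/one"},
            {"webTitle": "Second headline", "webUrl": "https://www.theguardian.com/example/two"},
            {"webTitle": "Item without a link"},
        ],
    }
}

urls = extract_article_urls(sample)
print(urls)
```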

Will give Palladian a try, cheers. I just don’t understand why it is called ‘Open Platform’ when it’s anything but open. It’s not even available for non-commercial purposes… :rage:

@alkopop79 do you mean KNIME by this?
The KNIME Analytics Platform can be used free of charge for all purposes.