FeedParser

Hello!
I have been working with the Text_processing example workflows, specifically 06_NY_Times_RSS_FEED_Tag_Cloud, and I have a question.

I am trying to extract information from this page http://www.eltiempo.com/rss/otras-ciudades.xml … All I do is replace the NY Times link in the first node with the one I want to use, but for some reason I do not understand, the workflow stops working when it reaches the Feed Parser node.

Specifically, the following error is raised:
Execute failed: ws.palladian.retrieval.parser.ParserException: org.xml.sax.SAXParseException; lineNumber: 90; columnNumber: 110; The attribute name “async” associated with a type of element “script” must be followed by the character ‘=’.

I would like to know why this happens, and whether I need a different workflow to extract the information from the page I am interested in.

Thank you, I look forward to your reply.

Manuela

Hi Manuela,

if you try to open this URL in your web browser, you’ll see a 404 error message:

http://www.eltiempo.com/rss/otras-ciudades.xml

I.e., the link to the RSS feed is not (or no longer) valid. It's still listed in the site's RSS overview, so this is presumably an error on the El Tiempo website.

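Technically, the error message you see arises because the FeedParser node receives the HTML error page instead of a feed and tries to parse it as XML. HTML allows an attribute without a value, such as "async" on a "script" element, but XML requires every attribute name to be followed by "=" and a value, so the parser fails at that point.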
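To illustrate the point outside of KNIME, here is a minimal Python sketch. It is not what the FeedParser node does internally; the sample page, the sample feed, and the helper name looks_like_feed are all made up for this example. It only shows that a strict XML parser rejects an HTML page containing a valueless "async" attribute, while it accepts a well-formed feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the HTML error page returned instead of the feed.
HTML_ERROR_PAGE = """<html>
  <head>
    <script async src="https://example.com/analytics.js"></script>
  </head>
  <body><h1>404</h1></body>
</html>"""

# Minimal well-formed RSS document for comparison (also made up).
RSS_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item><title>First entry</title></item>
  </channel>
</rss>"""


def looks_like_feed(body: str) -> bool:
    """Return True if body is well-formed XML with an RSS, Atom, or RDF root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, e.g. HTML with a valueless attribute.
        return False
    # Drop a namespace prefix such as "{http://www.w3.org/2005/Atom}".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name.lower() in {"rss", "feed", "rdf"}


print(looks_like_feed(HTML_ERROR_PAGE))  # False
print(looks_like_feed(RSS_FEED))         # True
```

A check along these lines, or simply opening the URL in a browser first, tells you whether a link actually delivers a feed before you plug it into the workflow.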

Best,
Philipp

Hi!
Thank you for your quick answer.
I will keep looking for another source to extract from.

Manuela