HttpRetriever use on cookie-required websites

If I use the HttpRetriever node on websites that require cookie acceptance, it only retrieves the cookie error page ("An Error Occurred Setting Your User Cookie"). How can I bypass or fix this?

Thanks

The HttpRetriever node currently simply rejects cookies.

What's your specific use case? Do the cookies from one request need to be present in subsequent requests? It shouldn't be a big thing to accept those cookies, but I would not want to implement a persistent cookie store for the nodes.
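To illustrate the difference between accepting cookies and persisting them, here is a minimal sketch outside KNIME, using only Python's standard library. It is not how the HttpRetriever node is implemented; the helper names `make_session_opener` and `fetch` are made up for this example. The cookie jar lives in memory and disappears with the opener, so nothing is stored between runs.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Return (opener, jar). Cookies are accepted but kept only in memory."""
    jar = CookieJar()  # in-memory jar; nothing is written to disk
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch(opener, url, timeout=30):
    """Fetch url with the given opener.

    Cookies set along a redirect chain are sent back on the follow-up
    requests made by the same opener. Returns (final_url, body_bytes).
    """
    with opener.open(url, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Calling `fetch(opener, url)` with an opener from `make_session_opener()` keeps cookies for that opener only; creating a fresh opener starts again with an empty jar.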

Thanks for the quick reply.

If, for example, I want to use the HttpRetriever on this page: http://arc.aiaa.org/action/showPublications?pubType=meetingProc it will instead download this page: http://arc.aiaa.org/action/cookieAbsent because it doesn't accept cookies. This stops everything in its tracks. I can understand that you prefer not to implement a persistent cookie store in the node.

Do you know of any way around this?

Great to have a test URL. Give me a few days, I'll see what I can do about the issue. I'll get back to you here.

Best,
Philipp

I've fixed that issue. The updated node will be available from tomorrow via the nightly build of the nodes.

Philipp

Great! It's working now.

Thanks for your speedy assistance.

Hi there!

I've used this post before to fix exactly the same problem (back in KNIME v.2.9.x) and was able to retrieve the above-mentioned webpage successfully as a test. But it seems that it isn't working anymore in some cases, so I tested again with the same webpage, and for some reason it is not working :(

Any help would be greatly appreciated!

Additional information:

  • I use KNIME v.2.10.3
  • I use Palladian v.1.2.0.201408051613
  • I have the nightly build enabled (http://tech.knime.org/update/community-contributions/trunk)

Hi Ruben,

I'll have a look at the issue; however, this might take several days. I'll get back to you here.

Best,
Philipp

Hi Philipp. Looking forward to your reply! :)

Hi Ruben,

I just double-checked with the given URL and it's working fine (i.e. I'm not getting any cookie-related error result when retrieving the page). Can you give more details about your workflow, or maybe attach the workflow with which you're experiencing the issue?

Best,
Philipp

Hi Philipp,

The workflow is really nothing fancy (see attached). Just a "Table Creator" for URL input, followed by "HttpRetriever".

Thanks,
Ruben.

Hi Ruben,

The problem is that you're not parsing the HttpResult in the HtmlParser, but the URL. This causes another HTTP request, issued directly by the HtmlParser, and the HtmlParser is not capable of handling cookies.

Simple solution to your problem: Open the configuration of the HtmlParser node and select the HTTP Result cell as input.
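The same idea can be sketched outside KNIME: parse the body that was already retrieved instead of handing the URL to the parser, which would trigger a second, cookie-less request. This is only an illustration with Python's standard library, not the HtmlParser node's code; `TitleParser` and `title_from_body` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_from_body(body: bytes, encoding: str = "utf-8") -> str:
    """Parse bytes that were already downloaded; no new HTTP request is made."""
    parser = TitleParser()
    parser.feed(body.decode(encoding, errors="replace"))
    parser.close()
    return parser.title.strip()
```

The function takes the response body, not a URL, so whatever cookies were needed to obtain that body play no role at parse time.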

Best,
Philipp

[note to myself] I will add a warning output to future releases when supplying HTTP URLs to the parser, as this is deprecated. The only reason for accepting URLs is being able to supply file: input to the parser, but afaik KNIME supports a dedicated File column now?

Doh!

I've created a couple dozen workflows with the same basic setup (URL>HttpRetriever>HtmlParser), but never had this problem before...

Thanks for your time and solving the mystery :)
