HttpRetriever use on cookie-required websites

alvesfortes · April 17, 2014, 1:17pm

If I use the HttpRetriever node on websites that require cookies to be accepted, it only retrieves the cookie error page ("An Error Occurred Setting Your User Cookie"). How can I bypass or fix this?

Thanks

qqilihq · April 17, 2014, 2:18pm

The HttpRetriever node currently simply rejects cookies.

What's your specific use case? Do the cookies from one request need to be present in subsequent requests? It shouldn't be a big thing to accept those cookies, but I would not want to implement a persistent cookie store for the nodes.
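
To illustrate what accepting cookies without a persistent store can look like, here is a minimal Python sketch. It is not the HttpRetriever implementation, only an analogy built on the standard library. `CookieGateHandler`, the `visited` cookie, the `?cookieSet=1` hop and the `/page` path are hypothetical stand-ins for a site that redirects to an error page when its cookie does not come back; the fake site is served on a local port so nothing depends on a real website. The cookie jar lives in memory for a single call and is discarded afterwards.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that refuses clients without cookies."""

    def do_GET(self):
        has_cookie = "visited=1" in self.headers.get("Cookie", "")
        if self.path == "/cookieAbsent":
            self._reply(200, body=b"An Error Occurred Setting Your User Cookie")
        elif has_cookie:
            self._reply(200, body=b"publication list")
        elif self.path.endswith("?cookieSet=1"):
            # The cookie was offered on the previous hop but did not come back.
            self._reply(302, location="/cookieAbsent")
        else:
            # First visit: offer a cookie and ask the client to come back.
            self._reply(302, location=self.path + "?cookieSet=1",
                        set_cookie="visited=1; Path=/")

    def _reply(self, status, body=b"", location=None, set_cookie=None):
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        if set_cookie:
            self.send_header("Set-Cookie", set_cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, accept_cookies):
    """Return (final URL, body); cookies live only in memory for this call."""
    # An empty ProxyHandler makes the request go straight to the local server.
    handlers = [urllib.request.ProxyHandler({})]
    if accept_cookies:
        jar = http.cookiejar.CookieJar()  # in-memory, nothing written to disk
        handlers.append(urllib.request.HTTPCookieProcessor(jar))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=5) as response:
        return response.geturl(), response.read().decode()


def run_demo():
    """Serve the fake site on a free local port and fetch it both ways."""
    server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/page"
        return fetch(url, accept_cookies=False), fetch(url, accept_cookies=True)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for final_url, body in run_demo():
        print(final_url, "->", body)
```

In this sketch the fetch without a cookie jar ends up on the `/cookieAbsent` error page, while the fetch with the in-memory jar sends the cookie back during the redirect and receives the actual page.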

alvesfortes · April 17, 2014, 3:49pm

Thanks for the quick reply.

If, for example, I want to use the HttpRetriever on this page: http://arc.aiaa.org/action/showPublications?pubType=meetingProc it will download this page instead: http://arc.aiaa.org/action/cookieAbsent because it doesn't accept cookies. This stops everything in its tracks. I can understand that you prefer not to implement a persistent cookie store in the node.

Do you know of any way around this?

qqilihq · April 17, 2014, 5:29pm

Great to have a test URL. Give me a few days, I'll see what I can do about the issue. I'll get back to you here.

Best,
Philipp

qqilihq · April 22, 2014, 6:21pm

I've fixed that issue; the updated node will be available from tomorrow via the nightly build of the nodes.

Philipp

alvesfortes · April 25, 2014, 1:16pm

Great! It's working now.

Thanks for your speedy assistance.

Ruben · October 15, 2014, 3:48pm

Hi there!

I've used this post before to fix exactly the same problem (back in KNIME v.2.9.x) and was able to retrieve the above-mentioned webpage successfully as a test. It no longer seems to work in some cases, though, so I tested again with the same webpage and for some reason it is not working :(

Any help would be greatly appreciated!

Additional information:

I use KNIME v.2.10.3
I use Palladian v.1.2.0.201408051613
I have the nightly build enabled (http://tech.knime.org/update/community-contributions/trunk)

qqilihq · October 15, 2014, 6:30pm

Hi Ruben,

I'll have a look at the issue; however, this might take several days. I'll get back to you here.

Best,
Philipp

Ruben · October 16, 2014, 2:06pm

Hi Philipp. Looking forward to your reply! :)

qqilihq · October 18, 2014, 4:29pm

Hi Ruben,

I just double-checked with the given URL and it's working fine (i.e. I'm not getting any cookie-related error result, when retrieving the page). Can you give more details about your workflow, or maybe attach the workflow with which you're experiencing the issue?

Best,
Philipp

Ruben · October 20, 2014, 11:14am

Hi Philipp,

The workflow is really nothing fancy (see attached). Just a "Table Creator" for URL input, followed by "HttpRetriever".

Thanks,
Ruben.

palladian_test.zip

qqilihq · October 20, 2014, 12:41pm

Hi Ruben,

The problem is that you're passing the URL to the HtmlParser instead of the HttpResult. This causes the HtmlParser to send another HTTP request of its own, and the HtmlParser is not capable of handling cookies.

Simple solution to your problem: Open the configuration of the HtmlParser node and select the HTTP Result cell as input.
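
The difference between the two wirings can be sketched as a toy model in Python. This is not Palladian code: `retrieve_with_cookies`, `retrieve_without_cookies`, the page contents and the example URL are all made up for illustration, and no network access takes place. The point is only that parsing the body you already retrieved involves one cookie-aware request, whereas handing the URL to the parser triggers a second request that lacks cookie support.

```python
from html.parser import HTMLParser

# Toy model of a cookie-gated site (all names and contents are made up).
REAL_PAGE = "<html><head><title>Publications</title></head></html>"
ERROR_PAGE = "<html><head><title>Cookie error</title></head></html>"


def retrieve_with_cookies(url):
    """Stands in for the retriever: it accepts cookies and gets the real page."""
    return REAL_PAGE


def retrieve_without_cookies(url):
    """Stands in for a parser fetching a URL itself, without cookie support."""
    return ERROR_PAGE


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html_text):
    """Parse markup that is already in hand; no request is made here."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


def title_from_result(url):
    """Recommended wiring: retrieve once, then parse the retrieved body."""
    return parse_title(retrieve_with_cookies(url))


def title_from_url(url):
    """Problematic wiring: the parser gets the URL and requests it again."""
    return parse_title(retrieve_without_cookies(url))


if __name__ == "__main__":
    url = "http://example.invalid/page"  # placeholder, never contacted
    print(title_from_result(url))
    print(title_from_url(url))
```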

Best,
Philipp

[note to myself] I will add a warning output to future releases when supplying HTTP URLs to the parser, as this is deprecated. The only reason for accepting URLs is being able to supply file: input to the parser, but afaik KNIME supports a dedicated File column now?

Ruben · October 20, 2014, 1:01pm

Doh!

I've created a couple dozen workflows with the same basic setup (URL > HttpRetriever > HtmlParser), but never had this problem before...

Thanks for your time and solving the mystery :)
