A cookie retrieved with the HTTP Retriever node doesn't work in a workflow, but copying and pasting the exact same cookie from the web console does.

I’ve made a small demonstration workflow which shows that when the cookies are retrieved dynamically, the flow is not successful in requesting the webpage, and responds with status code 500. However, when the cookie is hardcoded (copy & pasted from the web console) it works perfectly and receives status code 200 in response.

The cookies are identical so I don’t understand how/why the HTTP retriever node can differentiate between the two, because I certainly can’t.

I’m wondering: is this a limitation of the HTTP Retriever node? Or some kind of magic cookie?

Faulty_Cookie_Example.knwf (15.4 KB)

Not a magic cookie :slight_smile: But: When executing the first node, the console says:

WARN  HTTP Retriever       3:932      Invalid cookie header: "Set-Cookie: bm_sz=FFDB98FE7B4026F62FB7D54059BB1C85~YAAQ9MTdWOoSQrN0AQAAFHosxgk0V5LGBBCw1jXht7l6VImJbVOZgurF0EPxuvYWfbKeVQsQ+zHgPI2/kb24FTYEJgMKpVAoPiiwwnB5eBtuCZckRhj6TcPCn1JWS7QCA9O4+ZxTIfsrUxWtILrtmuoMyOwcwH1hMcWKpCBUYMQXwq7qmlu93cofNcp9DBjrc6C+RPvHUQ==; Domain=.danmurphys.com.au; Path=/; Expires=Fri, 25 Sep 2020 20:50:36 GMT; Max-Age=14399; HttpOnly". Invalid 'expires' attribute: Fri, 25 Sep 2020 20:50:36 GMT
WARN  HTTP Retriever       3:932      Invalid cookie header: "Set-Cookie: _abck=9BE50A311AF93904A43FC2E6968FC6A1~-1~YAAQ9MTdWOsSQrN0AQAAFXosxgSP4rer1IN6dbrVszeNJ1Y0HgVpW3Yba7PBaDlzlz8DbVXt/oByu6cWtLF/xG5xQV5j5biEtihMXnJ0AzpYzUErjda+EZ5ViOThZ4oATPPXki5YRRpWZa1JR74yfSo5Lweq3xK8WL07L9IG/QRvP7jXbCQyiYDavx60R/NbioSD2obhCh7IUaabtzplQPJbHkp1cyJExJKiNJqiWrk+pNUNot3ynai4f2g0Nzl3uGHQjDImIAcdwzLeKqcXG6lHS5VWtqH7lFVmVnL5UOT9d5jTpnul4YtdiswxNmrGeQ==~-1~-1~-1; Domain=.danmurphys.com.au; Path=/; Expires=Sat, 25 Sep 2021 16:50:37 GMT; Max-Age=31536000; Secure". Invalid 'expires' attribute: Sat, 25 Sep 2021 16:50:37 GMT

Admittedly, I am not yet 100% sure why we fail to parse this one. Most probably the date format does not follow some RFC spec? In the long run, we should definitely make the parsing more lenient, I think. I’ll keep a note, but cannot promise a quick turnaround.

In the meantime, a workaround would be to parse this data manually, by taking it from the HTTP headers. You can get them with the HTTP Result Data Extractor:
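Such manual parsing could look roughly like the following Python sketch. It assumes the raw `Set-Cookie` header value is already available as a string (for example, from the extracted HTTP headers). The function name `parse_set_cookie` and the shortened cookie value are made up for illustration; only the attribute layout is taken from the log output above.

```python
def parse_set_cookie(header_value):
    """Split one Set-Cookie header value into name, value, and attributes."""
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    # The first part is always "name=value"; the value may itself contain "=".
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, sep, attr_value = part.partition("=")
        # Flags such as HttpOnly or Secure carry no value; store them as True.
        attributes[key.strip().lower()] = attr_value.strip() if sep else True
    return {"name": name.strip(), "value": value.strip(), "attributes": attributes}


# Placeholder cookie value (hypothetical); the attributes mirror the log above.
header = (
    "bm_sz=ABC123~xyz==; Domain=.danmurphys.com.au; Path=/; "
    "Expires=Fri, 25 Sep 2020 20:50:36 GMT; Max-Age=14399; HttpOnly"
)
cookie = parse_set_cookie(header)
print(cookie["name"], cookie["value"], cookie["attributes"])
```

The `Expires` value is kept as a plain string here, so a strict date parser never sees it.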

Hope this helps for now.

–Philipp

[edit] Format looks fine according to RFC. Currently not sure why this happens?!
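As a quick independent check (this is not the Palladian parser), Python's standard library accepts both `Expires` values from the log above as well-formed HTTP dates. This is only a sketch to confirm that the date strings themselves look fine; it says nothing about how the HTTP Retriever parses them internally.

```python
from email.utils import parsedate_to_datetime

# The two 'expires' values reported as invalid in the console output.
expires_values = [
    "Fri, 25 Sep 2020 20:50:36 GMT",
    "Sat, 25 Sep 2021 16:50:37 GMT",
]

# parsedate_to_datetime raises an exception if a value cannot be parsed.
parsed = [parsedate_to_datetime(value) for value in expires_values]

for value, dt in zip(expires_values, parsed):
    print(value, "->", dt.isoformat())
```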


I tried getting the cookie with Python, however the issue persists. Very strange.

Python workflow enclosed.

Faulty_Cookie_Example.knwf (18.4 KB)

@Nancyjay I was able to fix this in the Palladian library. If you’d like to give a pre-release version a test drive during the next days, please get in touch with me at mail@palladian.ws

Thanks!
Philipp

May I ask what changes you have made to the Palladian nodes? We use them quite frequently so it would be useful to know to prepare for such changes. :slight_smile:

Also thank you for your timely response and action.

Regarding changes: You can find the official change log on the NodePit page – consider this the official channel regarding all Palladian updates:

Beside that, I’ve always tried to give a quick wrap up here in forums, e.g. here:

https://forum.knime.com/c/community-extensions/palladian-selenium/34

– Philipp


[edit] Sorry, here’s the proper link:


Same issue here. Updated to 2.3 on KNIME 4.1 but the issue still persists.

It’s not yet fixed (except in the mentioned pre-release versions.)

–P

OK, because of your references to the changelog I assumed it’s already there. I made a workaround with a Java Snippet, but it would be great to have these nodes working again. Based on your GitLab code, you are going to switch to the HttpClientBuilder, right?

Sorry to jump in … so far I had totally avoided sending cookies with the HTTP Retriever (worst case, I was downloading web pages with wget and then parsing them locally), but now I’ve hit a wall.

Would opening a browser cookie file in KNIME and passing it to an HTTP Retriever work?
Is there any reason to cut and paste the cookie data (lots of it in my case) from the browser’s debug console instead?

Thanks

Ludo

Hi Ludo,

the HTTP Retriever expects a specific format for the cookie table (it was originally intended to be provided by another HTTP Retriever node). However, if you’re able to replicate the expected structure exactly, it should work!

If it helps, here’s an example:

image

If you would like to import this cookie data automatically (e.g. from a browser-supplied file), you’d have to come up with a conversion workflow which parses to this format.

Or you could probably get them from the browser’s console by executing document.cookie and then parsing the result? When running this for knime.com in anonymous mode, I get something like the following (note that this only returns cookies which are not marked as “HTTP only”):

"visitor_id876371=252052004; visitor_id876371-hash=28ca7d3eabcda3ae09d7ea1500f15e3d9039c7fac9e12fabcbc4eda81fcc5cd965622820de2eefa4de87d641bff65400213996bf; _gcl_au=1.1.1857561268.1610125257; _ga=GA1.2.379818015.1610125257; _gid=GA1.2.958724028.1610125257; _gat_UA-511689-4=1; _hjTLDTest=1; _hjid=204d69e2-a1e3-4259-b613-6c28d6217b10"

… this could easily be converted into a table with the structure above using a KNIME string-splitting workflow.
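Outside of KNIME, the same splitting step can be sketched in a few lines of Python. The helper name `split_document_cookie` is made up for illustration, and the resulting (name, value) pairs would still have to be mapped onto the exact columns the HTTP Retriever expects, which this sketch does not attempt.

```python
def split_document_cookie(cookie_string):
    """Turn a document.cookie string into a list of (name, value) rows."""
    rows = []
    for pair in cookie_string.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first "=" only, since values may contain "=" themselves.
        name, _, value = pair.partition("=")
        rows.append((name, value))
    return rows


# A subset of the knime.com example string shown above.
cookie_string = (
    "_ga=GA1.2.379818015.1610125257; _gid=GA1.2.958724028.1610125257; "
    "_gat_UA-511689-4=1; _hjTLDTest=1"
)
rows = split_document_cookie(cookie_string)
for name, value in rows:
    print(name, value)
```

Domain, path, and expiry are not part of `document.cookie`, so those fields would need to be supplied separately if the target table requires them.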

Does this help?

Best,
Philipp