I’ve made a small demonstration workflow which shows that when the cookies are retrieved dynamically, the flow is not successful in requesting the webpage, and responds with status code 500. However, when the cookie is hardcoded (copy & pasted from the web console) it works perfectly and receives status code 200 in response.
The cookies are identical so I don’t understand how/why the HTTP retriever node can differentiate between the two, because I certainly can’t.
I’m wondering: is this a limitation of the HTTP Retriever node? Or some kind of magic cookie?
Admittedly, I’m not 100% sure at the moment why we do not parse this one. Most probably the date format does not follow some RFC spec? In the long run, we should definitely make the parsing more lenient, I think. I’ll keep a note, but cannot promise a quick turnaround.
In the meantime, a workaround would be to parse this data manually, by taking it from the HTTP headers. You can get them with the HTTP Result Data Extractor:
Hope this helps for now.
–Philipp
[edit] Format looks fine according to RFC. Currently not sure why this happens?!
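As a rough illustration of the manual workaround (this is not how the node or Palladian parses cookies), a `Set-Cookie` header value taken from the extracted headers could be split into its parts with Python's standard library. The function name `parse_set_cookie` and the sample header are made up for this sketch. Python's parser has its own leniency rules, so it says nothing about what the node accepts.

```python
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value into a list of plain dicts."""
    jar = SimpleCookie()
    jar.load(header_value)
    rows = []
    for name, morsel in jar.items():
        expires = morsel["expires"]  # empty string if the attribute is absent
        rows.append({
            "name": name,
            "value": morsel.value,
            "domain": morsel["domain"],
            "path": morsel["path"],
            # Convert the HTTP date into a datetime so it can be checked.
            "expires": parsedate_to_datetime(expires) if expires else None,
        })
    return rows


# Hypothetical header value, invented for illustration only.
sample = "session=abc123; Domain=example.com; Path=/; Expires=Wed, 21 Oct 2026 07:28:00 GMT"
rows = parse_set_cookie(sample)
print(rows)
```

If a response carries several `Set-Cookie` headers, each value would be passed through the function separately.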
@Nancyjay I was able to fix this in the Palladian library. If you’d like to give a pre-release version a test drive over the next few days, please get in touch with me at mail@palladian.ws
OK, because of your references to the changelog I assumed it’s already there. I made a workaround with a Java Snippet, but it would be great to have these nodes working again. Based on your GitLab code, you are going to switch to the HttpClientBuilder, right?
Sorry to jump in … I had so far totally avoided sending cookies with the HTTP Retriever (worst case, I was downloading web pages with wget and then parsing them locally), but now I have hit a wall.
Would opening a browser cookie file in KNIME and passing it to an HTTP Retriever work?
Or is there any reason I would have to cut and paste the cookie data (lots of it in my case) from the browser’s debug console?
The HTTP Retriever expects a specific format for the cookie table (originally it was intended that the table be provided by another HTTP Retriever node). However, if you’re able to replicate the expected structure exactly, it should work!
If it helps, here’s an example:
If you would like to import this cookie data automatically (e.g. from a browser-supplied file), you’d have to come up with a conversion workflow which parses to this format.
Or you could probably get them from the browser’s console by executing document.cookie and then parsing the result? When running this for knime.com in anonymous mode, I get something such as the following (note that this only gets you cookies which are not marked as “HTTP only”):
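To sketch the parsing step: document.cookie returns one string of `name=value` pairs separated by semicolons, so splitting it is straightforward. The helper name `parse_document_cookie` and the sample string below are invented for illustration and are not the actual knime.com output.

```python
def parse_document_cookie(raw):
    """Split a document.cookie string into (name, value) pairs."""
    pairs = []
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty fragments, e.g. after a trailing ';'
        # Split on the first '=' only, since values may contain '='.
        name, _, value = part.partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs


# Hypothetical string, invented for illustration only.
sample = "cookie_consent=yes; theme=dark; tracking_id=abc=123"
pairs = parse_document_cookie(sample)
print(pairs)
```

Keep in mind that document.cookie only exposes names and values. Domain, path and expiry are not included, so any such columns in the cookie table would have to be filled in by hand.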