Selenium Get Cookies

How do you get cookies from the Selenium nodes?

My crawling requirements have become so complicated that I believe I need the Selenium nodes to complement the Palladian nodes. Please bear with me, as I am a complete newbie to Selenium and the Selenium nodes.

I am trying to use Selenium to extract the cookies I receive from a site and push them into the bottom cookies port of an HttpRetriever node. The cookies are being generated via a JavaScript process that is too complicated for the HttpRetriever nodes alone to follow.

The Selenium documentation says:

http://www.seleniumeasy.com/selenium-tutorials/how-to-handle-cookies-in-selenium-webdriver

Selenium WebDriver can perform the required tasks with respect to browser cookies. We can add, delete, delete a particular cookie by passing the name, and so on.

Get Cookies
Method Name: getCookies()
Syntax: driver.manage().getCookies();
Purpose: Get all the cookies for the current domain. This is the equivalent of calling "document.cookie" and parsing the result.
Returns: A Set of cookies for the current domain.

But the nodes that connect downstream of the Selenium "Start WebDriver" node only seem to be capable of extracting Elements and CSS.

I tried writing a JSON script and connecting it directly to the "WebDriver Factory" node, but I got the error "Configure failed (RuntimeException): Step type 'getCookies' is not implemented".

I also found this useful-looking piece of code that might be runnable in the Selenium "Execute JavaScript" node if only I knew how to connect all the dots:

// Return the value of the cookie named e, or "" if it is not set.
function getCookie(e){
	var t=document.cookie.match(
		new RegExp("(?:^|;)\\s*"+e+"=([^;]+)")
	);
	return t?t[1]:""
}
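
To try the lookup outside a browser, here is a minimal sketch of a variant that takes the cookie string as a parameter instead of reading it from the page. The name getCookieFrom and the escaping helper are hypothetical and not part of the snippet above; escaping the cookie name is an extra precaution I have added so that names containing regex characters do not break the pattern.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of getCookie: looks up `name` in a cookie string
// of the form "a=1; b=2" and returns its value, or "" if it is absent.
function getCookieFrom(cookieString, name) {
  const match = cookieString.match(
    new RegExp('(?:^|;)\\s*' + escapeRegExp(name) + '=([^;]+)')
  );
  return match ? match[1] : '';
}
```

Inside the "Execute JavaScript" node the call would presumably look like return getCookieFrom(document.cookie, "session"); where "session" is only a placeholder cookie name.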

Hi Edlueze,

A dedicated "Cookie" node will be available in an update that is coming soon.

In the meantime, you can use the JavaScript node to extract cookie data. Simply use the following JS code, which returns the cookies as a string:

return document.cookie;

You can then split the string into pieces in your KNIME workflow. Note, however, that this only gives you access to cookies that are readable from JavaScript. An optional security mechanism, the HttpOnly flag, prevents cookies from being read via JavaScript. Accessing these kinds of cookies is currently not supported.
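
If you would rather do the splitting in JavaScript before the data reaches the workflow, a minimal sketch could look like the following. The function name parseCookieString is hypothetical, values are not URL-decoded, and a piece without "=" is treated here as a value with an empty name, which is an assumption on my part.

```javascript
// Hypothetical helper: turn a string such as "a=1; b=2" into
// an array of { name, value } objects.
function parseCookieString(cookieString) {
  return cookieString
    .split(';')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      // Split on the first "=" only, since values may contain "=" themselves.
      const eq = part.indexOf('=');
      return eq === -1
        ? { name: '', value: part }
        : { name: part.slice(0, eq), value: part.slice(eq + 1) };
    });
}
```

In the JavaScript node this could end with return JSON.stringify(parseCookieString(document.cookie)); assuming the node hands the returned string on to the workflow, where it can be parsed as JSON.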

Anyway, hope this helps,
Philipp

Thanks! I just needed a little push in the right direction. This solution seems to be all I need ... at least for the moment.

I didn't know about the HttpOnly flag - thanks for the heads up - it looks ominous! When you say "accessing these kinds of cookies is currently not supported", do you mean that Selenium Nodes doesn't support it, or the Selenium framework doesn't support it, or the underlying PhantomJS browser doesn't support it? Which layer prevents access to these cookies? Is it something you can ultimately provide via Selenium Nodes?

Managing cookies is a huge deal for me, so I'm looking forward to seeing the dedicated Cookie node.

When you say "accessing these kinds of cookies is currently not supported", do you mean that Selenium Nodes doesn't support it, or the Selenium framework doesn't support it, or the underlying PhantomJS browser doesn't support it? Which layer prevents access to these cookies? Is it something you can ultimately provide via Selenium Nodes?

This is not supported by Selenium itself, as Selenium basically interacts with the browser via JavaScript.

There are potential workarounds, e.g. when using PhantomJS, the cookies file can be accessed directly without the Selenium API. This is a feature which we could eventually offer in the Selenium Nodes, but it's currently not a very short-term priority.
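
For context on the HttpOnly flag discussed in this thread: the server sets it as an attribute of the Set-Cookie response header, and cookies carrying it do not show up in document.cookie. The sketch below checks a raw Set-Cookie header value for that attribute; the function name isHttpOnly and the sample header values are made up for illustration.

```javascript
// Hypothetical helper: report whether a Set-Cookie header value
// carries the HttpOnly attribute (attribute names are case-insensitive).
function isHttpOnly(setCookieHeader) {
  return setCookieHeader
    .split(';')
    .slice(1) // skip the leading "name=value" pair
    .some(attr => attr.trim().toLowerCase() === 'httponly');
}
```

A header such as "sid=abc; Path=/; HttpOnly" would be reported as HttpOnly, while "theme=dark; Path=/" would not.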
