Extract compiled web page instead of HTML source

RE:  Node "GET Resource (batch)"

I've been searching for a while so now I'm asking.

I've used the GET Resource (batch) node to access custom .php scripts that generate plain text, NOT HTML. This makes the node easy and straightforward to use because I can parse the output as CSV to get the info I need.

Now I need to use a JavaScript script to do something else (one that is very unique and would take hours to transcode). When I use JavaScript in PHP, the PHP is basically echoing the script, which then updates an element at the DOM level through the innerHTML property. This means my HTML source code NEVER shows the values I need to extract, even though they are in the final web page output, which is only one string.

Can the GET Resource node (or any other node) extract text from the finished web page as opposed to the source code? It seems like all the nodes I've tried (GET Resource (batch), HttpRetriever, HTMLParser, XPath, HTTPResultDataExtractor, Read REST...) extract HTML/XML source code.

I want to extract the text from the finished web page.  Is this possible?  Just to be clear, the data I need will NEVER appear in the HTML/XML source code, but only in the finished web page.  Hopefully it's just a node configuration issue.

I've looked at AJAX and JSON, but it seems like I still won't have the values as plain text in the source code.

Thanks,
David
 

Hi David,

This will not work with the nodes you've tried, as they simply retrieve a static page without executing any browser-side logic (think of curl).
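
To make the difference concrete, here is a minimal sketch in Python (standard library only) of what a static retrieval sees. The page source, the element id "result" and the value "42,abc" are made up for illustration; they are assumptions, not David's actual page. The script that would fill the element is present in the source, but nothing ever runs it, so the element stays empty.

```python
from html.parser import HTMLParser

# Hypothetical page source, as a static HTTP fetch would return it.
# The <div> is empty; only a browser running the script would fill it.
PAGE_SOURCE = """<html><body>
<div id="result"></div>
<script>
document.getElementById("result").innerHTML = "42,abc";
</script>
</body></html>"""


class ElementText(HTMLParser):
    """Collect the text found inside the element with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.tag = None      # tag name of the target element once found
        self.depth = 0       # nesting depth of same-named tags inside it
        self.chunks = []     # text pieces seen inside the target

    def handle_starttag(self, tag, attrs):
        if self.tag is None:
            if dict(attrs).get("id") == self.element_id:
                self.tag, self.depth = tag, 1
        elif tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.tag is not None and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.tag = None

    def handle_data(self, data):
        if self.tag is not None:
            self.chunks.append(data)


def static_text(source, element_id):
    """Return the element's text exactly as it appears in the raw source."""
    parser = ElementText(element_id)
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks).strip()


# No JavaScript is executed here, so the element's text is empty.
print(repr(static_text(PAGE_SOURCE, "result")))
```

The value only exists as a string literal inside the script; it never becomes the content of the element unless something actually executes the script.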

You can achieve what you're looking for using the Selenium Nodes, which allow you to drive a real web browser (or a headless PhantomJS) and perform any kind of action which you would do as a user. Some example workflows are available on the website.
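
For readers who want to see the same idea outside of a workflow, here is a rough sketch using Selenium's Python bindings instead of the Selenium Nodes. This is a swapped-in illustration, not how the nodes are implemented. The function names, the element id and the timeout values are hypothetical, and the sketch assumes the target element already exists in the static source and is only filled in later by the script.

```python
import time

BY_ID = "id"  # same string value as selenium's By.ID locator strategy


def read_rendered_text(driver, url, element_id, timeout=10.0, poll=0.25):
    """Open url in a browser session and return the text of one element
    after the page's scripts have filled it in.

    driver is any object with Selenium's WebDriver interface
    (get, find_element). Assumes the element exists from the start.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # .text reflects the live DOM, not the original HTML source.
        text = driver.find_element(BY_ID, element_id).text.strip()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element #{element_id} stayed empty for {timeout}s")
        time.sleep(poll)


def open_headless_chrome():
    """Start a headless Chrome session (needs the third-party selenium
    package and a local Chrome install; imported lazily for that reason)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

A call would look roughly like `driver = open_headless_chrome()`, then `read_rendered_text(driver, url, "result")`, followed by `driver.quit()`. The returned string can then be parsed as CSV the same way as the plain-text output of the other scripts.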

Feel free to get in touch for any questions.

Best,
Philipp

Disclaimer: I'm the author of the Selenium and Palladian nodes.
