Once again I have an issue with extracting content from a website...
I am trying to extract all vacancies from one website. With the Selenium nodes I was able to extract the title and the link of every vacancy (many thanks to Philipp ;-)). My question is: how can I extract the content behind each "vacancy link" in this specific case? (I was able to do this for some other websites before.)
The example-workflow is as follows:
Table Creator with one "vacancy link" to test (http://www.zimmer.com/careers/search/job-details.html?id=QCVFK026203F3VBQBV7V47VNG&nPostingID=7182&nPostingTargetID=22589&mask=zimextus&lg=EN) --> HttpRetriever --> HtmlParser --> XPath... doesn't work properly.
When I open the "vacancy link" in Internet Explorer, right-click on the content that should be extracted (the grey field), and then click "Inspect element" ("Element untersuchen"), it shows the HTML code. I would say the XPath has to refer to something like div id="JD-AllFields". Unfortunately I can't find this line in the configuration of the XPath node. The Content Extractor node doesn't work properly either.
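For reference, the expression I would expect the XPath node to need is something like //div[@id='JD-AllFields']. Here is a minimal sketch of what that selection does, using Python's standard library on an invented sample page (the real page is fetched by the HttpRetriever; one possible reason the static XPath approach finds nothing is that the element only appears after JavaScript runs, which would also explain why the Selenium-based approach works):

```python
# Illustration of the XPath //div[@id='JD-AllFields'] on a made-up
# HTML snippet. The sample markup below is invented; only the id
# "JD-AllFields" comes from the browser's element inspector.
import xml.etree.ElementTree as ET

sample_html = """
<html>
  <body>
    <div id="header">Zimmer Careers</div>
    <div id="JD-AllFields">Job description text goes here.</div>
  </body>
</html>
"""

root = ET.fromstring(sample_html)
# ElementTree's limited XPath support; equivalent to //div[@id='JD-AllFields']
node = root.find(".//div[@id='JD-AllFields']")
print(node.text)  # the grey content area's text
```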
Many thanks to you in advance!
MANY thanks, once again!!
I'm able to extract the content when I open the vacancy link with the "Start WebDriver" node:
My next problem ;-) is:
I hope my English is understandable ;-)
Many thanks again!
(1) In general, do not perform any actions which change a browser's content when you have tables with multiple rows. In that case, simulating a click works for the first row, but it will fail for the second row, because the WebElement is no longer available (the page in the browser has changed).
I would recommend extracting all link targets (hrefs) first, and then adding a loop which performs your desired extractions step-by-step. You can either re-use one WebDriver and navigate using the Navigate node, or open and close a fresh WebDriver in each iteration.
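Conceptually, the pattern looks like this (a Python sketch for illustration; in KNIME this corresponds to an href-extraction branch feeding a loop; the listing HTML below is invented):

```python
# Step 1: collect ALL link targets up front, before the page changes.
# Step 2: loop over the stored hrefs; navigating in each iteration is
# safe because we no longer hold references to the old page's elements.
import xml.etree.ElementTree as ET

listing_html = """
<ul>
  <li><a href="job-details.html?id=1">Vacancy 1</a></li>
  <li><a href="job-details.html?id=2">Vacancy 2</a></li>
</ul>
"""

root = ET.fromstring(listing_html)

# Collect the hrefs first (this is what the Extract Attribute step does).
hrefs = [a.get("href") for a in root.iter("a")]

for href in hrefs:
    # Here you would either re-use one WebDriver and navigate to `href`,
    # or open and close a fresh WebDriver for each iteration.
    print("would extract content from", href)
```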
(2) You can combine those branches using the Joiner (with a suitable join criterion), or the Cross Joiner to perform an "n × m" join.
Hope that helps,
The extraction of all link targets is done! That works fine!
What do you mean by a "fresh WebDriver" — both of the nodes "WebDriver Factory" and "Start WebDriver"?
How can I deliver the link targets to a new "fresh WebDriver" (as a first step just for one link, so without a loop)? I tried the "Flow Variables" port (see PNG), but that ends in an error in the Start WebDriver node (Execute failed: Factory F:\xxx not found).
Sorry for the many questions... And many thanks :-)!
I just sent you an e-mail, but after looking at your screenshot again, I suspect the problem is as follows: I assume you need a "Table row to flow variable" node between the "Extract Attribute" and the "Start WebDriver" node, to convert the extracted link into a variable. You can then select that variable in the "Start WebDriver" node's configuration.
To perform the extraction row-wise, use a "Start chunk loop" node and the corresponding loop end node, which will process each input row in isolation.
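Conceptually, a chunk loop with chunk size 1 feeds the table's rows through the loop body one at a time, so each iteration sees exactly one link. A minimal sketch (the row data is invented):

```python
# Sketch of row-wise chunking: chunk size 1 means each loop iteration
# receives a single row, which then becomes a single flow variable for
# a single WebDriver run. Rows below are made-up placeholders.
rows = [
    {"vacancy-link": "job-details.html?id=1"},
    {"vacancy-link": "job-details.html?id=2"},
]

def chunks(seq, size=1):
    """Yield successive chunks of `size` rows (size 1 = row-wise)."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

for chunk in chunks(rows):
    # Loop body: one row -> one variable -> one navigation/extraction.
    link = chunk[0]["vacancy-link"]
    print("processing", link)
```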
awesome... many thanks!
It works (as a first step without the loop, so just for one vacancy link) with the table-to-flow-variable node! I'll have a look at the loop function tomorrow.