Selenium "Click" Node

Vanessa_030 · June 14, 2016, 10:02am

Hi Everybody,

I'm still having trouble with my workflow to analyze search results from my webpage.

The WebDriverFactory with the Start WebDriver node works fine, so the following page opens:

https://recruiting.bmwgroup.de/ibs/Servlets/ibs/controller/sm?event=__activate_and_reset&target=smerweitertesuche&sprache=de

Since I want to have ALL results from the Job portal I just need to click on the "Search" Button.

I thought I just needed to use the "Click" node, but then the error "No suitable column for ws.palladian.nodes.selenium.tyes.webElementValue found" appears.

What am I doing wrong now?

Thank you so much for your help!

Vanessa

qqilihq · June 14, 2016, 10:47am

Hey Vanessa,

you'll need to connect a "Find Elements" node first, which locates the "Search" button you want to click. After the "Find Elements" node you can add the "Click" node.

Hope that helps,
Philipp

Vanessa_030 · June 14, 2016, 11:18am

Oh I think I'm so bad at this. :D Can you also help me with the configuration of the "Find Elements" node?

What do I need to select for "Find by"? And is the query in my case just "Suchen"?

Thank you.

Vanessa

qqilihq · June 14, 2016, 11:52am

You can find by "Name" and use the query "btnSearch" (you can look up the name with your browser's "inspect element" functionality).
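To illustrate what a lookup by "Name" matches, here is a minimal Python sketch using only the standard library. The HTML snippet is a made-up stand-in for the search form (the real page markup may differ); only the name "btnSearch" comes from this thread. The node itself is configured in its dialog, so this only shows the underlying idea.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the search form; the real page markup may differ.
SAMPLE_HTML = """
<form>
  <input type="text" name="txtKeyword">
  <input type="submit" name="btnSearch" value="Suchen">
</form>
"""


class NameFinder(HTMLParser):
    """Collects every tag whose name attribute equals the wanted name."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("name") == self.wanted_name:
            self.matches.append((tag, attr_map))


def find_by_name(html, name):
    """Return (tag, attributes) pairs for elements with the given name."""
    finder = NameFinder(name)
    finder.feed(html)
    return finder.matches


if __name__ == "__main__":
    for tag, attrs in find_by_name(SAMPLE_HTML, "btnSearch"):
        print(tag, attrs.get("value"))
```

In this sample, looking up the name "Suchen" finds nothing, because "Suchen" is only the visible label of the button, not its name attribute.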

node description

Philipp

bildschirmfoto_2016-06-14_um_11.50.20.png

Vanessa_030 · June 14, 2016, 12:55pm

Yeah, so cool! It works - thank you Philipp. :)

Vanessa

Vanessa_030 · June 14, 2016, 1:43pm

It's me again. :D

Now I want to collect all links to the individual jobs. But to get all items I need to go through the following pages by clicking the ">" link, which shows the next page of results.

How can I do this?

I tried the "Extract Attribute" node with the attribute //a to select all links, but then the error "Element not found in the cache - perhaps the page has changed since it was looked up" occurs.

Thank you again.

qqilihq · June 14, 2016, 6:29pm

Hey Vanessa,

for collecting results from paged lists, you can build a loop with loop start and end nodes. In each iteration, collect the results from the current page, then click on the "Next" link. The end node will then output all collected results. For an example of the general idea, have a look at the Facebook workflow on the examples page.

It is important not to click on any elements which are no longer present on the page (i.e. after a page reload). So perform the "Find Elements" step in each loop iteration and not outside of the loop.
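The loop idea can be sketched in plain Python. This is only an illustration under stated assumptions: FakeBrowser and FakeElement are hypothetical stand-ins for a browser session (they are not part of the Selenium nodes or any library), and the query strings "job links" and "next link" are placeholders. The point is that elements are looked up again after every page change, because anything found earlier becomes stale.

```python
class StaleElementError(Exception):
    """Raised when an element from an earlier page load is used."""


class FakeElement:
    def __init__(self, browser, text):
        self._browser = browser
        self._generation = browser.generation  # page load this element belongs to
        self._text = text

    def _check_fresh(self):
        if self._generation != self._browser.generation:
            raise StaleElementError("element is from a previous page load")

    @property
    def text(self):
        self._check_fresh()
        return self._text

    def click(self):
        self._check_fresh()
        self._browser.go_to_next_page()


class FakeBrowser:
    """Hypothetical stand-in for a browser session showing paged results."""

    def __init__(self, pages):
        self._pages = pages
        self._index = 0
        self.generation = 0

    def find_elements(self, query):
        if query == "job links":
            return [FakeElement(self, t) for t in self._pages[self._index]]
        if query == "next link":
            has_next = self._index < len(self._pages) - 1
            return [FakeElement(self, ">")] if has_next else []
        return []

    def go_to_next_page(self):
        self._index += 1
        self.generation += 1  # everything found before is now stale


def collect_all(browser):
    """Collect link texts from every page, clicking "next" once per page."""
    results = []
    while True:
        # Look elements up inside the loop so they belong to the current page.
        results.extend(link.text for link in browser.find_elements("job links"))
        next_links = browser.find_elements("next link")
        if not next_links:
            return results
        next_links[0].click()
```

Calling collect_all(FakeBrowser([["Job A", "Job B"], ["Job C"]])) walks both pages. Reusing a "next" element that was found before the page changed raises StaleElementError in this sketch, which mirrors the "Element not found in the cache" error mentioned above.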

Best,
Philipp

Vanessa_030 · June 15, 2016, 10:51am

Hi, I've checked the Facebook workflow. It definitely helped me understand the general idea.

I'm still having a bit of trouble selecting the right query.

So before I'm starting with the Loop I want to select all job links on the first page. I thought it is something with "contentunder" but it is not working.

Can someone help me?

thank you in advance

Vanessa

Vanessa_030 · June 15, 2016, 11:48am

OK, with "contentunder" it works, but there are always 2 links for one job item (the ID and the name both link to the job site). How can I choose one of these links?

And how can I extract the individual job title for each link?

Can someone help me?

Thank you!

Vanessa

qqilihq · June 15, 2016, 11:29pm

Hi there,

in case you cannot drill down using the element itself, because the query would be too generic (in this case a.contentunder), try to expand your query to take the parent elements into account. In your specific case something like this (CSS selector) will do the trick:

table.gitter tr[valign="top"] td:nth-child(2) a

Explanation: Selects links (a) which are contained in a td element which is the second child (i.e. second column) of a table row (tr) within a table with the class "gitter".

In general, the same can be achieved with an XPath expression using a different syntax, depending on your personal taste :)
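As a rough illustration of the XPath variant, here is a small Python sketch using only the standard library. The table markup is a made-up, well-formed stand-in (the real page may differ), and the href values are invented. The XPath is an approximate counterpart of the CSS selector above, not an exact translation: it uses direct child steps, td[2] means "second td", and @class='gitter' requires an exact match.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for the results table; the real markup may differ.
SAMPLE = """
<html><body>
<table class="gitter">
  <tr><th>ID</th><th>Title</th></tr>
  <tr valign="top">
    <td><a class="contentunder" href="/job/101">101</a></td>
    <td><a class="contentunder" href="/job/101">Software Developer</a></td>
  </tr>
  <tr valign="top">
    <td><a class="contentunder" href="/job/102">102</a></td>
    <td><a class="contentunder" href="/job/102">Data Analyst</a></td>
  </tr>
</table>
</body></html>
"""

# Approximate XPath counterpart of:
#   table.gitter tr[valign="top"] td:nth-child(2) a
XPATH = ".//table[@class='gitter']/tr[@valign='top']/td[2]/a"


def title_links(markup):
    """Return (title, href) for the link in the second column of each row."""
    root = ET.fromstring(markup.strip())
    return [(a.text, a.get("href")) for a in root.findall(XPATH)]


if __name__ == "__main__":
    for title, href in title_links(SAMPLE):
        print(title, href)
```

Because only the second column is selected, each job yields one link, and the link text doubles as the job title.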

Cheers,
Philipp

kushS322 · December 11, 2018, 9:52am

Hi Philipp,

Thanks for all your earlier help on various questions.
I’m new to KNIME and haven’t figured out some things quite yet.

I have developed a web scraping routine in KNIME which needs to traverse multiple pages (each page has multiple catalogue elements).
The issue I’m having is this:
When the loop runs for a single page, it gathers information for 120 elements. If I then find the “next page” button and use the “Click” node, the next page button’s XPath is appended to each row, so the button is clicked 120 times.
I tried splitting one row away from the main workflow into the page switcher, but I can’t join it back to restart the loop over the pages, as KNIME assigns missing values even in outer joins if there is no match.

Is there a way to execute this properly? (only click once irrespective of number of rows in the table)

Cheers,
Kush

qqilihq · December 11, 2018, 10:17am

Hi Kush,

I’d recommend creating two branches, one for extracting your results, and a second one for just extracting the pagination. You can then first extract all the necessary information, and afterwards perform your click. This requires some synchronization to make sure the click is only performed after all your results have been extracted.

Here’s an example how this might look (this is an older workflow, so don’t be puzzled that the Find Elements node is deprecated here):

The key here is the Synchronize node:

Pro tip: You can also avoid the Synchronize nodes altogether and do your temporal synchronization via flow variables. This reduces the clutter that Synchronize nodes add to your workflow. I don’t currently have an example at hand, unfortunately.

Hope that helps – in case of further questions, please let me know.

– Philipp

kushS322 · December 11, 2018, 10:25am

Thanks Philipp, this is exactly what I was looking for! - just wasn’t aware of the synchronize node yet.
In the meantime I had created a makeshift workflow using a wait timer and an un-linked branch, which also works fine but is unreliable. Now I can switch to something similar.

makeshift
