I know how to open a webpage like http://crispr-congress.com/ with the "Start WebDriver" node, and how to get its source code via the "Page Source" node.
But how do I open a subpage like http://crispr-congress.com/about/speakers/ starting from the known URL above?
What I do not know is how to open subpages of known conference URLs that relate to the speakers of a conference. I expect that the URLs of such subpages contain the term "speaker" and appear in the source code of the main conference webpage. The original URLs were extracted via a Google search with the Selenium nodes.
The background of my question: I want to extract speakers from conference webpages based on an existing list of author names.
The usual workflow for navigating and interacting looks like this:
1. Open a start URL (using the "Start WebDriver" or "Navigate" node)
2. Extract an element (in your case a link) using the "Find Elements" node
3. Perform the interaction; in your case this would be a "Click" node
For Step 2 you need to specify how to locate the element to retrieve. For your scenario, you can use an XPath expression such as the following, which selects links that contain 'speaker' in their target URL:
//a[contains(@href, 'speaker')]
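To make it concrete what this expression selects, here is a minimal sketch in plain Python, outside of KNIME. It is not how the Selenium nodes work internally; it swaps in the standard library's `html.parser` for the XPath engine and simply collects the `href` of every `<a>` tag whose `href` contains 'speaker', then resolves relative links against the page URL. The names `SpeakerLinkParser` and `find_speaker_urls` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SpeakerLinkParser(HTMLParser):
    """Collects href values of <a> tags whose href contains a keyword."""

    def __init__(self, keyword="speaker"):
        super().__init__()
        self.keyword = keyword
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Case-sensitive substring match, like XPath's contains().
        if href and self.keyword in href:
            self.hrefs.append(href)


def find_speaker_urls(page_source, base_url, keyword="speaker"):
    """Return absolute URLs of matching links, in page order, without duplicates."""
    parser = SpeakerLinkParser(keyword)
    parser.feed(page_source)
    urls = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if url not in urls:
            urls.append(url)
    return urls


# Hypothetical page source; the real page's markup will differ.
sample = """
<ul>
  <li><a href="/about/">About</a>
    <ul><li><a href="/about/speakers/">Speakers</a></li></ul>
  </li>
  <li><a href="http://crispr-congress.com/contact/">Contact</a></li>
</ul>
"""
urls = find_speaker_urls(sample, "http://crispr-congress.com/")
print(urls)
```

Because this works on the page source rather than on rendered elements, it does not matter whether a link is currently visible in the browser.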
However, using a "Click" node will not work for that specific page, as the link is hidden in a submenu which is only visible when hovering with the mouse cursor (you will receive an "Element is not currently visible" exception). Instead, you can extract the actual href attribute value using an "Extract Attribute" node and then pass the target via a flow variable into a subsequent "Navigate" node. The resulting workflow looks like this:
I attached my workflow to this post. It is an interesting use case for your Selenium nodes. The idea is to find conferences about the CRISPR-Cas technology (gene editing) with a high number of speakers who belong to the top scientists in this area.
This workflow uses your Selenium nodes as well as the text processing plugin. The list of top scientists was generated in a separate workflow; in this example I show how this works in principle with 4 top scientists.
(The only thing that I do not like are the last three loops that I use. I use these because I browse to the main pages of the conferences and then try to find subpages with the term "speaker" if available. If I only use one loop and such a subpage is not available, I will also lose the main page in this loop. Maybe there is a better solution?)
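One possible way to think about collapsing this into a single loop, sketched in plain Python rather than as KNIME nodes: for each main page, use the "speaker" subpages if any were found, and otherwise fall back to the main page itself, so that no conference is dropped. The helper name `pages_to_visit` and the second URL are hypothetical and only serve as an illustration.

```python
def pages_to_visit(main_url, speaker_urls):
    """Return the speaker subpages if any were found, else the main page."""
    return list(speaker_urls) if speaker_urls else [main_url]


# Hypothetical results of the link search: main page -> speaker subpages found.
found = {
    "http://crispr-congress.com/": ["http://crispr-congress.com/about/speakers/"],
    "http://example.org/conference/": [],  # no "speaker" subpage on this site
}

# One pass over all conferences; the second one keeps its main page.
targets = [
    url
    for main_url, speaker_urls in found.items()
    for url in pages_to_visit(main_url, speaker_urls)
]
print(targets)
```

How this fallback maps onto workflow nodes is left open here; the sketch only shows the intended logic.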