Scraping reference citations from Google Scholar with Selenium


I’m trying to scrape reference citations from Google Scholar using Selenium. I tried to build a loop that clicks each of the popup boxes on a page, since the citations are inside those popups. I also tried to find direct links to the popup elements so I could loop over those instead. Neither approach worked. If you know a different approach, or a way to make one of these work, I would really appreciate it.

Look at the workflow pictures I attached here.
Thanks in advance.

Hi @HeidelMS and welcome to the KNIME community forum,

Here is an example that extracts all BibTeX citations on a page using the Selenium nodes:

selenium_cite.knwf (581.9 KB)

Two points:

  1. You cannot pass several elements to click at once (WebDriver tries to click them all). I used a Row Filter to keep only the row that matches the current iteration number.

  2. The Find Elements node that feeds the click must be executed in every loop iteration, so put it after the Loop Start node.
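
For readers working with Selenium directly instead of the KNIME nodes, the same two points can be sketched in Python. This is only an illustration under assumptions, not what the attached workflow does internally: `click_each_in_turn`, `on_open`, and the selector you pass in are hypothetical names, and you would have to check the real selector for the cite button in the page source yourself. The only Selenium call used is `driver.find_elements(by, value)`, with the string `"css selector"`, which is the value of `By.CSS_SELECTOR`.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def click_each_in_turn(driver, selector, on_open, pause=0.0):
    """Click every element matching `selector`, one per iteration.

    `on_open` is a hypothetical callback that is called after each click,
    e.g. to read the popup text and close the popup again. Its return
    values are collected and returned as a list.
    """
    results = []
    index = 0
    while True:
        # Point 2: look the elements up again inside the loop, because
        # references found before a click may no longer be valid.
        elements = driver.find_elements(CSS, selector)
        if index >= len(elements):
            break
        # Point 1: click only the element for the current iteration,
        # like the Row Filter that keeps a single row.
        elements[index].click()
        results.append(on_open(driver))
        index += 1
        if pause:
            time.sleep(pause)  # optional delay between clicks
    return results
```

With a real browser you would pass a driver such as `selenium.webdriver.Chrome()` after loading the results page, plus an `on_open` function that waits for the popup, copies the citation, and closes the popup before the next iteration.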



Thank you very much!! It works perfectly. I started using KNIME two months ago and had been stuck on this problem for two weeks. I also had a problem with the pagination loop, but I just moved some nodes to other places and that worked too.



So you should have visited the KNIME Forum sooner. :wink:
