Selenium Nodes: All Loop Iterations Repeat First Page Content


#1

I’m trying to pull review data from a site and the loop keeps repeating the first-page content.
What am I missing?

See the attached workflow.

Thank you.

Howtogetallpages.knwf (42.4 KB)


#2

Works for me. What’s the issue, exactly?

Apologies: Got it, the second iteration extracts the same results as the first one :slight_smile:

I’ve had a look at the Yelp website, and the peculiar point is, that they keep some of the review information in a visually hidden section – this is intended for search engines and not shown to the user. This information will remain constant, and always show the 20 first reviews, even when you use the pagination links. (the fact that this information is within <meta> tags gives a good indicator, that it’s not meant for human consumption.)

I’d thus suggest to change the query in the Find Elements node to div.review – this will correctly address the visible <div> elements on the page.

If I have some more time later, I’ll post an updated example workflow.


#3

Thanks, Phillip.

I can use a little more help if/when you’re able to.
I can extract the div content but how do I extract sub components?


#4

The general strategy is that you use follow-up Find Elements nodes, which operate within the <div> which you have extracted before (technically speaking, the Find Elements node allows either a WebDriver or a WebElement column as input). I’m attaching a fully-working example workflow (please also check the comments):

Some general remarks:

  1. I changed the initial query in the first Find Elements as suggested above
  2. I added a Row Filter to skip the first element which is just a no-content dummy
  3. Instead of the Execute JavaScript node I used the dedicated Extract InnerHTML and Extract Attribute nodes. This is faster than the JavaScript way.
  4. I replaced the Synchronize nodes with flow variables. The result is the same as with the Synchronize nodes, but the workflow is cleaner (we still keep the Synchronize nodes mostly for didactic reasons, as it might be easier to understand for people who haven’t used FWs before :slight_smile: )

Yelp_Review_Scraping.knwf (26.6 KB)


#5

This works beautifully, thank you.


closed #6

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.