Looping over the pages of a website

Hello everyone,
I am trying to build a web scraper using the attached workflow.
Currently, I can only get the first 10 results from the URL in the GET Request node (https://www.kununu.com/middlewares/profiles/de/ebm-papst-unternehmensgruppe/6f2e08d6-d4e9-4951-a2e3-72e173d92c23/reviews?reviewType=employees). I would like to loop over all available pages of this URL and concatenate the results (in my case, ratings) into one table. For that, I need a variable that counts from 1 to the maximum page count, which I can then append to the URL in the GET Request node like this: https://www.kununu.com/middlewares/profiles/de/ebm-papst-unternehmensgruppe/6f2e08d6-d4e9-4951-a2e3-72e173d92c23/reviews?reviewType=employees&page=
Can someone help me with that, please? I am new to KNIME and a little lost.

Thanks in advance!
BAMP_project.knwf (17.3 KB)

Hi @JaninaPatzer,
That was a nice challenge. I was not aware that Kununu has a public API.

I have attached the workflow. Basically, I increase the page number using the loop's iteration flow variable and stop once the number of collected results reaches the total number of results.
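
For anyone who wants to see the same loop logic outside KNIME, here is a rough Python sketch. It is not a translation of the attached workflow, only an illustration of the idea. The JSON keys "reviews" and "total" are assumptions (check the real response and adjust them), and the helper names build_url, fetch_page and collect_all are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the question; the page number is added as a query parameter.
BASE_URL = ("https://www.kununu.com/middlewares/profiles/de/"
            "ebm-papst-unternehmensgruppe/"
            "6f2e08d6-d4e9-4951-a2e3-72e173d92c23/reviews")


def build_url(page):
    """Return the reviews URL for one page number."""
    return BASE_URL + "?" + urlencode({"reviewType": "employees", "page": page})


def fetch_page(page):
    """Download one page and return (reviews, total_count).

    ASSUMPTION: the response is JSON with a list under "reviews" and the
    overall result count under "total". These key names are hypothetical.
    """
    with urlopen(build_url(page)) as response:
        data = json.load(response)
    return data["reviews"], data["total"]


def collect_all(fetch=fetch_page, max_pages=1000):
    """Request pages 1, 2, 3, ... and stop once all results are collected."""
    results = []
    for page in range(1, max_pages + 1):
        reviews, total = fetch(page)
        results.extend(reviews)
        # Stop when we hold at least `total` rows, or when a page is empty.
        if not reviews or len(results) >= total:
            break
    return results
```

Calling collect_all() would then return all reviews as one list. The max_pages cap and the empty-page check are only safety nets in case the reported total is wrong; a short pause between requests is also a good idea to avoid stressing the server.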

Have a great day!
Crawl_Kununu.knwf (32.6 KB)

Just curious: have you tested how many simultaneous requests are possible?
Best regards
