Looping over pages of a website

Hello everyone,
I am trying to build a web scraper using the attached workflow.
Currently, I can only get the first 10 results from the URL in the GET Request node (https://www.kununu.com/middlewares/profiles/de/ebm-papst-unternehmensgruppe/6f2e08d6-d4e9-4951-a2e3-72e173d92c23/reviews?reviewType=employees). I would like to loop over all available pages for this URL and concatenate the results (in my case, ratings) into one table. For that I need a variable that counts from 1 to the maximum page count, which I can then append to the URL in the GET Request node like this: https://www.kununu.com/middlewares/profiles/de/ebm-papst-unternehmensgruppe/6f2e08d6-d4e9-4951-a2e3-72e173d92c23/reviews?reviewType=employees&page=
Can someone help me with that, please? I am new to KNIME and a little bit lost there.

Thanks in advance!
BAMP_project.knwf (17.3 KB)

Hi @JaninaPatzer,
That was a nice challenge. I was not aware that Kununu has a public API.

I have attached the workflow. Basically, I increase the page number using the iteration flow variable and stop once I have collected at least as many results as the total number of results.
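
For readers who prefer code, the same loop logic can be sketched in Python. This is only a rough sketch under stated assumptions, not a copy of the attached workflow: the `fetch_page` callable, its return shape (results on the page plus the total number of results), the `max_pages` safety cap, and the `fake_fetch` stand-in are all hypothetical names introduced for illustration. The actual response format of the endpoint is not shown in this thread, so the real request and parsing step is deliberately left out.

```python
from typing import Callable, List, Tuple

# Base URL taken from the question; the page number is appended as "&page=N".
BASE_URL = (
    "https://www.kununu.com/middlewares/profiles/de/"
    "ebm-papst-unternehmensgruppe/6f2e08d6-d4e9-4951-a2e3-72e173d92c23/"
    "reviews?reviewType=employees"
)

# Hypothetical fetcher type: takes a URL and returns
# (results on that page, total number of results).
Fetcher = Callable[[str], Tuple[List[dict], int]]


def build_page_url(page: int, base_url: str = BASE_URL) -> str:
    """Append the page counter to the URL, as described in the question."""
    return f"{base_url}&page={page}"


def collect_all_results(fetch_page: Fetcher, max_pages: int = 1000) -> List[dict]:
    """Request page 1, 2, 3, ... and concatenate the results.

    Stops once at least `total` results have been collected, when a page
    comes back empty, or when the assumed safety cap `max_pages` is hit.
    """
    collected: List[dict] = []
    for page in range(1, max_pages + 1):
        results, total = fetch_page(build_page_url(page))
        collected.extend(results)
        if not results or len(collected) >= total:
            break
    return collected


def fake_fetch(url: str) -> Tuple[List[dict], int]:
    """Stand-in for the real GET request: 25 made-up reviews, 10 per page."""
    page = int(url.rsplit("page=", 1)[1])
    all_reviews = [{"rating": (i % 5) + 1} for i in range(25)]
    start = (page - 1) * 10
    return all_reviews[start:start + 10], len(all_reviews)


if __name__ == "__main__":
    ratings = collect_all_results(fake_fetch)
    print(len(ratings))
```

To use this against a live endpoint, `fake_fetch` would be replaced by a function that performs the GET request and extracts the result list and the total count from the response. The empty-page check is an extra safeguard so the loop still ends if the total count is missing or wrong.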

Have a great day!
Crawl_Kununu.knwf (32.6 KB)


Just curious: have you tested how many simultaneous requests are possible?
