Current Sample Request / HttpRetriever use on cookie-required websites

Hi,
When I try to retrieve data from the sub-page URLs after the main link page, I only get the HTML of the main page back. Is there an example workflow showing how to solve this problem?

Thank you for your help

sample url
https://ted.europa.eu/TED/search/searchResult.do?page=2

No answer from anyone?

A helpful answer would be much appreciated. Isn't there someone who can help in this regard?

Hi @umutcankurt,

can you elaborate a little more on what you want to achieve? A request to the provided URL gives me an error.

Best,
Marten

Hi,

https://ted.europa.eu/TED/browse/browseByBO.do

I want to get the data from the pages below, but my attempts so far have not worked.
I think the site carries a session over from the home page via cookies; when I try this method, it only returns the HTML of the main page.

I couldn't figure out how to build a workflow that retrieves the page data.
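As a sketch of the cookie idea in plain Python (standard library only; the URLs come from the posts in this thread, and the actual network requests are left commented out so the sketch stays self-contained): the key point is that the home page and the sub-pages must share one cookie store.

```python
import http.cookiejar
import urllib.request

# Build an opener with a cookie jar, so cookies set by the first
# response are automatically sent along with every later request.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# 1) Visit the home page first -- the server sets its session cookies here.
# home = opener.open("https://ted.europa.eu/TED/browse/browseByBO.do")

# 2) Reuse the SAME opener for the sub-pages, so the session cookies
#    collected in step 1 travel with the request.
# page2 = opener.open("https://ted.europa.eu/TED/search/searchResult.do?page=2")
```

Requesting the sub-page with a fresh client (no shared jar) is exactly the situation where the server falls back to serving the main page.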


KNIME_TEST_ted_project.knwf (32.4 KB)

This is the trial workflow I can't get results from. Something is wrong, but I can't find it.

Hi @Marten_Pfannenschmidt,

I have a similar problem with this URL. I would be very glad to get help and support to solve it; I'm stuck here.

The cookie issue is the problem I have to solve; I'm counting on everyone's support. Thanks.

sample two
Home page
https://irl.eu-supply.com/ctm/supplier/publictenders

Parse page
https://irl.eu-supply.com/ctm/supplier/1
https://irl.eu-supply.com/ctm/supplier/2
https://irl.eu-supply.com/ctm/supplier/3
.
.
.
.

Hi @qqilihq,
Do you have a solution as a developer, or what is your feedback?

Hi umutcankurt,

I looked at the example above. As it's pulling in data via JS/AJAX/XHR, there's no easy way to use GET Request or HttpRetriever; instead you'll need a full browser as provided via the Selenium Nodes. Please see this reply for an explanation:

A simple way to detect this:

Disable JS in your web browser and try loading the page. If the desired content does not show up, you'll need a “real” web browser as provided e.g. via the Selenium Nodes.
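The same check can be expressed programmatically. A minimal sketch (the HTML snippet and the expected text are made up for illustration): fetch the page without any JS execution and test whether the content you are after is present in the raw HTML.

```python
def needs_real_browser(static_html: str, expected_text: str) -> bool:
    """True if the desired content is absent from the raw (JS-free) HTML,
    i.e. it is injected client-side and a plain GET Request /
    HttpRetriever will never see it."""
    return expected_text not in static_html

# Made-up example: the tender list is filled in via XHR, so the
# static page only contains an empty placeholder.
static_html = '<div id="tender-list"></div><script src="app.js"></script>'
print(needs_real_browser(static_html, "Public tender 12345"))  # True
```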

– Philipp


Thanks for the answer. I'm thinking of buying the Palladian nodes, but I still have a question mark in my head. I expect retrieving this data to take much longer, because a web page has to be opened and multiple pages scanned.
Do you think it is possible to run this at scale with the Palladian nodes when I want to scan a very large number of web pages (opening the browser / working in the background)?

Hi there,

to avoid confusion:

  • the Palladian nodes are free (for use in free KNIME versions)
  • the Selenium nodes are paid

In case you’re wondering whether the Selenium Nodes are the right tool for your task, I invite you to give the free 30-day trial a go.

From my experience:

I've used the Selenium Nodes several times to crawl large numbers of pages. Of course, there is a bigger performance overhead compared to a pure “download page” approach like with Palladian, but you can often optimize, parallelize, etc. Still, your throughput will always be lower with the Selenium Nodes, as they use a real web browser. But often, that's the only way to access modern web pages and web apps.
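The "parallelize" point can be sketched as follows (the fetch function is a stub standing in for one Selenium-driven browser session, and the supplier URLs are taken from the eu-supply example earlier in this thread):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_with_browser(url: str) -> str:
    # Stub for a single Selenium browser session; a real implementation
    # would navigate to the URL and return the rendered page source.
    return f"<html>content of {url}</html>"

# Sub-page URLs from the eu-supply example.
urls = [f"https://irl.eu-supply.com/ctm/supplier/{i}" for i in range(1, 4)]

# A small worker pool keeps several browser instances busy at once,
# which amortizes the per-page overhead of driving a real browser.
with ThreadPoolExecutor(max_workers=3) as pool:
    pages = list(pool.map(fetch_with_browser, urls))

print(len(pages))  # 3
```

In practice the pool size is bounded by how many browser instances your machine can comfortably run.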

My suggestion: try out the trial version and see whether it works for your problem. Feel free to get back if you need any advice regarding optimization.

Best,
Philipp


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.