How to get content of a webpage that requires authentication

I am trying httpretriver to download the content from a webpage, but what I got is the HTML which asks for login. How should I do to parse my credential and get the correct content, not the login requirement? means skip the login page.

Best regards

Two options:

(1) Use the HttpRetriever to perform the login request, store the cookie data and pass it to another HttpRetriever which grabs the data. You’ll probably need to debug within you web browser how the login process works in detail and which data you have to submit in which format.

(2) Use the Selenium Nodes, which will most probably be the more convenient way: https://seleniumnodes.com

1 Like

could you help me with this website to how to parse my user name and password in knime node?
https://www.globalnorm.net/gn/doc.php?name=ASTM%20F%202638:2012-00

i tired with POST Rest https://www.globalnorm.net/gn/login.php?USR=moki@test.com&PAS=7PE6NzPdG&doc=ASTM%20F%202638:2012-00 but in HTML Parser i notice
that <input id="USER" name="USR" type="text" value="" > <input id="PASS" name="PAS" type="password" value="">still empty and <input id="doc" name="doc" type="hidden" value="ASTM F 2638:2012-00"> is filled

this this result from HtmlParser

I would like to know how can i set the username and password ?

I don’t have much time to analyze this in detail, but here’s a quick outline:

The data will be submitted as form-encoded POST request to the login.php, which then returns a result with a cookie (you’ll get this in the lower output port of the HttpRetriever). You’ll need to serialize all fields in the form and send them with the request.

You can use the cookie information in subsequent requests by passing it in the second input port of the HttpRetriever.

As already stated, it’s tedious to do this with the ‘old fashioned’ HttpRetriever and HtmlParser. A Selenium WF would be easier.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.