HttpRetriever Node Error 401

knime_newbie_1 · October 18, 2018, 3:16pm

Hi,

I have come across the following problem:

I have a table of URLs like this:

hello.com/sites&pagenum=1
hello.com/sites&pagenum=2
…

When I connect it to the HttpRetriever node (selecting the URL column as the input in the settings) and run it to fetch the content of all the URLs, I get an HTTP status 401.

I don't understand this, because when I paste one of the URLs into my browser it works perfectly fine and I get no error.

Do I have to change the URLs?

Regards.

ScottF · October 18, 2018, 3:22pm

Hello @knime_newbie_1 -

Have you made sure that your input table of URLs contains http:// formatting? Here’s a simple output example of what I see when trying different inputs:

[Screenshot: 2018-10-18 10_22_41-KNIME Analytics Platform]
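For anyone who wants to check the scheme outside of KNIME first, here is a minimal Python sketch of that idea. The helper name `ensure_scheme` and the default of `http` are assumptions made for illustration; they are not part of the HttpRetriever node or of KNIME.

```python
def ensure_scheme(url: str, default_scheme: str = "http") -> str:
    """Return the URL with a scheme prefix, adding one if it is missing.

    Hypothetical helper for illustration only; not a KNIME API.
    """
    url = url.strip()
    # A URL that already contains "://" is assumed to carry its own scheme.
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


# Example input rows taken from the question above.
rows = ["hello.com/sites&pagenum=1", "hello.com/sites&pagenum=2"]
prefixed = [ensure_scheme(row) for row in rows]
```

The same check could probably be done inside a workflow with a string manipulation step before the retriever node.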

knime_newbie_1 · October 18, 2018, 3:30pm

Yes, here is what I found. I just picked a random website for this example.

If I add some parameters to the URL, the request fails with a 401 error.

knime_newbie_1 · October 18, 2018, 3:55pm

I found the problem.

http://www.ideastorm.com/idea2ExploreMore?v=1538483000186&Type=AllIdeas&pagenum=2#comments

The #comments part of the URL seems to be the problem. I don't know why, but that part causes the 401 error.

ScottF · October 18, 2018, 3:56pm

You might try the GET Request node instead; it seems to work. Maybe @qqilihq can shed some light on what’s going on with the #comments fragment in the HttpRetriever node.

qqilihq · October 20, 2018, 2:15pm

Thanks for the feedback.

We should handle this properly internally (which means, stripping away the #anchor part of the URL before performing the request). I’ll try to supply a fix in the future.

In the meantime, either remove the # and everything after it (the request stays the same, because a browser never sends that part to the server anyway), or use the GET Request node instead.
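As a rough illustration of the first workaround, the fragment can be stripped from each URL before it goes into the input table. This is only a sketch in Python using the standard library's `urllib.parse.urldefrag`; the helper name `strip_fragment` is an assumption made for this example, and it is not how the HttpRetriever node handles URLs internally.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL without the '#...' fragment part.

    Hypothetical helper for illustration only; not a KNIME API.
    """
    # urldefrag splits the URL into (url_without_fragment, fragment).
    return urldefrag(url).url


# The URL from the thread that triggered the 401.
original = (
    "http://www.ideastorm.com/idea2ExploreMore"
    "?v=1538483000186&Type=AllIdeas&pagenum=2#comments"
)
cleaned = strip_fragment(original)
print(cleaned)
```

URLs that have no fragment pass through unchanged, so the helper can be applied to every row of the table.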

umutcankurt · February 8, 2019, 10:15am

A simple example solution, but a 100% solution for all Selenium nodes ;)