Internet nodes don't connect behind proxy

I am running the KNIME client on a Windows 7 PC behind my company's firewall, which also has an internet proxy that requires authentication.
In the Internet Options control panel on my PC, an automatic configuration script is specified that lists the proxies and ports. Connecting to sites from IE or Chrome requires no additional effort on my part, meaning I am never prompted for authentication by the proxy.

I have installed the KNIME client on my workstation.
I am attempting to use the Download and HttpRetriever nodes to access an external website.

Both of these nodes fail regardless of which proxy configuration I try.

Proxy Config #1
Preferences > General > Network Connections > Active Provider=Native
Shows a single entry as: 
Checked Schema=HTTP, Host=Dynamic, Port=Dynamic, Provider=Native, Auth=false
I cannot edit this entry, nor can I uncheck it. 

Test Results for Proxy Config #1
Executing the Download node results in "ERROR Download  0:24  Execute failed: No route to host: connect"

Executing the HttpRetriever node results in "WARN  HttpRetriever  0:12  Error retrieving https://www.fda.gov/RegulatoryInformation/Guidances/default.htm: Exception java.net.NoRouteToHostException: No route to host: connect for URL "https://www.fda.gov/RegulatoryInformation/Guidances/default.htm": No route to host: connect"

Proxy Config #2
Preferences > General > Network Connections > Active Provider=Manual
Establish entries as follows:
Checked Schema=HTTP,  Host=myproxy.company.com, Port=1234, Provider=Manual, Auth=true, User=domain\username, Password=xxxxx
Checked Schema=HTTPS, Host=myproxy.company.com, Port=1234, Provider=Manual, Auth=true, User=domain\username, Password=xxxxx
Checked Schema=SOCKS [Cleared]

Proxy bypass entries are:
Checked Host=localhost, Provider=Manual
Checked Host=127.0.0.1, Provider=Manual

Test Results for Proxy Config #2
Executing the Download node results in "ERROR Download  0:24  Execute failed: No route to host: connect"

Executing the HttpRetriever node results in "WARN  HttpRetriever  0:12  Error retrieving https://www.fda.gov/RegulatoryInformation/Guidances/default.htm: Exception java.net.NoRouteToHostException: No route to host: connect for URL "https://www.fda.gov/RegulatoryInformation/Guidances/default.htm": No route to host: connect"

One interesting thing about Proxy Config #2 is that the error log displays these informational messages:
System property https.proxyPort is not set but should be 1234.
System property https.proxyHost is not set but should be myproxy.company.com.
System property http.proxyPort is not set but should be 1234.
System property http.proxyHost is not set but should be myproxy.company.com.
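
The names in these messages (http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort) are the standard Java networking system properties. A minimal sketch of how one might check them from inside the same JVM, for example from a Java snippet, is shown below. The class and method names are hypothetical, the expected values are the placeholders from Proxy Config #2, and whether the Download or HttpRetriever nodes actually read these properties is an assumption this thread does not confirm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of KNIME or Eclipse.
public class ProxyPropertyCheck {

    /** Returns one message for each property that is missing or differs from the expected value. */
    public static List<String> findMismatches(Map<String, String> expected, Properties actual) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String value = actual.getProperty(entry.getKey());
            if (value == null) {
                messages.add(entry.getKey() + " is not set but should be " + entry.getValue());
            } else if (!value.equals(entry.getValue())) {
                messages.add(entry.getKey() + " is " + value + " but should be " + entry.getValue());
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Placeholder host and port taken from Proxy Config #2.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("http.proxyHost", "myproxy.company.com");
        expected.put("http.proxyPort", "1234");
        expected.put("https.proxyHost", "myproxy.company.com");
        expected.put("https.proxyPort", "1234");

        // Compare against the system properties of the JVM this code runs in.
        for (String message : findMismatches(expected, System.getProperties())) {
            System.out.println(message);
        }
    }
}
```

Run as a standalone program, this only reports on its own JVM, so the result is meaningful for KNIME only if the check executes inside the KNIME process.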

Also, restarting the KNIME client between Proxy preference changes seems to have no effect.

Proxy Config #3
Preferences > General > Network Connections > Active Provider=Direct
All Proxy entries are unchecked
All Proxy bypass entries are unchecked

Test Results for Proxy Config #3
Executing the Download node results in "ERROR Download  0:24  Execute failed: No route to host: connect"

Executing the HttpRetriever node results in "WARN  HttpRetriever  0:12  Error retrieving https://www.fda.gov/RegulatoryInformation/Guidances/default.htm: Exception java.net.NoRouteToHostException: No route to host: connect for URL "https://www.fda.gov/RegulatoryInformation/Guidances/default.htm": No route to host: connect"

There is one circumstance where I can get these nodes to work: when I connect the PC to the Internet outside of my company's network. There, the exact same workflow containing these nodes runs without error.

So I have some questions:
1) Do these nodes and other internet-function nodes in KNIME work behind Proxies with Authentication?

2) If so, what is the correct combination of settings to make this work? Should I try something else?

Much Thanks,
kroembke

Hi Kroembke,

Unfortunately, I think the HttpRetriever node does not make use of the defined proxy settings.

Cheers, Kilian

I cannot be of much help with the actual proxy configuration, but …:

(1) The HttpRetriever node picks up Eclipse's (that is, KNIME's) proxy configuration; no additional settings are needed.

(2) I know of several users who use the Palladian HttpRetriever successfully behind an authenticated proxy server.

Since KNIME's built-in "Download" node also fails, this looks like a configuration issue to me.

-- Philipp

[edit] If you turn on DEBUG logging, there should be some additional information about the proxy servers actually used by the HttpRetriever, which might help in locating the issue.
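
As a complementary check, the sketch below asks Java's default ProxySelector which proxies it would choose for the URL from the error messages. The class name ProxyProbe and its describe method are hypothetical, and it is an assumption that the selector seen here matches what the nodes use internally; the DEBUG log remains the authoritative source for that.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of KNIME or Palladian.
public class ProxyProbe {

    /** Lists the proxies the given selector would use for the target URI, in order. */
    public static List<String> describe(URI target, ProxySelector selector) {
        List<String> result = new ArrayList<>();
        for (Proxy proxy : selector.select(target)) {
            if (proxy.type() == Proxy.Type.DIRECT) {
                // No proxy: the JVM would try to connect to the host directly.
                result.add("DIRECT");
            } else {
                InetSocketAddress address = (InetSocketAddress) proxy.address();
                result.add(proxy.type() + " " + address.getHostString() + ":" + address.getPort());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI target = URI.create("https://www.fda.gov/RegulatoryInformation/Guidances/default.htm");
        ProxySelector selector = ProxySelector.getDefault();
        if (selector == null) {
            System.out.println("No default ProxySelector is installed");
            return;
        }
        for (String line : describe(target, selector)) {
            System.out.println(line);
        }
    }
}
```

If this prints DIRECT on the company network, the JVM is not seeing any proxy for that URL, which would be consistent with the "No route to host" errors reported above.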