LocationExtractor failed execution: target server failed to respond

Hi,

I'm trying to "clean" an address column in my database using LocationExtractor.

With the LocationExtractor node, I keep hitting the following error after a few minutes of execution, a few hundred rows in:

 

Execute failed: ws.palladian.retrieval.HttpException: Exception org.apache.http.NoHttpResponseException: The target server failed to respond for URL "http://api.geonames.org/search?name_equals=<search string>=FULL&username=<username>": The target server failed to respond

 

I've set up my workflow to run only 2,000 queries per execution so that the process does not exceed the server's hourly limit.

If I understood the error message correctly, the web service stalled on one of the queries and never responded.

Has anyone found a workaround for this? I looked for patterns in the rows where the error occurs but could not find any. The error is reproducible, but the rows on which it occurs are random, so the problem could be server-side.

I'm thinking of increasing the timeout so the node will not fail even if the target server lags, but I don't know how to do it, or whether it's even possible in KNIME.

I haven't explored all nodes yet, but there must be a way to make a continuous workflow around this even with errors/server lags.
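
For readers wondering what "continuing despite server lags" could look like in general, here is a minimal retry-with-backoff sketch in Python. It is not a KNIME or Palladian feature; `call_with_retries`, its parameters, and the use of `ConnectionError` as the transient error are all assumptions made for illustration.

```python
import time

def call_with_retries(fetch, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on a transient failure wait, then try again.

    The delay doubles after every failed attempt (1s, 2s, 4s, ...).
    All names and defaults here are hypothetical.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:  # stand-in for "server failed to respond"
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Wrapping each lookup this way means a single dropped response delays one row instead of failing the whole run; errors that persist across all attempts are still raised.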

Your help will be greatly appreciated.

Hi there,

I assume you have enabled this option:

[Screenshot: DB Table Selector node]

As it states, this causes an additional request for every location candidate found, to retrieve additional hierarchy data (e.g. if GeoNames finds 50 potential matches for the term "New York", there will be 51 requests in total). This means you'll reach the limit of 2,000 requests very quickly.
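
To make the arithmetic concrete, here is a small Python sketch. It assumes one search request per term plus one hierarchy request per candidate, as described above; the function names and the uniform figure of 50 candidates per term are illustrative assumptions, not measured values.

```python
def total_requests(candidates_per_term):
    """One search request per term plus one hierarchy request per candidate."""
    return sum(1 + n for n in candidates_per_term)

def terms_until_limit(candidates_per_term, limit=2000):
    """Number of terms that can be fully processed before the limit is exceeded."""
    used = 0
    for done, n in enumerate(candidates_per_term):
        used += 1 + n
        if used > limit:
            return done
    return len(candidates_per_term)

print(total_requests([50]))           # the "New York" example: 51 requests
print(terms_until_limit([50] * 100))  # terms that fit into 2,000 requests
```

Under that assumption, only 39 terms fit into a 2,000-request budget, since the 40th would bring the total to 2,040.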

You can uncheck this option, but this comes at a great cost in accuracy/recall (especially if you're extracting smaller or lesser-known locations), as the LocationExtractor's algorithms are optimized for the hierarchy data.

We offer a commercial plugin which allows local deployment of the location database to avoid this issue (and provides higher reliability and a great speed improvement). Let me know if that's interesting.

-- Philipp

[edit] Looking at your error message again, I'm not completely sure whether the problem is in fact caused by exceeding the quota. The exception "org.apache.http.NoHttpResponseException" suggests that the server was down or overloaded, or that your connection dropped out.

Thanks for the reply.

I have not enabled the option. Also, I should have received a different and more explicit error message for exceeding quota, which is not the case.

My quest for the right (non-commercial) solution continues.....

I'm trying the GoogleAddressGeocoder but I'm met with this error:

Execute failed: ws.palladian.extraction.location.geocoder.GeocoderException: Received status code OVER_QUERY_LIMIT

I'm pretty certain that I have not reached the daily quota (because I can still re-execute the node without the limit error). The error is most likely because I'm sending requests too fast.

Is there a way to throttle the requests made through this node?

Thanks for your help Philipp.

Hi there,

I fixed a similar issue last year by adding a throttle to the Google node:
https://tech.knime.org/forum/palladian-selenium/googleaddress-geocoder-nodes

Can you please let me know which Palladian version you're using? It's visible when you turn on DEBUG logging (prefs -> KNIME -> KNIME GUI -> DEBUG) and restart KNIME. Among the first lines there should be an entry like:

DEBUG PalladianPluginActivator            Palladian version 0.6.0-SNAPSHOT (build 2016-10-31 15:26:22)
Copyright 2009-2016 by David Urbansky, Philipp Katz, Klemens Muthmann

Currently, the node has an internal limit of at most 5 requests/second. The official Google docs say that 50 requests/second are allowed.
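
For illustration, a throttle of this kind can be sketched in a few lines of Python. This is not the node's actual implementation; the `Throttle` class and its parameters are hypothetical and only show the general idea of spacing calls at least 1/5 second apart.

```python
import time

class Throttle:
    """Spaces calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: sleep until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval

throttle = Throttle(max_per_second=5)
for address in ["address 1", "address 2", "address 3"]:
    throttle.wait()  # blocks if the previous call was under 0.2 s ago
    # ... send the geocoding request for `address` here ...
```

Calling `wait()` before each request keeps the rate under the chosen limit regardless of how fast the rows arrive.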

Alternatively, you may want to give the MapQuestGeocoder or the MapzenGeocoder a try.

-- Philipp

Mapzen actually worked pretty well, without a single error, and was more accurate than MapQuest. Thanks for these great nodes, Philipp and team. :)

Thanks for the kind words, have fun!

PS: In case you could tell me your exact Palladian version that would be very helpful. I have a feeling the mentioned fix is currently not available in the regular branch and would really like to verify/fix this. Thanks! :)

Palladian version 0.6.0-SNAPSHOT (build 2016-10-31 15:26:22)

Copyright 2009-2016 by David Urbansky, Philipp Katz, Klemens Muthmann

Thx!