For the Palladian WebSearcher, no preference [API key, identifier] can be set for Google or DuckDuckGo. Neither search engine works in the WebSearcher node [the other search engines work].
For Google I have set the API key/identifier for Google Search, but to no avail. For DuckDuckGo I have no idea whether an API key is required; all I can see is that the node does not work for it, and that no API key can be set.
DuckDuckGo does not require an API key and should therefore work without configuration (I just checked on my installation and it works fine). Can you give me more details about what is not working? Do you get any error messages in the console log? What was your query, how many query terms did you have (= rows in the input column), and how many results did you request (= setting in the node)?
The Google settings in the preferences (Google Custom Search API key, Google Custom Search API identifier) are only required for Google Custom Search. The others (Google, Google Blogs, Google Images, Google News, Google Plus) work without an API key. The same applies here: if you encounter any problems, please describe them in detail so that we can sort them out.
When I set up a Table Creator with the keywords 'knime AND palladian', both WebSearcher nodes [DuckDuckGo and Google] work.
When I add a second term on a second line [e.g. 'weka'], both nodes fail.
When I remove the second line, DuckDuckGo runs and Google fails.
I have experimented with different search terms and numbers of lines, and the problem cannot be reproduced reliably.
Here are the console messages:
ERROR WebSearcher Execute failed: Parse error while searching for "weka" with DuckDuckGo (request URL: "http://duckduckgo.com/d.js?l=us-en&p=1&s=20&q=weka", result String: "var q=window.location.href.indexOf('?q');if (q!=-1) q=window.location.href.replace(/^[^\?]+\?q=\??/,''); else {q=window.location.href.replace(/^http:\/\/[^\/]+\/?/,'');q=q.replace(/\_/g,' ');};q=q.replace(/\&.*$/,'');var dnd0=[{"c":"http://www.google.com/search?q="+q,"u":"http://www.google.com/search?q="+q,"a":"","d":"google.com search","t":"EOF","i":"www.google.com"}];if (nrn) nrn('d',dnd0);")
ERROR WebSearcher Execute failed: Exception parsing the JSON response while searching for "weka" with Google: JSONObject["responseData"] is not a JSONObject., JSON was: "{"responseData": null, "responseDetails": "Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors", "responseStatus": 403}"
ERROR WebSearcher Execute failed: Exception parsing the JSON response while searching for "knime AND palladian" with Google: JSONObject["responseData"] is not a JSONObject., JSON was: "{"responseData": null, "responseDetails": "Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors", "responseStatus": 403}"
Hope it makes sense to you... suspected terms of service abuse... whoa!
1) The "normal" Google searchers (all except Google Custom Search) use a deprecated API for searching. Apparently, Google has lowered the number of queries allowed within a given time frame and blocks requests once this limit is exceeded. See also the note on the API webpage: "Note: The Google Web Search API has been officially deprecated as of November 1, 2010. It will continue to work as per our deprecation policy, but the number of requests you may make per day will be limited. Therefore, we encourage you to move to the new Custom Search API."
2) For DuckDuckGo we use an unofficial "API" to access the search, and while a small number of queries obviously works fine, heavy use will get blocked as well.
In both cases, there is not much we can do about it (I modified the DuckDuckGo searcher to handle high query loads better, but I cannot promise that the fix will keep working in the long run). My recommendation: if you do heavy searching, rely on other web searchers (such as Bing) that provide an official API (and higher query volumes on paid plans).
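To illustrate the two points above, here is a minimal Python sketch (not Palladian's actual code) of how a client could detect the "blocked" reply before touching responseData, and pause between retries. The shape of the 403 reply is taken from the console messages quoted earlier; the shape of a successful reply, the fetch callable, and the names RateLimited, parse_google_response and search_with_backoff are assumptions made for this example.

```python
import json
import time


class RateLimited(Exception):
    """Raised when the service refuses a query (non-200 responseStatus)."""


def parse_google_response(raw):
    """Return responseData, or raise RateLimited if the query was refused."""
    payload = json.loads(raw)
    status = payload.get("responseStatus")
    data = payload.get("responseData")
    # In the blocked reply from the log, responseData is null and the
    # status is 403, so check both before using the data.
    if status != 200 or not isinstance(data, dict):
        raise RateLimited(f"{status}: {payload.get('responseDetails')}")
    return data


def search_with_backoff(fetch, query, max_attempts=4, base_delay=1.0,
                        sleep=time.sleep):
    """Call fetch(query); on a refusal, wait 1s, 2s, 4s, ... and retry.

    fetch is a hypothetical callable returning the raw JSON string.
    """
    for attempt in range(max_attempts):
        try:
            return parse_google_response(fetch(query))
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)
```

Note that waiting between attempts only helps with short-term throttling; it will not lift a daily quota, so an official API remains the better option for heavy use.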
Thanks for your detailed feedback and efforts. Luckily the other search engines are working. I also have the option of adding a /json switch after a search on blekko.com, which makes it possible to parse the results in R.
I am a beginner in KNIME and would like to query the web for the term agriculture (well, I have a list of words), but I need to work with one term first and get the workflow right. My first problem is:
I have been trying to use the WebSearcher (Google engine), but it keeps giving me an error message. Actually, all search engines give back an error message, see below:
ERROR WebSearcher Execute failed: Exception parsing the JSON response while searching for "KNIME" with Google: JSONObject["responseData"] is not a JSONObject., JSON was: "{"responseData": null, "responseDetails": "Suspected Terms of Service Abuse. Please see http://code.google.com/apis/errors", "responseStatus": 403}"
ERROR WebSearcher Execute failed: Could not instantiate ws.palladian.retrieval.search.socialmedia.FacebookSearcher, exception from constructor: accessToken must not be empty
ERROR WebSearcher Execute failed: Could not instantiate ws.palladian.retrieval.search.web.BingSearcher, exception from constructor: accountKey must not be empty
ERROR WebSearcher Execute failed: Could not instantiate ws.palladian.retrieval.search.socialmedia.TwitterSearcher, exception from constructor: consumerKey must not be empty
ERROR WebSearcher Execute failed: HTTP error while searching for "#agriculture" with DuckDuckGo (request URL: "http://duckduckgo.com/d.js?l=us-en&p=1&s=0&q=%23agriculture"): Exception org.apache.http.conn.ConnectTimeoutException: Connect to rundmc.duckduckgo.com:3433 timed out for URL "http://duckduckgo.com/d.js?l=us-en&p=1&s=0&q=%23agriculture": Connect to rundmc.duckduckgo.com:3433 timed out
How do I solve this?
The second question:
Where can I download the web crawler workflow given in https://www.knime.org/files/knime_web_knowledge_extraction.pdf ?
The API keys for some search engines (Facebook, Bing, Twitter in your example) need to be set up in the KNIME preferences (KNIME > Palladian Web Searcher). Have a look at the node's documentation; we provide links to the registration pages there. For problems concerning the Google and DuckDuckGo searchers, please refer to this post.
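As a rough illustration of why those three searchers fail, the following Python sketch checks a set of configured values against the key names that appear in the error messages above (accessToken, accountKey, consumerKey). It is only an illustration, not how the node reads its preferences: the dictionary layout and the function name missing_credentials are made up for this example, and the list of required keys may be incomplete.

```python
# Key names are copied from the error messages in this thread; the mapping
# itself is a hypothetical illustration and may not list every required key.
REQUIRED_KEYS = {
    "Facebook": ["accessToken"],
    "Bing": ["accountKey"],
    "Twitter": ["consumerKey"],
}


def missing_credentials(configured):
    """Map each searcher to the required keys that are absent or blank."""
    problems = {}
    for searcher, keys in REQUIRED_KEYS.items():
        missing = [k for k in keys if not configured.get(k, "").strip()]
        if missing:
            problems[searcher] = missing
    return problems
```

For example, calling missing_credentials({"accountKey": "abc"}) reports Facebook and Twitter as unconfigured, which corresponds to the "must not be empty" errors for those searchers.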
Concerning your second question, I would recommend contacting the authors of the paper directly.