I have one URL for which the HTTP Retriever does not download the correct content. When I access the URL in a browser, the result differs from the HTTP Retriever result. The URL is an RSS feed:
In the browser the RSS content is shown. When I use the HTTP Retriever node followed by the Feed Parser, the RSS content cannot be extracted. Attached is a workflow that shows the problem. Any ideas?
I checked your workflow. However, the issue does not seem to be the HTTP Retriever but the Feed Parser, which parses the feed's meta information but not its items. The problem seems to be specific to the KNIME nodes, as the same feed is parsed correctly when I run it directly from code using the Palladian lib.
I will investigate this further when I have some spare time and get back to you.
[edit] Are there any other feeds where you encountered the problem?
However, I assume the problem is the HTTP Retriever, at least for the long URL. When I try the long URL, the result of the HTTP Retriever is not a valid RSS feed. It is XML but contains no news items. I believe that for some URLs (maybe very long URLs?) the HTTP Retriever has problems.
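One way to confirm the "XML but no news items" observation outside of KNIME is to count the item elements in the retrieved document. The following is only a minimal sketch: it assumes an RSS 2.0 feed with un-namespaced `<item>` elements, and the two feed strings are made-up placeholders, not the content of the feed discussed here.

```python
import xml.etree.ElementTree as ET


def count_rss_items(xml_text: str) -> int:
    """Return the number of <item> elements found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    # RSS 2.0 puts items under <channel>; ".//item" matches them at any depth.
    return len(root.findall(".//item"))


# Hypothetical sample documents, for illustration only.
EMPTY_FEED = (
    '<rss version="2.0"><channel><title>Example</title></channel></rss>'
)
FULL_FEED = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<item><title>News 1</title></item>"
    "<item><title>News 2</title></item>"
    "</channel></rss>"
)

print(count_rss_items(EMPTY_FEED))  # feed with meta information only
print(count_rss_items(FULL_FEED))   # feed that actually carries items
```

If the count is zero for the downloaded content but non-zero for what the browser shows, the download step is the more likely suspect than the parser.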
Thanks for spotting that issue. This was indeed a regression introduced recently in the HTTP-specific Palladian code, which did not correctly parse URLs with multiple query parameters of the same name. That problem is now fixed in the latest build.
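For anyone wondering how repeated parameter names can go wrong, here is a small illustration in Python. It is not the Palladian code, just a sketch of the general pitfall, and the URL and parameter names are invented for the example. Storing parameters in a plain name-to-value map keeps only the last value for each name, while a list of pairs preserves every occurrence.

```python
from urllib.parse import parse_qsl, urlsplit


def naive_params(url: str) -> dict:
    """Collect query parameters into a dict; repeated names overwrite each other."""
    result = {}
    for name, value in parse_qsl(urlsplit(url).query):
        result[name] = value  # a later "tag" replaces an earlier "tag"
    return result


def all_params(url: str) -> list:
    """Collect query parameters as (name, value) pairs, keeping duplicates and order."""
    return parse_qsl(urlsplit(url).query)


# Hypothetical URL with the parameter "tag" appearing twice.
URL = "http://example.com/feed?tag=news&tag=sports&format=rss"

print(naive_params(URL))  # only one "tag" value survives
print(all_params(URL))    # both "tag" values are kept
```

A request rebuilt from the first result would silently drop one of the values, so the server could answer with a different (possibly empty) feed than the one the browser receives.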