Cannot access webpage content anymore

yanbeemwe · April 3, 2024, 4:52pm

Hi,

We retrieve the contents of a webpage to get a list of IDs. We use a URL that looks roughly like this:

https://XXXXX-1.net/api/upload-history?projectId=XXX&authKeyXXXXXXXXXXXXXXX

Whenever I access this webpage via a web browser or PowerShell, I can extract the XML data I need. It looks like this:

Last week we could also do this in KNIME with the Load Text-Based Files node by simply giving it the URL. However, the administrators of the site seem to have changed something, and now we cannot access the data via KNIME at all. We also tried the Palladian nodes (HTTP Retriever), which worked in an earlier build of our workflow, but they do not seem to work either.

There is a parallel instance of the webpage running on an earlier server version, where the Load Text-Based Files node and all other ways of accessing it have been working flawlessly.
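Once the XML does come back, pulling the IDs out of it is a small step. The XML itself is not reproduced here, so this is only a minimal sketch that assumes the IDs sit in elements named `id` (a hypothetical tag name; the real upload-history schema is undocumented). It uses Python's standard `xml.etree.ElementTree` and walks the whole tree, so nesting depth does not matter.

```python
import xml.etree.ElementTree as ET


def extract_ids(xml_text: str, tag: str = "id") -> list[str]:
    """Return the stripped text of every <tag> element, in document order.

    The default tag "id" is a hypothetical placeholder; the real element
    name depends on the (undocumented) upload-history XML.
    """
    root = ET.fromstring(xml_text)
    # iter() visits the root and all descendants, so nesting depth is irrelevant
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


# Hypothetical sample document, not the real API response
sample = """<uploads>
  <upload><id>101</id></upload>
  <upload><id>102</id></upload>
</uploads>"""

print(extract_ids(sample))  # ['101', '102']
```

If the real XML declares a default namespace, the tag has to be passed in `{namespace-uri}name` form. An empty response body makes `ET.fromstring` raise `ParseError` instead of returning an empty list, which is a quick way to tell "no IDs" apart from "no document".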

Further info:

accessing the link with (different) browsers works
accessing the link with PowerShell works
accessing it with the GET Request node returns nothing (see GET Request problem)
tried other KNIME versions: did not work
tried the Palladian nodes (HTTP Retriever): no longer works
tried using different authentication keys in the URL
tried the Webpage Retriever node: never worked
tried the XML Reader node with the URL: never worked
tried changing the KNIME internet settings

I asked the server admins, but they are not aware of any changes on their side that would affect the URL.

Does anybody know how I can extract the webpage content again? Is there a way to change the user agent with which KNIME accesses URLs, or something similar? Maybe that needs to be updated.
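One way to test the user-agent theory outside KNIME is to send the same GET twice, once with the HTTP client's default identification and once with a browser-like `User-Agent`, and compare what comes back. This is a minimal sketch using Python's standard `urllib.request`; the browser string is just an example value, and nothing here reflects what KNIME itself sends. If no `User-Agent` is set on the request, urllib falls back to its own default (`Python-urllib/3.x`).

```python
import urllib.request
from typing import Optional, Tuple

# Example browser-like string (an assumption; any browser's value can be substituted)
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36"
)


def build_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, optionally overriding the User-Agent header."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(req: urllib.request.Request, timeout: float = 30) -> Tuple[int, Optional[str], int]:
    """Send the request; return (status, Content-Type header or None, body length)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        return resp.status, resp.headers.get("Content-Type"), len(body)
```

Calling `fetch(build_request(url))` and `fetch(build_request(url, BROWSER_UA))` against the real URL (the placeholder above, with the actual project ID and key) and comparing the Content-Type and body length would show whether the server treats clients differently by user agent. If both return XML, the user agent is probably not the cause.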

Thank you for reading!

ArjenEX · April 3, 2024, 5:02pm

Hi @yanbeemwe

I’m curious how your GET Request node was configured, because there shouldn’t really be a reason why the endpoint would not be reachable this way.

yanbeemwe · April 4, 2024, 7:05am

Hi @ArjenEX

Thank you for your reply.

The GET Request node is configured as follows (no request or response headers, and no authentication):

I tried adding the request header Content-Type = text/XML, and I also let KNIME extract all response headers:

However, this changes nothing. This is the result:

As you can see, just as in the example from my initial post (GET Request problem), I receive a 200 status with an empty binary response. Is there a way to get the XML through the GET Request node by changing its settings? That would be fantastic.

yanbeemwe · April 4, 2024, 7:39am

Edit: When I hover over the red question mark in the response, I see this:

“Missing value (Response doesnt have a media type)”

FYI
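The combination reported here (status 200, an empty body, and no Content-Type header) can be checked outside KNIME to tell whether the server genuinely sends nothing or whether only KNIME's handling differs. This is a minimal sketch that assumes the status, headers and body have already been obtained from some HTTP client; `headers` can be a plain dict or anything else with an `.items()` method, such as the header object urllib returns. The category strings are illustrative and are not KNIME messages.

```python
from typing import Mapping, Optional


def media_type(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bare, lower-cased media type from Content-Type, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8"
            mt = value.split(";", 1)[0].strip().lower()
            return mt or None
    return None


def diagnose(status: int, headers: Mapping[str, str], body: bytes) -> str:
    """Classify a response by status code, declared media type and body size."""
    mt = media_type(headers)
    if not 200 <= status < 300:
        return f"non-2xx status {status}"
    if not body and mt is None:
        return "success status, but empty body and no Content-Type"
    if not body:
        return f"success status, declared {mt}, but empty body"
    if mt is None:
        return "body present, but no Content-Type"
    return f"{mt}, {len(body)} bytes"
```

For the KNIME result described above, `diagnose(200, {}, b"")` falls into the "empty body and no Content-Type" case; running the same check on the response PowerShell receives would show what a working request looks like.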

ArjenEX · April 4, 2024, 8:01am

Is there publicly available documentation about this API?

text/XML is not very common. Media type and Content-Type are somewhat interchangeable terms in REST, so I suspect this value is not correct.

Have you tried application/xml already?

yanbeemwe · April 4, 2024, 8:20am

Unfortunately, there is no public documentation. However, I can ask the developers for any information that would be useful, so if you have questions about it, please ask away.

I just tried the following to no avail:

Header keys Accept and Content-Type, each with these values:
text/xml
application/xml
application/octet-stream
application/xhtml+xml
application/json
text/html
text/csv
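On a plain GET, a Content-Type request header describes a request body that normally is not there; the header a server consults for content negotiation is Accept. The values above can be checked systematically with a minimal sketch like the following, which builds one GET per candidate media type with only Accept set and records what the server returns. It uses Python's standard `urllib`; the candidate list simply mirrors the values tried in this thread, and the URL would be the real upload-history endpoint.

```python
import urllib.error
import urllib.request
from typing import Iterable, List, Optional, Tuple

# The media types tried in this thread
CANDIDATES = [
    "text/xml", "application/xml", "application/octet-stream",
    "application/xhtml+xml", "application/json", "text/html", "text/csv",
]


def accept_requests(url: str, media_types: Iterable[str]) -> List[urllib.request.Request]:
    """One GET per media type; only Accept is set, no Content-Type."""
    return [urllib.request.Request(url, headers={"Accept": mt}, method="GET")
            for mt in media_types]


def probe(url: str, media_types: Iterable[str] = CANDIDATES,
          timeout: float = 30) -> List[Tuple[str, int, Optional[str], int]]:
    """Return (Accept sent, status, Content-Type received, body length) per request."""
    results = []
    for req in accept_requests(url, media_types):
        accept = req.get_header("Accept")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                body = resp.read()
                results.append((accept, resp.status, resp.headers.get("Content-Type"), len(body)))
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses are recorded rather than aborting the probe
            results.append((accept, err.code, err.headers.get("Content-Type"), 0))
    return results
```

Network failures (`URLError`) are deliberately not caught, so a wrong host or proxy problem surfaces immediately instead of being mixed into the results.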

system · July 3, 2024, 8:20am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.