Good day,
Objective: Navigate with the Selenium Nodes to specific web pages that contain links to files, which should then be downloaded or read directly into KNIME for further processing.
Context:
I can easily navigate to the page on an intranet (after having authenticated), which contains a link to an xlsx file that I need to download and process.
With the Selenium Nodes I can click on the link, and the tab is then refreshed with a web-based version of the xlsx file, which can be downloaded manually.
Where I get stuck:
I’m not able to interact with the web-based xlsx-tab - so I cannot download the file.
What I have already tried, to no avail (if you have other suggestions, they are more than welcome):
- Click the link (so that the xlsx file opens in the browser), then navigate to or refresh the URL of the opened tab that contains the web-based xlsx. => It appears that as soon as the web-based xlsx has opened, I’m not able to interact with the new tab at all, except to navigate back to the parent page.
  I’m also not able to extract data from the xlsx tab to a KNIME table (even when I refresh or update the URL).
  Any attempt to find elements on the xlsx tab times out while the outline box is being populated, so I end up with nothing to actually find. I’ll look into it further: once I know what the download button is called, I might be able to interact with it, even though I’m not able to “find/see” it in the outline box.
- HTTPRetriever => Error 401 Unauthorised
  - URL: I use the cleaned-up extracted URL. It seems to be correct, because if the link is changed to a non-existing link I get a different error code.
  - Headers: I leave these completely blank, as I have no idea (still researching) what to put here. I found that I can extract header info via the GET Request, but I have not yet determined whether the GET Request header output might be useful for the HTTPRetriever.
- GET Request => 401
  - Authentication: NTLM, with credentials but without specifying a domain. (I was using None and Basic, but the GET Request output indicated that NTLM is to be used, so I switched to this method.)
  - Request headers: once again left empty.
  - Response headers: left empty.
  - The GET Request output indicates that it expects to see content-type = text, so perhaps this is not the correct approach either.
- XMLHttpRequest => ERROR: {“isTrusted”:true}
  As far as I can determine, it might be a network setting.
- Initiating an HTTPS connection and trying to download that way was also not a success.
- Simulate a right-click and find the download option in the context menu. (Selenium cannot interact with Windows download dialogs, but I had some other plans here.) => It appears that the context menu now also does not include a save-as option.
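
For the 401 responses above, one thing that may be worth checking is whether the separate HTTP request carries the session that the browser already authenticated. The sketch below is in Python rather than KNIME nodes and only illustrates the idea: it takes cookies in the list-of-dicts shape that Selenium's `get_cookies()` returns and attaches them to a prepared GET request. The URL, cookie names, and helper names are hypothetical, and this can only help if the intranet keeps the login in cookies; if it relies purely on NTLM negotiation, cookies alone will probably not be enough.

```python
from urllib.request import Request


def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def build_download_request(url, selenium_cookies, user_agent=None):
    """Prepare (but do not send) a GET request that carries the browser session."""
    headers = {"Cookie": cookie_header(selenium_cookies)}
    if user_agent:
        # Some servers tie a session to the browser's User-Agent string.
        headers["User-Agent"] = user_agent
    return Request(url, headers=headers, method="GET")


# Hypothetical values for illustration only; real ones would come from the
# authenticated browser session and from the extracted link.
cookies = [
    {"name": "SessionId", "value": "abc123"},
    {"name": "FedAuth", "value": "xyz"},
]
req = build_download_request("https://intranet.example/files/report.xlsx", cookies)
print(req.get_header("Cookie"))  # SessionId=abc123; FedAuth=xyz
```

The request is only built here, not sent. Sending it (for example with `urllib.request.urlopen(req)`) and writing the response bytes to disk would be the next step, and the same Cookie value could in principle be placed in the request-headers setting of an HTTP node.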
If there is something you think I have obviously missed, or if you have other ideas, I would very much appreciate it, as not being able to download from the network would be a huge problem for me.
Thank you in advance.