(Downloading xlsx-file) Selenium interaction possible with web-based xlsx?

aroodt · November 30, 2020, 2:35pm

Good day,
Objective: Navigate with Selenium Nodes to specific webpages that contain links to files, which are to be downloaded or read directly into KNIME for further processing.

Context:
I can easily navigate to the page on an intranet (after having authenticated) that contains a link to an xlsx file I need to download / process.
With Selenium Nodes I can click on the link, and the tab is refreshed with a web-based version of the xlsx file, which can then be downloaded manually.

Where I get stuck:
I’m not able to interact with the web-based xlsx-tab - so I cannot download the file.

What I already tried (to no avail). If you have other suggestions, they are more than welcome:

Click the link (so the xlsx file opens in the browser), then navigate to / refresh the URL of the opened tab that contains the web-based xlsx. => It appears that as soon as the web-based xlsx has opened, I’m not able to interact with the new tab at all, except to navigate back to the parent page.

I’m also not able to extract data from the xlsx-tab to a KNIME table (even when I refresh or update the URL).

Any attempt to find elements on the xlsx-tab times out while the outline box is being populated, and I end up with nothing to actually find. I’ll look into it further: once I know what the download button is called, I might be able to interact with it, even while I’m not able to “find/see” it in the outline box.

HTTPRetriever => Error 401 Unauthorised
- URL: I use the cleaned-up extracted URL. It seems to be correct, because if the link is changed to a non-existing link I get a different error code.
- Headers: I leave these completely blank, as I have no idea (still researching) what to put here. I found that I could extract header info via the GET Request, but I still have not determined whether the GET Request header output might be useful for the HTTPRetriever.

GET Request => 401
- Authentication: NTLM. I was using none and basic, but the GET Request output indicated that NTLM is to be used, so I switched to this method (with credentials, but without specifying a domain).
- Request headers: once again left empty…
- Response headers: left empty.
- The GET Request output indicates that it expects content-type = text (so perhaps this is not the correct approach either).

XMLHttpRequest => ERROR: {“isTrusted”:true}
As far as I can determine, it might be a network setting.

Initiating an HTTPS connection and trying the download that way was also not a success story.

Simulate a right-click and find the download option in the context menu. (Selenium cannot interact with Windows download dialogs, but I had some other plans here.) => It appears that the context menu now also does not include a save-as option.
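
As a way to double-check the NTLM observation above: a 401 response normally advertises the authentication schemes the server accepts in its WWW-Authenticate response headers. The following is only a minimal Python sketch of reading them, assuming the response headers are available as (name, value) pairs and that each header line carries a single challenge. The function name `offered_auth_schemes` and the sample header values are made up for illustration and are not taken from the intranet in question.

```python
from typing import Iterable, List, Tuple


def offered_auth_schemes(headers: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the scheme names advertised in WWW-Authenticate headers.

    `headers` is a sequence of (name, value) pairs, for example copied from
    the response-header output of a request that came back with 401.
    Assumption: one challenge per header line (no comma-separated lists).
    """
    schemes: List[str] = []
    for name, value in headers:
        # Header names are case-insensitive.
        if name.strip().lower() != "www-authenticate":
            continue
        value = value.strip()
        if not value:
            continue
        # The scheme is the first token, e.g. "NTLM" or "Basic realm=...".
        scheme = value.split(None, 1)[0]
        if scheme not in schemes:
            schemes.append(scheme)
    return schemes


# Hypothetical headers of a 401 response (illustrative values only).
sample_headers = [
    ("Content-Type", "text/html"),
    ("WWW-Authenticate", "Negotiate"),
    ("WWW-Authenticate", "NTLM"),
]
print(offered_auth_schemes(sample_headers))  # ['Negotiate', 'NTLM']
```

If NTLM (or Negotiate) shows up in that list, anonymous or basic requests will keep returning 401 regardless of the other headers, so the credentials and possibly the domain are the part worth revisiting.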

If you think there is something I have obviously missed, or if you have other ideas, I would very much appreciate hearing them, as not being able to download from the network would be a huge problem for me.
Thank you in advance.

qqilihq · November 30, 2020, 3:21pm

Hi aroodt,

Without seeing how the page looks and works, it’s really difficult to give any suggestions or advice here. So, before I head in the wrong direction: can you share some more details in that regard? (link, source code, screenshot, …)

If you prefer not to share this publicly, you can reach me via mail@seleniumnodes.com as well.

Best regards,
Philipp

aroodt · November 30, 2020, 3:34pm

Thanks for the timely response!
I fully understand your response.
Unfortunately, I’m not at liberty to disclose those details.
Have a good day.
