Hey, checking if anyone has had an issue with downloading Excel (xlsx) files. The site I usually download from just added authentication, which means I can’t use the Excel Reader node anymore. Now I’m trying to figure out how to download the file using my credentials. My latest attempt is using the HTTP(S) Connector node to pass in my credentials and the site URL, then using the Transfer Files node to move the file from the site to a local directory.
This seems to work fine; however, the file gets corrupted in transit. I can see that the file size is much smaller than it should be (below is a comparison to what a good file should look like).
Below is what I see when I try to open the downloaded file.
Hi @elohbeck , I don’t have a proper sample to test this unfortunately… This is weird though, since the Transfer Files node seems to have worked properly.
Have you had any success with it? If not, can you tell us what kind of site it is? If you open the downloaded 41 KB file, do you see part of your data or an empty file?
Still the same result. The site was recently changed to HTTPS and now requires a login step (email only) to download the file. I can’t open the file with Excel; however, when I open it with TextPad, I can see that the content is actually HTML rather than the real data.
Hi @elohbeck, this means that you were not able to download the Excel file, but rather got a response from the website in HTML format. You can see what the response is by renaming the file to an HTML file (MY_FILE_ACTUAL.html) and opening it in a browser. This should tell you what the issue was.
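A quick way to confirm this without opening the file by hand: a real xlsx file is a ZIP archive, while a login or error page starts with HTML markup. Below is a minimal Python sketch of that check. The function name classify_download is made up for illustration, and the HTML detection is only a rough heuristic.

```python
import io
import zipfile


def classify_download(data: bytes) -> str:
    """Return 'xlsx', 'html', or 'unknown' for the raw bytes of a download."""
    # An .xlsx file is a ZIP archive containing a [Content_Types].xml entry.
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            if "[Content_Types].xml" in archive.namelist():
                return "xlsx"
        return "unknown"
    # Heuristic: a login or error page usually starts with a doctype or <html>.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

To use it on a file on disk, read the file in binary mode and pass the bytes in, for example classify_download(open(path, "rb").read()), where path points at the downloaded file.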
Hi @bruno29a , I converted it to HTML and opened the file from a browser. It took me to the login page. It seems like the HTTP(S) Connector node didn’t pass the authentication settings properly. Is there another way to go about this?
Hi @elohbeck , I am not sure how the authentication works with the http connector node, and I don’t have a set up to test this. In addition, based on your screenshot, it looks like the login of that page is done in 2 steps - password field does not seem to be available at the first step.
For the HTTP(S) Connector node to pass the username and password properly, it would have to map them to the field names that the site’s HTML form uses for the username and password, and I am not sure how it does that.
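To find out which field names the login form actually expects, one option is to list the inputs of the HTML page that was saved instead of the Excel file. Here is a rough sketch using only Python's standard library. The class name, function name, and sample form are hypothetical, and a login page that builds its form with JavaScript would not show up this way.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect each <form> with its action, method, and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attributes.get("action"),
                "method": (attributes.get("method") or "get").lower(),
                "fields": {},
            })
        elif tag == "input" and self.forms and attributes.get("name"):
            # Keep any default value, e.g. hidden tokens the site expects back.
            name = attributes["name"]
            self.forms[-1]["fields"][name] = attributes.get("value") or ""


def list_form_fields(html: str) -> list:
    """Return the forms found in the given HTML text."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.forms
```

The field names it reports (and any hidden fields with preset values) are what a login request would need to send back to the site.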
I think the Selenium nodes might work better in this case.
I have never used the Selenium nodes myself, but I have used Selenium before, and based on the description of the Selenium nodes, they do what Selenium does, which is essentially to automate a web browser.
Please have a look at them; you may be able to automate logging in and downloading the file.
Besides the Selenium suggestion from @bruno29a, you can most likely also emulate the login with a POST request, e.g. check what is sent by your browser (network monitor in Firefox/Chrome) and then just send the same.
However, this depends on how the page is set up.
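As a rough illustration of the POST idea, here is a minimal Python sketch using only the standard library. It assumes the login is a plain form POST that sets a session cookie, which may not hold for this site (a two-step login or hidden tokens would need extra requests). The function names, URLs, and the "email" field name are placeholders; the real values are whatever the browser's network monitor shows.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(login_url, form_fields):
    """Build a form-encoded POST request like the one a browser sends."""
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    return urllib.request.Request(
        login_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def download_after_login(login_url, file_url, form_fields, target_path):
    """Log in, then fetch the file with the same cookies; return bytes written."""
    # The cookie jar keeps whatever session cookie the login response sets.
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    login_request = build_login_request(login_url, form_fields)
    with opener.open(login_request, timeout=30) as response:
        response.read()
    # Second request reuses the opener, so the session cookie is sent along.
    with opener.open(file_url, timeout=30) as response:
        data = response.read()
    with open(target_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

A call could look like download_after_login("https://example.com/login", "https://example.com/report.xlsx", {"email": "user@example.com"}, "report.xlsx"), with all four values replaced by the real ones. If the saved file is still HTML, the login request is probably missing something the browser sends.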