Download file using API and security token?

j_ochoada · April 8, 2022, 2:12pm

Hi Everyone,
I’m using a data repository tool called CDD https://www.collaborativedrug.com/ which has a web interface and also some API capabilities. What I’m trying to do is execute a search in CDD using a GET call and a security token. That works fine. The issue is retrieving the CSV file that it produces. If I use the KNIME GET Request node, it returns a blob which does seem to contain the information, and I can convert the blob to a string, but then I can’t seem to split that string up in KNIME.

Anyway, it would be great if I could download the file using the Palladian HTTP Retriever, but I’m not sure how to set it up to pass the security header, similar to placing it in the KNIME GET Request header. I’m also not sure how to configure it to download the file.

Any help or insight would be greatly appreciated.

Thanks,
Jason

qqilihq · April 8, 2022, 2:24pm

Hi Jason.

You can add a column for the header (I assume it’s Authorization, so you’d name the column exactly that) and select it in the Headers tab of the HTTP Retriever. This should do the trick.

Let me know how it goes.

–Philipp

j_ochoada · April 8, 2022, 2:53pm

Thanks for the guidance @qqilihq

Obviously I’m new to all of this. The KNIME GET Request that works has three pieces of info that I use: Header Key, Header Value, and Value Kind. I created three columns with those names, containing the information exactly as I use it in the KNIME node. I added all three columns to the Headers tab in the HTTP Retriever. Unfortunately I’m getting back a 401 authentication failure.

Any ideas on what I’m missing?

Thanks,
Jason

qqilihq · April 8, 2022, 2:58pm

Per header, the HTTP Retriever expects one column which is exactly named as the header name (which would e.g. be Authorization) and which contains the value in the cell. In other words, for each header you only need to add one entry in the “Headers” tab.
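Outside KNIME, the same "one header name, one value" idea can be sketched in Python. This is only an illustration: the URL and token are placeholders, and `Authorization` is the assumed header name from above, so substitute whatever header key the working GET Request node uses.

```python
import urllib.request

def build_request(url, header_name, token):
    """Build a GET request carrying exactly one custom header."""
    # One header is one name/value pair, which mirrors one column
    # (the header name) and one cell (the header value) in the node.
    return urllib.request.Request(url, headers={header_name: token}, method="GET")

req = build_request(
    "https://example.invalid/api/search",  # placeholder URL, not a real endpoint
    "Authorization",                       # assumed header name
    "my-secret-token",                     # placeholder token
)
print(req.get_method(), req.header_items())
```

The request is only built here, not sent. Sending it with `urllib.request.urlopen(req)` would perform the actual call.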

For further input, feel free to share a reduced workflow (either here or at mail@palladian.ws) and I can give further input.

Best,
Philipp

j_ochoada · April 8, 2022, 3:32pm

@qqilihq Thank you for your patience.

I understand now. The KNIME node’s Header Key is the header name (and the name of the column passed to the Palladian node), the header value (the cell value passed) is the token, and the value kind is not used.

Great, I got a 200 code with a response. I reran the complete call to start another search because I have been accessing the previous search a few times using the KNIME GET node.

Is there something in the response that I should be looking for? I guess I was thinking there would be a path or something that I could use a file transfer node to download the file.

OK I re-ran the search and I get the following output result:

x-request-ID: some long id key
date: Fri, 08 Apr 2022 15:36:42 GMT
server: nginx
transfer-encoding: chunked
content-security-policy-nonce: looks like some long key
x-frame-options: SAMEORIGIN
x-download-options: noopen
x-permitted-cross-domain-policies: none
strict-transport-security: max-age=63072000; includeSubDomains
content-security-policy: block-all-mixed-content; default-src 'none'; frame-ancestors 'self' lots more stuff

Funny I had to type all of that because if I try to copy it I get something much more simple:

{"id":11717026,"created_at":"2022-04-08T15:29:16.000Z","modified_at":"2022-04-08T15:33:18.000Z","status":"finished"}

nothing that I can identify as a path to a file… Any thoughts?

Thanks,
Jason

qqilihq · April 8, 2022, 4:23pm

Hi Jason,

could it be that there’s a separate endpoint for downloading the file? From what you post, it definitely doesn’t look like there’s any file-related data in there.

Is there some publicly available documentation for that API available?

Alternatively my offer stands, if you wish you can also get in touch with me via mail@palladian.ws.

–Philipp

j_ochoada · April 8, 2022, 5:45pm

Hi Philipp,

Thanks again for the offer of help. There is documentation but unfortunately it is pretty thin IMHO.

First a search is created in the GUI then you can execute the search as highlighted here

Then you can check the status of that search and then pick up the download as specified here.

So I executed the search using:

https://app.collaborativedrug.com/api/v1/vaults/“vault_number”/searches/“search_ID”

Then checked using

https://app.collaborativedrug.com/api/v1/vaults/“vault_number”/export_progress/“job_ID”

Then the call which is supposed to return a CSV file by default

https://app.collaborativedrug.com/api/v1/vaults/“vault_number”/exports/“job_ID”
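For reference, the three calls above could be chained roughly like this in Python. This is a sketch under assumptions, not CDD’s documented client: it assumes the first call returns JSON with the job ID in an `id` field and that the progress call reports `"status": "finished"` when done. The function names, the polling interval, and the header dictionary are hypothetical.

```python
import json
import time
import urllib.request

BASE = "https://app.collaborativedrug.com/api/v1/vaults"

def http_get(url, headers):
    """Plain GET that returns the raw response body as bytes."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def run_export(vault, search_id, headers, get=http_get, poll_seconds=5.0, max_polls=120):
    """Start the saved search, wait for it to finish, return the export body."""
    # Step 1: execute the search. Assumption: the reply carries the job ID in "id".
    started = json.loads(get(f"{BASE}/{vault}/searches/{search_id}", headers))
    job_id = started["id"]

    # Step 2: poll export_progress until the status is "finished".
    for _ in range(max_polls):
        progress = json.loads(get(f"{BASE}/{vault}/export_progress/{job_id}", headers))
        if progress.get("status") == "finished":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"export {job_id} did not finish in time")

    # Step 3: fetch the export itself (CSV by default) and hand back the bytes.
    return get(f"{BASE}/{vault}/exports/{job_id}", headers)
```

The `get` parameter is there so the flow can be tried against a fake fetcher before pointing it at the real API.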

The support team at CDD has almost zero KNIME familiarity but claims that the download works with Pipeline Pilot and Postman. I’m asking for the Postman call as well as the Pipeline Pilot info to see if that helps me figure out what those tools might be doing differently.

I don’t have a workflow at this point, just the HTTP Retriever node. I’m thinking that if I get the configuration correct, I’ll be able to download a file.

I’m going to try to resubmit the search asking for a zip output to see if that changes anything…

Thanks,
Jason

j_ochoada · April 8, 2022, 9:35pm

OK, after some back and forth with them, it appears that they do not hold a file. The only option seems to be that it returns the data in the response body as csv, xls, sdf, or zip. The issue I have with this is that the amount of data I’m looking to gather is 2 columns and 900K rows. If I execute the job in the GUI, it completes the search and takes forever to prepare the data for download. If I do that exact same search in the API it runs much faster, which is great! Until you find out that it only returned 30K rows of the 900K expected in the search. This is an asynchronous search, so it appears they are either capping it or I have found a bug.

I’m asking that they confirm the behavior I’m seeing, and hopefully I get a resolution. I’d hate to have to do 900 live calls at 1000 rows each (the imposed limit for direct data access).
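If it does come down to paging, the fallback of 900 calls at 1000 rows each could look something like the sketch below. It is deliberately generic: `fetch_page(offset, limit)` is a hypothetical callable standing in for whatever the real paged request is, since the actual paging parameters are not shown here. The `expected_total` check is the same sanity test as comparing 30K returned rows against 900K expected.

```python
def fetch_all(fetch_page, page_size=1000, expected_total=None):
    """Collect rows page by page until a short (or empty) page comes back."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # hypothetical paged request
        rows.extend(page)
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
    # Fail loudly if the result is truncated instead of silently continuing.
    if expected_total is not None and len(rows) != expected_total:
        raise RuntimeError(f"expected {expected_total} rows, got {len(rows)}")
    return rows
```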

Tier one support claims there is no throttle on the API so I’m doing escalating searches to see where it tops out.

Thanks for all the help, it’s so very much appreciated. All of this is going to the HUB when I have it sorted, for sure.

Jason

j_ochoada · April 8, 2022, 9:37pm

Geez, I didn’t mean to delete my previous post and it won’t let me undelete it. Here it is again:

Ahh,

OK, when I ran it again selecting zip=true I do get an HTML response which contains something called content-disposition: attachment; filename="Export 2-22-04-08.zip"

The downside is that it’s not clear what the path to that file is, hahahaha. I guess I’ll have to try some simple ideas, like maybe it’s the same path the other calls are sent to, or I’ll have to go back to them for the location of the file.

Is there a way I can parse that HTML result to pick out the filename? The HTML parser gave an empty cell. (I found the answer to this: there is a Palladian HTTP Result Data Extractor.)
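For anyone doing this step outside KNIME, the filename can be pulled out of a content-disposition value with Python’s standard library. A minimal sketch, where the function name is made up for illustration:

```python
from email.message import Message

def filename_from_content_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the "filename" parameter and strips the quotes.
    return msg.get_filename()

print(filename_from_content_disposition('attachment; filename="Export 2-22-04-08.zip"'))
```

Note that this only gives the suggested name for the download. It is a response header describing the body, not a path on the server.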

Thanks,
Jason

qqilihq · April 9, 2022, 7:53am

That suggests that your response is already the ZIP file. HTML Parser is not needed here.

I’d proceed as follows: First, write the result to your file system using the Binary Objects to Files node (select the HTTP Retriever’s result column as the binary column).

See if you can open the file. If yes, you can continue building your workflow and add the appropriate node to pick up and parse the file.

Looking at what you wrote above, I would probably try to download as CSV or XLS instead of ZIP. This will save you the manual step of having to unzip the file. (The transfer will usually be compressed anyway, as the HTTP Retriever prefers GZIP transfer encoding.)
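To illustrate the point that the response body already is the file, here is a rough Python sketch that takes the raw bytes and returns parsed rows, whether the body is a plain CSV or a ZIP. It assumes UTF-8 text and that a ZIP export contains at least one `.csv` member; neither is confirmed for CDD, and the function name is hypothetical.

```python
import csv
import io
import zipfile

def rows_from_body(body: bytes, encoding="utf-8"):
    """Parse a response body that is either a CSV or a ZIP holding a CSV."""
    buf = io.BytesIO(body)
    if zipfile.is_zipfile(buf):
        # ZIP case: read the first CSV member straight from memory.
        with zipfile.ZipFile(buf) as zf:
            csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("ZIP contains no CSV file")
            body = zf.read(csv_names[0])
    text = body.decode(encoding)
    # newline="" lets the csv module handle line endings inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))
```

With the CSV format the ZIP branch is simply skipped, which is the "save the unzip step" case.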

j_ochoada · May 19, 2022, 1:51pm

Hi @qqilihq

Thanks for this. After looking at another thread about POSTing information, one I replied to and you helped on, I now understand the value of this node. I almost had it using binary objects to strings and then parsing those strings, which does work, but I’m betting this is much more efficient, so I’m going to swap this BO to file node in to simplify the workflow.

Thanks so very much for the support!
Jason

system · May 26, 2022, 1:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.