Downloading files from URL with header security token

j_ochoada · April 6, 2022, 9:01pm

Hi Everyone,

I’ve been trying to integrate CDD (Collaborative Drug Discovery) with our other systems, starting with simple GET calls. These calls use a security “token”, which I place in the headers tab of the GET Request node. That part works just fine.
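For readers working outside KNIME, the same token-authenticated GET call can be sketched in Python using only the standard library. This is a minimal sketch, not CDD’s documented client: the header name `X-CDD-token` is an assumption (check CDD’s API documentation for the exact name), and `build_get` / `fetch_bytes` are hypothetical helper names.

```python
import urllib.request

# Assumed header name; verify against CDD's API documentation.
TOKEN_HEADER = "X-CDD-token"


def build_get(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the API token as a request header."""
    req = urllib.request.Request(url, method="GET")
    # urllib normalizes the name to "X-cdd-token"; HTTP header names are case-insensitive.
    req.add_header(TOKEN_HEADER, token)
    return req


def fetch_bytes(url: str, token: str) -> bytes:
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(build_get(url, token)) as resp:
        return resp.read()
```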

The issue is that for some data retrieval jobs, such as dumping the whole contents of a database, the returned body is too large and causes memory problems.

The suggested workaround is to create a search on the CDD website and then execute that search through the API. I have that working: it returns a job ID, and I built a loop that checks the status of that job. So now the job is finished and, as I understand it, a CSV file is waiting for me at some location. I tried the simple thing and used the GET Request node to retrieve that job/file. It executed successfully and the body has content, but I’m not sure what it is: the only available renderers are hex dump (long and short) and blob info. So I’m stuck on how to process this binary large object (BLOB). Is there an alternative way to download the CSV file directly?
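The job-status loop described above can be sketched as a small polling function. Assumptions: `get_status` is a hypothetical callable that performs the status GET call and returns the job’s status string, and `"finished"` is assumed to be the terminal status; adjust both to whatever CDD’s API actually returns. The timeout keeps the loop from running forever if the job stalls.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 done: str = "finished",
                 interval_s: float = 5.0,
                 timeout_s: float = 600.0) -> None:
    """Poll get_status() until it returns `done`; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()  # one status GET call per iteration
        if status == done:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} s")
        time.sleep(interval_s)  # wait before asking again
```

Once it returns, the result can be fetched with a second GET call.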

I searched around and found a workflow from @mwiegand that uses the HTTP Retriever and related nodes, at this link:

But I’m unsure how to pass my security token the way I do in the GET Request node.

Here is the brief documentation provided by CDD. Unfortunately, they have limited knowledge of using KNIME to access their API.

Thanks in advance for reading and for your help,
Jason

j_ochoada · April 8, 2022, 2:03pm

OK, as an update: I can use the Binary Objects to Strings node to get something more human-readable, but it is all in one row and one column. I tried the Cell Splitter with no luck. I’ll leave this here in case someone else has ideas, and I’m going to post over in the Palladian section to see if there is a solution using that set of tools.

Thanks,
Jason

j_ochoada · April 8, 2022, 2:31pm

OK, as a final update: I have figured out how to parse the BLOB after converting it to a string. I had already tried splitting the cell on \n (newline), but I forgot to check the “use escape character” checkbox. After doing that, I was left with many columns, each holding a comma-separated pair of values for MoleculeName and SJNB. I then transposed the columns to rows and used a second Cell Splitter, splitting on commas, to get the two columns and many rows I wanted. Finally, I split off the first row, transposed it, and inserted it as the header.
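Outside KNIME, the decode, newline split, comma split and header promotion steps collapse into a single CSV parse, with no transpose needed. A minimal Python sketch, assuming the body is UTF-8 text in ordinary CSV format; `blob_to_rows` is a hypothetical helper name.

```python
import csv
import io


def blob_to_rows(body: bytes, encoding: str = "utf-8-sig") -> tuple[list[str], list[list[str]]]:
    """Decode a CSV response body and return (header, data rows)."""
    # "utf-8-sig" strips a leading byte-order mark if present and otherwise behaves like UTF-8.
    text = body.decode(encoding)
    reader = csv.reader(io.StringIO(text))  # handles \n and \r\n line endings and quoted fields
    header = next(reader)                   # first line becomes the column names
    rows = [row for row in reader if row]   # drop blank lines
    return header, rows
```

Using a real CSV parser also handles quoted fields that contain commas, which a plain split on commas would break apart.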

So this is solved, but I’d really like to know whether it’s possible to directly download the CSV file that the software created, so I’ve posted a question in the Palladian section to see if there is any hope for that.

Thanks,
Jason

LukasS · April 11, 2022, 7:21am

Hi @j_ochoada,

thanks for keeping us posted! There is a simpler method that might work for you: you can specify a URL directly in the CSV Reader.

So there’s no need to wrestle with BLOBs and the like. It can even unpack compressed files, if that is relevant to you.

I hope that works for you!
Kind regards,
Lukas

j_ochoada · April 12, 2022, 11:03pm

Hi Lukas,

Thanks so much for your suggestion. Unfortunately, there are two problems. First, it took some time to confirm with the tech group responsible for the API, but the API call does not produce an actual CSV file that is available for download. Bummer. Second, a security token must be passed in order to use the API, and I don’t think the CSV Reader allows for that. I guess what you are saying is that if my GET call returned an unsecured link to where the CSV resides, it could be downloaded with the CSV Reader or another file reader.

Much appreciated!

I’m going to post again, this time to see if the community can parse this data better than I can. I have one particularly slow node (Transpose), and I’m betting there is a more efficient way that I’m not aware of.

Thanks again everyone!!

Jason

LukasS · April 13, 2022, 7:05am

Hi Jason (@j_ochoada ),

hmm, you pass your security token via a header in the GET Request node, right? What the node does is parse the given headers into a URL format: a header with key parameter and value 123 will be turned into a URL similar to www.URL_to_CSV.com/path/to/csv/file.csv?parameter=123. You could try crafting the URL yourself, e.g. with a String Manipulation node, by appending the header as key=value after the question mark (?) in the URL. If you need to pass strings with spaces or special characters, you might want to apply the urlEncode() function to the values.
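Crafting the URL by hand, as suggested above, amounts to appending a percent-encoded `key=value` pair to the query string. A Python sketch of that step (`append_query` is a hypothetical helper). Note the assumption: this only helps if the server accepts the token as a query parameter; an API that checks only a request header will still reject the call.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def append_query(url: str, params: dict[str, str]) -> str:
    """Append percent-encoded key=value pairs to a URL's query string."""
    parts = urlsplit(url)
    extra = urlencode(params)  # encodes special characters; spaces become '+'
    # Keep any query the URL already has and join the new pairs with '&'.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```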

By a non-formal CSV, do you mean that the formatting does not conform to the CSV “standard”? Would the File Reader or the File Reader (Complex Format) node work?

Regarding the slow Transpose node: it has a chunk size option that can greatly speed things up for large tables. But I see that @elsamuel has already posted a nice workflow that doesn’t need the transpose at all.

Cheers,
Lukas

mwiegand · April 22, 2022, 9:46pm

Hi @j_ochoada,

have you managed to resolve your authentication challenge with GET? If not (the information is hard to find), you might try Basic access authentication (see the Wikipedia article “Basic access authentication”).

I once, literally years ago, did that with a staging system behind basic htaccess authentication. That said, the new GET Request node on the KNIME Hub offers various authentication methods, which makes even htaccess authentication easy to use.
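Basic access authentication sends the credentials in an `Authorization` header whose value is `Basic ` followed by the Base64 encoding of `user:password` (RFC 7617). A minimal Python sketch of building that value; `basic_auth_header` is a hypothetical helper, and whether CDD’s API accepts Basic auth at all is not established in this thread.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the value for an HTTP 'Authorization' header using Basic auth."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")
```

Base64 is an encoding, not encryption, so credentials sent this way should only travel over HTTPS.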

Cheers
Mike

j_ochoada · May 11, 2022, 9:18pm

Sorry for the delay, @LukasS and @mwiegand.

I guess I didn’t make this clear. I thought my GET call was executing an async job that collected the data and built a CSV file, placed on a server somewhere for me to retrieve. After talking to the support group, I learned this is not the case. The async job does collect the data, but no file is built or stored for download. The only way to get the data is to make another GET call, and the results are returned as a binary large object (BLOB).
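Since the second GET call returns the data as a raw body rather than a stored file, one way to still end up with a CSV file on disk, without holding a large body in memory, is to stream the body straight to a file in fixed-size chunks. A Python sketch; `save_body` is a hypothetical helper, and the output path and chunk size are arbitrary choices.

```python
from typing import BinaryIO


def save_body(response: BinaryIO, path: str, chunk_size: int = 1 << 20) -> int:
    """Copy a response body to disk in chunks of chunk_size bytes; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)  # at most chunk_size bytes held in memory
            if not chunk:                      # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

For example, `with urllib.request.urlopen(req) as resp: save_body(resp, "export.csv")`, where `req` is a request that carries the token header.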

Hope that helps! I appreciate all your help and input.
Thanks,
Jason

mwiegand · June 24, 2022, 5:28pm

That’s quite abstract and hard to follow without concrete information, such as the call you are making and the result you receive.

system · September 22, 2022, 5:28pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.