Download feed file (gzip)

Hi,
Sorry if this is a very common question, but I am stuck. I need to download a file (gzip-compressed CSV) from an HTTPS URL. I’ve tried many nodes (Download, HTTPS Connector, Transfer Files, CSV Reader, HTTPS Connection), but I am not able to download the file.

Maybe because of its structure:
https://productdata.awin.com/datafeed/download/apikey/APIKeyxxxxxxxxxxxx/language/en/fid/IDxxxxxxxxxxxx/columns/aw_deep_link,product_name,aw_product_id,merchant_product_id,merchant_image_url,description,merchant_category,search_price,merchant_name,merchant_id,category_name,category_id,aw_image_url,currency,store_price,delivery_cost,merchant_deep_link,language,last_updated,display_price,data_feed_id,brand_name,brand_id,colour,product_short_description,specifications,condition,product_model,model_number,dimensions,keywords,promotional_text,product_type,commission_group,merchant_product_category_path,merchant_product_second_category,merchant_product_third_category,rrp_price,saving,savings_percent,base_price,base_price_amount,base_price_text,product_price_old,delivery_restrictions,delivery_weight,warranty,terms_of_contract,delivery_time,in_stock,stock_quantity,valid_from,valid_to,is_for_sale,web_offer,pre_order,stock_status,size_stock_status,size_stock_amount,merchant_thumb_url,large_image,alternate_image,aw_thumb_url,alternate_image_two,alternate_image_three,alternate_image_four,reviews,average_rating,rating,number_available,custom_1,custom_2,custom_3,custom_4,custom_5,custom_6,custom_7,custom_8,custom_9,ean,isbn,upc,mpn,parent_product_id,product_GTIN,basket_link,Fashion%3Asuitable_for,Fashion%3Acategory,Fashion%3Asize,Fashion%3Amaterial,Fashion%3Apattern,Fashion%3Aswatch/format/csv/delimiter/%2C/compression/gzip/

Any hint on that? Thanks!

Hello Awiener,

I am happy to help. I just put together an example showing how to fetch a CSV over an HTTPS connection:

  1. In HTTPS Connection node > Host “people.sc.fsu.edu” as an input
  2. In Download node > Source file or folder “https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv” as an input

If you have any further questions, do not hesitate to contact us again.

Hi Jose,

Thank you. I’ve tried it, but I get a 1 KB file that has no file extension.
I think something is not working properly because the link does not point directly to a .gzip/.csv file.

The GET Request node tells me this:

I just don’t know what to do with it.

Thanks

Hello again!

Could you please explain in more detail what you are trying to do with the GET Request node?

Feel free to share your workflow without any sensitive information, so that I can analyze it further and help you better.

Best regards.

Hi Jose,

sorry for my late reply.

I am just trying to download the CSV. I thought a GET Request would show me more of what the server is doing, but this is obviously not the case.

So basically I still want to download the gzip-compressed CSV file from the initial file path. However, when I use the Decompress Files node (the one node that gives me a real result), the downloaded file is a file called “1”, with no file type.

The CSV Reader node is not able to read it. I would have to change the file type manually to .csv.

Would there be any other solution? Thanks in advance!

Hi @Awiener ,

Well, based on your screenshot, you are NOT downloading the gzip-compressed CSV file. You are trying to decompress the file on the fly from the server (most probably KNIME does eventually download the file to a temp folder first).

To download the compressed file, you would use the Transfer Files node:

Then decompress the downloaded file using the Decompress Files node as you did, except that you would choose the locally downloaded file. In your screenshot that is not the case: you are pointing directly to the online compressed file. In theory this should also work, but it looks like it is not working in this case.

Can you try to first download the zipped file with the Transfer Files node, and then decompress it via Decompress Files? If you still get the same result, can you decompress the file via your OS and see if you get the same results - this is just in case that’s what’s in the zip file.

Thanks.
When I use the Transfer Files node with “/” at the end, I get this result:

This is why I tried the GET Request node, to see which other files I could choose.

However, when I use the file without the “/” at the end, I get this result again:

The Decompress Files node cannot read this file: the output folder stays empty afterwards.
If I rename the file manually to “1.gzip”, I can decompress it and I see one file called “1”. If I rename that file to “1.csv”, I can finally see my CSV.

Again: whenever I put the same path (from Transfer Node) into my browser, I can download a file called “datafeed_xxxxx.csv.gz”. This can easily be uncompressed and the final result is a file called “datafeed_xxxxx.csv”. This behaviour I am trying to recreate with KNIME.
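For readers who want to check outside KNIME what the extension-less file “1” really contains, here is a minimal Python sketch. It assumes the download is an ordinary gzip stream holding comma-delimited CSV, as the `compression/gzip` and `delimiter/%2C` parts of the URL suggest. The function name `read_gzipped_csv` and the sample data are hypothetical and only serve as an illustration.

```python
import csv
import gzip
import io


def read_gzipped_csv(raw: bytes, encoding: str = "utf-8") -> list[dict[str, str]]:
    """Decompress gzip bytes and parse them as comma-delimited CSV.

    The file name or extension plays no role here: gzip data is
    recognised by its first two bytes, so a file called "1" is handled
    exactly like "1.gz".
    """
    if raw[:2] != b"\x1f\x8b":  # gzip magic number
        raise ValueError("data does not look like a gzip stream")
    text = gzip.decompress(raw).decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical stand-in for the downloaded feed, built in memory.
sample = gzip.compress(b"product_name,search_price\nShirt,19.99\n")
rows = read_gzipped_csv(sample)
print(rows)
```

If this works on the real file, the missing extension is only a naming issue: the content is fine, and renaming (or giving the download an explicit target name) is all that is needed.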

Hi @Awiener , if you have a “/” at the end of the URL, it means you are not pointing to a file. You are pointing to a folder path. You need to point to a file, and you can see the message that confirms this.

If you expect a gzip file, then you should point to path_to_the_file/your_file.gz for example.

Similarly, if it’s a CSV file, your URL should point to path_to_the_file/your_file.csv

EDIT:

That is not correct. Putting this path in your browser does not download the file. It opens the online folder and shows you the file, and then you have to download the file, for example by clicking on it.

The Transfer Files node has no such interaction where it shows you the file so you can click download. You have to give the full path of the file. Again, this is clearly mentioned in the message in your screenshot (I wrote the comments above without looking at the messages, and only noticed them afterwards).

So, what you need to do is add “datafeed_xxxxx.csv.gz” after the “/”. For example: path_to_the_file/datafeed_xxxxx.csv.gz
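The same idea can be sketched in Python, purely as an illustration outside KNIME: build the full file URL from the folder-style path plus the file name, then stream it to a local file with an explicit name. The host `example.com`, the helper names `file_url` and `download`, and the placeholder path are assumptions and not part of the actual feed.

```python
import shutil
import urllib.request


def file_url(folder_url: str, file_name: str) -> str:
    """Join a folder-style URL and a file name with exactly one slash."""
    return folder_url.rstrip("/") + "/" + file_name.lstrip("/")


def download(url: str, target_path: str) -> None:
    """Stream the response body to target_path (performs a network request)."""
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)


# Placeholder values; replace them with the real folder URL and file name.
url = file_url("https://example.com/path_to_the_file/", "datafeed_xxxxx.csv.gz")
print(url)
# download(url, "datafeed_xxxxx.csv.gz")  # not run here: needs a real URL
```

Writing to an explicitly named target such as `datafeed_xxxxx.csv.gz` also avoids ending up with an extension-less file like “1”.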

Thank you so much! :+1: Finally got it. Puuuh!

Last and minor question: is it possible to connect the nodes somehow to each other? So that in batch mode decompress is done AFTER the file has been downloaded, and the CSV is read AFTER the gzip has been decompressed? At the moment the nodes “fly” around in space :wink:

You can use “flow variable” connections for that. Drag with your mouse from the very top right corner of the preceding node to the successor node. This way the nodes are executed in sequence: each one starts only once the previous one has finished.

Alternatively, right click the node and select “Show flow variable ports” to explicitly show the ports.

–P

Hi @Awiener, yes, you can link nodes via the Flow Variable ports. This applies to any node, so you can also link the CSV Reader after the Decompress Files node.

Thank you to both of you! :+1:

Have a great evening!
