Mass-Download from given row (wget-like)

Hi everyone,

I’m new to KNIME and I’m trying to solve some mass data issues with it. It looks very cool, but I got stuck on a download problem. Perhaps someone can help me by explaining how KNIME handles this.
I have a table where one column is a URL and a second one is the desired filename. Duplicate and empty rows have already been filtered out. What I’m trying to do is build a workflow that does the equivalent of
“wget -O $filename$ $url-string$” for each row. I do not want to use the legacy downloader, but I cannot figure out how to do this with the “Transfer Files” node. Are there any good, practical examples for this? Sadly, the official help has none.
Thank you!

What have you tried? Do you have any specifics about your application that you can share?

I think I did something similar about a year ago. I created a workflow that used a String Manipulation node to combine URLs and file names into about 1500 unique URLs, each linking to an individual text file hosted online. I “downloaded” them all using a loop containing a Line Reader node and a CSV Writer node. There are some other moving parts, but that’s the gist of it. If you share what you’re trying to do specifically, I can tell you whether this approach applies to your use case.
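As a rough illustration of the URL-building step outside KNIME, here is a minimal Python sketch. It is not the String Manipulation node itself; the `build_url` helper, the example.com base URL, and the file names are all made up for the example.

```python
from urllib.parse import quote, urljoin


def build_url(base_url: str, file_name: str) -> str:
    """Join a base URL and a file name, percent-encoding the file name."""
    # Make sure the base ends with a slash; otherwise urljoin would
    # replace the last path segment instead of appending to it.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, quote(file_name))


# Hypothetical (base URL, file name) rows, standing in for two table columns.
rows = [
    ("https://example.com/data", "report 01.txt"),
    ("https://example.com/data/", "report_02.txt"),
]
urls = [build_url(base, name) for base, name in rows]
```

Encoding the file name separately keeps spaces and similar characters from producing an invalid URL.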

I’ve also accessed files using a String Manipulation node to construct the request and then the GET Request node (or similar) to send it. There are tons of examples of this on the KNIME Hub.

Thank you @elsamuel for your reply. I think I forgot to mention that these files are not CSVs; they are binary files, usually images. So the CSV Writer won’t help here, and there seems to be no general “Write to file” node out there. There is no (RESTful) API behind them, just a direct http/https link.
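For reference, the wget-style behaviour described in this thread (one URL column, one filename column, raw bytes written to disk) can be sketched in plain Python. This is only an illustration under assumptions, not a KNIME node configuration: `download` and `download_all` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, target: Path) -> Path:
    """Fetch url and write the raw bytes to target, like `wget -O target url`."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Binary mode ("wb") writes image bytes unchanged, with no text decoding.
    with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
        # Stream in chunks instead of loading the whole file into memory.
        shutil.copyfileobj(response, out)
    return target


def download_all(rows, out_dir):
    """rows: iterable of (url, filename) pairs, mirroring the two table columns."""
    out_dir = Path(out_dir)
    return [download(url, out_dir / filename) for url, filename in rows]


# Example call (the URL and file name are placeholders):
# download_all([("https://example.com/img/cat.png", "cat.png")], "downloads")
```

The important detail for images is the binary write; a text-oriented writer such as a CSV writer would alter or reject the bytes.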

Got it! The KNIME Hub indeed had the solution but it requires the Palladian extension:

Retriever + Extractor + Binary to file
