Download images from URLs and save them as both JPEG/PNG files and as a KNIME node

Hi,

I am new to KNIME. I have a JSON file that contains URLs. I turned the JSON into a table and ungrouped it so that all URLs are in one column. Next, I used the GET Request node to download the images, but the results appeared as a new column containing the image pixel values. Is there any way to store the images both as image files (JPEG or PNG) on my laptop and as nodes, so that I can later feed them to a machine learning algorithm? I need separate folders based on the unique lat and lng values: for each unique lat/lng pair, a folder should be created, and the images downloaded from all the related URLs should be saved in that folder. The images should also be saved as a node for later use inside the workflow.
Below are a sample of the data and the workflow.
Mapillary Image Collection.knwf.knwf.knwf (13.2 KB)
image_id_list_without_property_Drinking Places (Alcoholic Beverages).json (1.7 MB)

Hi @NeginZarbakhsh , I’ll do a quick answer on the first part regarding images, since I have some experience dealing with images from GetRequest.

It seems that your URL isn’t complete, since you need to add the image format at the end. In your case, you should add “.jpeg” without the quotation marks.

Instead of the GetRequest node, you should use the Image Reader (Table) node. You’ll end up with a column of JPEG images.

Since you mentioned you also want PNG, you can then use the Renderer to Image node right after the Image Reader (Table) node.

From your sample data, here’s what you’ll get:

Where the left column is JPG and the right is PNG.

Lastly, to store them on your desktop, you can try the Image Writer node.

I can see that @ArjenEX is typing, so you’re in good hands. I’m not attaching my workflow here because I’m simply relaying the concepts (and I just want to do a quick write-up and go). I’m sure your solution will come from what @ArjenEX shares after this post of mine :smile:


Hi @NeginZarbakhsh

You have quite a few things at hand there. I’ll try to go through them as much as possible.

Is there any way to store images as both image files (JPEG or PNG) on my laptop

The route that @badger101 already highlighted is a very nice one to take :slight_smile: The only issue I have with it is that the Renderer to Image node asks for a pixel size that I do not know in advance and that differs for every image. This can be controlled with a flow variable, but that requires metadata extraction, which takes quite a bit of additional effort. I’d be happy to learn how you normally tackle this, @badger101.

For the sake of this illustration, I’ll create two data flows: one that covers the PNGs and one for the JPEGs.

To create the PNGs, the Binary Objects to PNGs node is sufficient to convert the output of the GET request to the correct format. It is followed by the aforementioned Image Writer (Table) node, where the PNG column is the one used to write the images.

For the JPEGs, I’m including the solution that @badger101 outlined, using an equivalent node from the same KNIME package, the Image Writer, with JPEG as the file format.

I need to have separate folders for the URLs based on the unique lat and lng values.

You can definitely create these dynamically based on the lat/long values in your table. There are several ways to approach this, depending on how dynamic you want it to be. Here is one way:

Since you indicated that this should be done per lat/long, I’m using a Group Loop to handle all files with the same lat/long together, since they should go to the same folder. I start with a Constant Value Column to define the base output directory.
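For readers who think in code, the Group Loop amounts to bucketing the rows by their coordinate pair and handling one bucket per iteration. Below is a minimal Python sketch of that idea, not the KNIME node itself. It assumes each row is a dict with the hypothetical keys `lat`, `lng`, and `url`; the actual column names in the workflow may differ, and the URLs are placeholders.

```python
from collections import defaultdict


def group_by_coordinates(rows):
    """Bucket rows by their (lat, lng) pair, keeping first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["lat"], row["lng"])].append(row)
    return dict(groups)


# Hypothetical sample rows; the URLs are placeholders, not real image links.
rows = [
    {"lat": 43.65, "lng": -79.38, "url": "https://example.com/a"},
    {"lat": 43.65, "lng": -79.38, "url": "https://example.com/b"},
    {"lat": 43.70, "lng": -79.40, "url": "https://example.com/c"},
]

for (lat, lng), members in group_by_coordinates(rows).items():
    # One iteration per unique coordinate pair, like one pass of the Group Loop.
    print(lat, lng, len(members))
```

Each iteration sees only the rows for one coordinate pair, which is what makes it possible to send them all to a single folder.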

I then use a Column Expression to dynamically determine the folder location.
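As a rough illustration of what such an expression computes, here is a small Python sketch that joins a base directory with a folder name derived from the coordinates. The `lat_lng` naming scheme and the function names are assumptions made for this example; the expression in the attached workflow may build the path differently.

```python
import tempfile
from pathlib import Path


def folder_for(base_dir, lat, lng):
    """Return the output folder for one coordinate pair (assumed naming: lat_lng)."""
    return Path(base_dir) / f"{lat}_{lng}"


def ensure_folder(base_dir, lat, lng):
    """Create the coordinate folder if it does not exist yet and return it."""
    folder = folder_for(base_dir, lat, lng)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# A temporary directory stands in for the base output directory.
with tempfile.TemporaryDirectory() as base:
    created = ensure_folder(base, 43.65, -79.38)
    print(created.name)  # 43.65_-79.38
```

Calling `ensure_folder` repeatedly for the same coordinates is harmless, because `exist_ok=True` leaves an existing folder untouched.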

This location is then passed along as a variable to both Image Writer nodes that will eventually create the images. For the PNG writer, the location needs to be of type Path; for the JPEG writer, it needs to be a String. As such, the former needs a String to Path node in between.
For the PNG writer, you can set the variable by clicking the V next to the folder directory and selecting the earlier created variable. KNIME indicates at the bottom of the node whenever a variable is actively in use.

For the JPEG writer, you need to navigate to the Flow Variables tab and apply it to directory_key.

Run the entire workflow and you end up with a folder for each set of coordinates, with both the PNGs and the JPEGs in it.

This should get you at least all the images for later usage.

WF:
Mapillary Image Collection JPG and PNG Writer.knwf (46.7 KB)

Hope this helps!


Hi @badger101 , Thank you so much for the hints! I like the workflow idea you suggested!

@NeginZarbakhsh Thanks. If I were you, though, I would study the workflow by @ArjenEX. Mine wasn’t complete, and I don’t intend to pursue it further since I’m busy with other things. I should also note that your current workflow can continue from what @ArjenEX has done, since theirs doesn’t require a URL modification (for the PNG) and it keeps your GetRequest node.

Update: I also noticed that you said you want to save the images in a KNIME node to use for ML. The workflow by @ArjenEX addresses that too; they just didn’t mention it. The images are stored “as nodes” (to quote you) in the Binary Objects to PNGs node (for PNG) and in the Image Reader (Table) node (for JPEG). All you have to do is connect the output port of either of these two nodes to your ML workflow.

ML = machine learning

I believe @ArjenEX will assist you in case you have follow-up questions.


That’s very clear, and the workflow was exactly what I needed. Thank you very much!

It seems that the Binary Objects to PNGs and Image Writer (Table) nodes require a lot of memory to convert the downloaded images and store them as PNG files. I am running my workflow on a server with the memory details below, and I set -Xmx64g in knime.ini (half of the total memory).

                  total        used        free      shared  buff/cache   available
    Mem:         128549       91427        1061          79       36060       36005

For 21,599 records (21,559 images of size 2048), after adding the Partition node to repeat the workflow you suggested for 8 partitions of 2,694 rows each, KNIME crashes (the application exits) and I get the error below. Do you have any suggestions on how I could handle this for big JSON files? If I reduce -Xmx64g, KNIME does not crash, but I get a Java heap space error instead. Should I increase the number of Partition nodes further?


@badger101 I like how you keep referring to me as they/theirs, addressing my alter ego I guess :rofl:

@NeginZarbakhsh To be honest, I haven’t processed so many files at once before. If your use case allows it, I would stick to JPEGs. I believe the number of nodes that you can use with them is much higher. I’m not really sure what your intention is with the Partition. I only kept it in there to process just a few rows, which I believe is also what you used it for.

To manage the image processing a bit better, a possible solution is to add a Chunk Loop. This allows you to define the batch size, that is, the number of rows you want to process at once. If you then run a Heavy Garbage Collector afterwards, that should help you manage the memory issue better.
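The same pattern can be sketched outside KNIME: process a fixed number of rows, release them, and trigger a collection before the next batch. The Python sketch below is only an analogy. Python's `gc.collect()` is not the JVM garbage collector that KNIME uses, and the function names and the batch size of 4 are made up for the example.

```python
import gc
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_chunks(rows, size, handle_batch):
    """Run handle_batch on each chunk, then force a garbage collection."""
    processed = 0
    for batch in chunks(rows, size):
        handle_batch(batch)
        processed += len(batch)
        del batch      # drop the reference to the finished chunk
        gc.collect()   # rough stand-in for the garbage collection step
    return processed


sizes = []
total = process_in_chunks(range(10), 4, lambda b: sizes.append(len(b)))
print(total, sizes)  # 10 [4, 4, 2]
```

Only one chunk is held in memory at a time, so peak usage depends on the batch size, not on the total number of rows.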

It does require a small change in the Image Writer node to prevent files from being overwritten with each chunk. For this, I used the image-id column, which was already included in the dataset, for the filenames (after converting it from long to string with a Column Expression node).
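To see why the filename matters, consider this small Python sketch. It assumes, purely for illustration, that a default naming scheme restarts its counter in every chunk, and it compares that with names built from the image id. The ids and the `image_N.png` pattern are hypothetical.

```python
def names_by_position(batch):
    """Name files by their position within the chunk (restarts at 0 every chunk)."""
    return [f"image_{i}.png" for i, _ in enumerate(batch)]


def names_by_id(batch):
    """Name files by their image id, converted from integer to string."""
    return [f"{str(image_id)}.png" for image_id in batch]


# Two chunks of hypothetical image ids.
chunk_a = [1001, 1002]
chunk_b = [1003, 1004]

# Position-based names repeat across chunks, so the second chunk overwrites the first.
print(set(names_by_position(chunk_a)) & set(names_by_position(chunk_b)))

# Id-based names never collide as long as the ids are unique.
print(set(names_by_id(chunk_a)) & set(names_by_id(chunk_b)))  # set()
```

Any column that is unique per row would work the same way; the image id is convenient because it already exists in the data.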

I only applied it to the PNG flow for illustration purposes. I’d say play around with it a bit and see what kind of configuration and settings in general keep the flow running.

See WF v2:
Mapillary Image Collection JPG and PNG Writer V2.knwf (52.5 KB)

Hope this helps.


Great! This is really helpful, I will try it.

Thanks!
