phash for image comparison

Hello everyone,

Inspired by "Find original source image for a thumbnail", I would like to try phash to compare images.

I could do the simpler ahash, which is based on the mean of an image, but when it comes to phash and the DCT I have real trouble doing this in KNIME. I am also not sure how to do a DCT in KNIME…
Here is a description of both methods:
I was wondering if anyone could help me with this.
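
For reference, the difference between the two hashes can be sketched outside KNIME in plain Python. This is only a minimal sketch under assumptions: the image is already a square grayscale grid of numbers (for example 8×8 for ahash and 32×32 for phash; resizing is not shown), all function names are made up for illustration, and the phash variant used here (unnormalized DCT-II, top-left 8×8 coefficients, upper median as threshold) is one common formulation, not necessarily what any KNIME node does.

```python
import math


def dct_2d_low(pixels, keep=8):
    """Top-left keep x keep block of the (unnormalized) 2D DCT-II of a square image."""
    n = len(pixels)
    # 1D DCT-II basis for the first `keep` frequencies.
    basis = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # The 2D DCT is separable: transform along rows first, then along columns.
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(keep)]
            for row in pixels]
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(keep)]
            for v in range(keep)]


def ahash(pixels):
    """Average hash: one bit per pixel, set where the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return sum(1 << i for i, p in enumerate(flat) if p > mean)


def phash(pixels, keep=8):
    """Perceptual hash: one bit per low-frequency DCT coefficient above the median."""
    coeffs = [c for row in dct_2d_low(pixels, keep) for c in row]
    median = sorted(coeffs)[len(coeffs) // 2]  # upper median, a simplification
    return sum(1 << i for i, c in enumerate(coeffs) if c > median)


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

Two hashes are then compared with `hamming(h1, h2)`; a small distance means the images are likely similar.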

Many thanks

Hi @mervebx3,

There’s the Image Difference Checker node that supports pHash for image comparison. If you are loading images with the Image Reader from the Image Processing extension, you’ll have to convert images to PNGs with the ImgPlus to PNG Images node and subsequently use the Table To Image node to get a single image port from a column of images.

Hope that helps!



Hey @stelfrich,

thank you for the fast reply :smile:,

I tried what you mentioned but unfortunately when it comes to the “Table to Image”-Node I can’t open the Dialog.
Here is a screenshot of what I did. Is something wrong here?

I forgot to mention that the ImgPlus to PNG Images node generates a collection of PNG images: Depending on the structure of the input image and the configuration of the node, you’ll get a single PNG image for one timepoint of a video, for instance. Using an Ungroup node will convert the collection into a proper ImageValue column as expected by the Table to Image node.


@stelfrich awesome, it works! Thanks for your quick reply, I really appreciate it.

Now I have another question: as far as I can see, this node compares exactly two images with each other, and it shows the degree of similarity only if the threshold is not reached. Is there any way to extract the similarity value anyway, or to iterate this procedure over a whole dataset of images to compare them?

I ask because my overall goal is to group similar images into clusters, and as the measure for the clustering I wanted to use the similarities computed with phash (or dhash). So I would need a list with the percentages of how similar each image is to every other image, or, for example, to a template of good or bad images, depending on what is feasible.
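
To make clear what kind of list I mean, here is a minimal sketch in plain Python. It assumes every image has already been reduced to a 64-bit integer hash; the file names and hash values below are invented for illustration, and the function names are hypothetical.

```python
from itertools import combinations

HASH_BITS = 64  # assumed hash length


def similarity(a, b, bits=HASH_BITS):
    """Percentage of matching bits between two integer hashes."""
    return 100.0 * (bits - bin(a ^ b).count("1")) / bits


def pairwise_similarities(hashes):
    """Return (name_a, name_b, similarity in %) for every unordered pair of images."""
    return [(na, nb, similarity(ha, hb))
            for (na, ha), (nb, hb) in combinations(sorted(hashes.items()), 2)]


# Invented example hashes, not taken from real images.
example = {
    "a.png": 0xFFFF0000FFFF0000,
    "b.png": 0xFFFF0000FFFF0001,
    "c.png": 0x0000FFFF0000FFFF,
}

for name_a, name_b, sim in pairwise_similarities(example):
    print(f"{name_a} vs {name_b}: {sim:.2f} %")
```

Subtracting each percentage from 100 would give a distance that a clustering step could work with.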

Thanks in advance


I really wanted to tell you that you can just use the Hierarchical Clustering (DistMatrix) node and feed an appropriate Distance Measure for images to it. Unfortunately, that’s not the case, as far as I can tell. In addition, the pHash implementation from the Image Difference Checker node isn’t readily available for reuse in a Java Distance Measure.

One workaround could be to use a Cross Joiner to get all pairs of images and apply the Image Difference Checker node inside of a Chunk Loop that is configured to process one row at a time. Subsequently, you should have all distance pairs for the images.

Best regards,

Hello @stelfrich,

too bad haha
About the workaround: could you elaborate on this a bit more? I didn’t quite understand in what context I could use the Chunk Loop, since no similarity values are logged anyway. Here is a picture that shows what I tried to implement, but the loop breaks when the similarity does not meet the threshold. So I have to run the node manually each time.

Also, I read in the node description that the Cross Joiner is an expensive operation. However, I have to process more than 200,000 images, so I’m looking to keep the computing time as low as possible. (For the picture, I used only 30 images as a test.)
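
To put a number on that concern, here is a rough sketch in plain Python. It assumes each image is hashed once to an integer, so the expensive part is only the number of comparisons; the function names and the idea of labelled template hashes are hypothetical, just to illustrate two cheaper alternatives to comparing every pair.

```python
from collections import defaultdict


def pair_count(n):
    """Number of unordered image pairs a full all-vs-all comparison needs."""
    return n * (n - 1) // 2


def group_by_hash(hashes):
    """Images with identical hashes, found in one pass without comparing pairs."""
    groups = defaultdict(list)
    for name, h in hashes.items():
        groups[h].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def nearest_template(h, templates):
    """Label of the template hash with the smallest Hamming distance to h."""
    return min(templates, key=lambda label: bin(h ^ templates[label]).count("1"))


print(pair_count(30))       # the small test set
print(pair_count(200_000))  # the full data set
```

Comparing each image only against a handful of templates needs one comparison per image and template, instead of one per pair of images.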

Best regards,

Turns out that I hadn’t thought this through entirely: Sorry! Since the Image Difference Checker doesn’t even return the pHash value as a flow variable, there’s no way to collect this in a loop.

I briefly explored whether there’s a DCT implementation in ImgLib2 or ImageJ Ops that we could leverage, to no avail. Maybe @imagejan has another idea?