String Manipulation md5Checksum incorrect

mwiegand · July 26, 2023, 7:14pm

Hi,

while working on some image deduplication tasks I happened to notice that the md5Checksum function of the String Manipulation node seems NOT to factor in the entire Binary Object but instead caps the input.

This is also noticeable in the processing time. In contrast, the Hash Calculator from Palladian takes considerably more time and correctly calculates the MD5 hashes.
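
As a rough illustration of why a capped input matters for deduplication, here is a minimal Python sketch. It is not KNIME's implementation: the helper `md5_of_stream`, the `limit` parameter, and the 1024-byte cap are all assumptions made up for the example, since the actual cap (if any) is not known from this thread. It hashes two byte sequences that share the same first 1024 bytes, once in full and once truncated.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536, limit=None):
    """Return the hex MD5 of a binary stream.

    `limit` is a hypothetical cap: if set, hashing stops after that many bytes.
    """
    digest = hashlib.md5()
    remaining = limit
    while True:
        to_read = chunk_size if remaining is None else min(chunk_size, remaining)
        if to_read == 0:
            break  # cap reached
        chunk = stream.read(to_read)
        if not chunk:
            break  # end of stream
        digest.update(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return digest.hexdigest()


# Two made-up "images": identical first 1024 bytes, different payloads afterwards.
header = b"\x89PNG" + bytes(1020)
image_a = header + b"A" * 4096
image_b = header + b"B" * 4096

# Hashing the full content tells the two apart.
full_a = md5_of_stream(io.BytesIO(image_a))
full_b = md5_of_stream(io.BytesIO(image_b))

# Hashing only an assumed 1024-byte prefix makes them look like duplicates.
capped_a = md5_of_stream(io.BytesIO(image_a), limit=1024)
capped_b = md5_of_stream(io.BytesIO(image_b), limit=1024)

print("full hashes equal:  ", full_a == full_b)
print("capped hashes equal:", capped_a == capped_b)
```

A capped hash also reads far fewer bytes, which would be consistent with the shorter processing time, but that link is only a guess here.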

Edit: Here is the test workflow

Best
Mike

ScottF · August 2, 2023, 3:32pm

Hi @mwiegand!

Thanks for this - definitely looks like a bug. Thanks especially for the test workflow. I will create a ticket in our system for this.

(EDIT: AP-20797)

mwiegand · August 2, 2023, 3:53pm

You are most welcome @ScottF. I’ve created a few workflows in the past to demonstrate issues and bugs or to suggest improvements.

Since then, I have had an idea in mind, which I am more or less following by creating these example workflows in the most generic way possible: to build a kind of litmus / unit test suite that leverages KNIME’s best feature to reliably test its functionality.

The workflows sometimes also serve as examples that show various approaches and benchmark them against each other under different conditions, such as small or large tables.

Is there something similar that you folks at headquarters already use, or would what I described above be a new approach? I also recall mentioning something similar before, and you replied that you got some ideas from it. Do you recall?

Cheers
Mike

ScottF · August 2, 2023, 3:58pm

We do have an extensive testflow suite that we run for every build we produce, so it sounds like we’re thinking along the same lines.

As a community member, one of the most helpful things you can do is what you’ve just done above, which is to 1) find a bug and 2) create a workflow that reliably reproduces the problem. This helps expedite the bugfix process tremendously.