Hamming distance node?

Does anyone know of a Hamming distance node that is available? I want to use it with the other string distance algorithms.

Might there be plans for one in the future?


Hi @BenJones and welcome to the forum.

As far as I’m aware we don’t have a node that computes the Hamming distance, but there is the Similarity Search node, which includes the Levenshtein distance, a closely related measure.

Does that work for your purposes? If not, you’d probably need to go the R/Python route as a workaround.
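For the Python route, a minimal sketch of a string Hamming distance could look like the one below. The function name `hamming_distance` is just an illustrative choice, and the sketch assumes both strings have the same length, since the Hamming distance is only defined in that case.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Unlike Levenshtein, Hamming does not handle insertions or deletions.
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


print(hamming_distance("karolin", "kathrin"))  # 3
```

Something like this could be applied row by row inside a Python scripting node, but how you handle strings of unequal length (padding, truncating, or skipping) is a choice you would have to make for your data.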


Hi Scott, thanks!

I’m currently using the similarity search with the Levenshtein, Tversky and Jaro-Winkler. Just wanted a comparison with the Hamming to be honest!

Thanks for the prompt reply, will definitely take a look at the Python route. Appreciate the help!

Hi @BenJones & welcome to the KNIME forum community.

As @ScottF mentioned, and as far as I’m aware too, there is no dedicated node for calculating a Hamming distance between vectors of words.

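Having said this, for bit vectors the Hamming distance is the number of positions where the two vectors differ. It follows directly from the size of their intersection, which can be recovered from the Tanimoto coefficient, as follows:

Tanimoto coefficient: T(A, B) = |A and B| / (|A| + |B| - |A and B|)

Intersection: |A and B| = T(A, B) * (|A| + |B|) / (1 + T(A, B))

Hamming distance: H(A, B) = |A| + |B| - 2 * |A and B|

If what you have is a Tanimoto distance defined as 1 - T, convert it to T first.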
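As a quick sanity check of the relations above, here is a minimal Python sketch. It assumes the bit vectors are held as plain Python integers, and all function names are made up for illustration. It recovers the intersection count from the Tanimoto coefficient and compares the resulting Hamming distance with one counted directly as the number of differing bits.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A and B| / |A or B| of two bit vectors."""
    either = popcount(a | b)
    # Two empty vectors are treated as identical (an assumption of this sketch).
    return popcount(a & b) / either if either else 1.0


def intersection_from_tanimoto(a: int, b: int) -> float:
    """Recover |A and B| from T, |A| and |B|."""
    t = tanimoto(a, b)
    return t * (popcount(a) + popcount(b)) / (1 + t)


def hamming_from_tanimoto(a: int, b: int) -> float:
    """Hamming distance as |A| + |B| - 2 * |A and B|."""
    return popcount(a) + popcount(b) - 2 * intersection_from_tanimoto(a, b)


a, b = 0b101101, 0b100111
direct = popcount(a ^ b)                       # bits that differ
derived = round(hamming_from_tanimoto(a, b))   # round away float noise
print(direct, derived)
```

Both values should agree for any pair of bit vectors; the rounding is only there because the derived value goes through floating-point division.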

The following workflow shows how to implement it:

The example uses bit vectors calculated from molecule SMILES, so it is not directly compatible with strings, but it already shows how to use this Hamming distance implementation.

If your strings are already encoded as bit vectors, then adapting this workflow to your needs should not be complicated. Otherwise, please share a minimal workflow with the data you want to compare using the Hamming distance, and we will be happy to help you from there.

Hope it helps.


