How can one calculate the Jaccard similarity coefficient between rows that contain “strings”?
Hi @malik -
Here’s an example workflow that uses a toy dataset. It compares a series of strings in three steps: it converts the strings to a collection, converts that collection to a bitvector, and uses the Similarity Search node to calculate (1 - Tanimoto distance). For bitvectors, that value is the Jaccard similarity coefficient.
In this case you can verify by hand that there is 1 string in the intersection and 9 distinct strings in the union, so the coefficient is 1/9 ≈ 0.111.
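If it helps to check the arithmetic outside KNIME, here is a minimal Python sketch of the same idea. The two rows are made-up values (not the ones in the attached workflow), chosen so that they share 1 string out of 9 distinct strings, and the helper names `jaccard`, `to_bitvector`, and `tanimoto` are only illustrative. It computes the coefficient once on plain sets and once on bitvectors to show that the two routes agree.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of strings: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty rows
    return len(a & b) / len(union)


def to_bitvector(items, vocabulary):
    """One bit per vocabulary term: 1 if the row contains it, else 0."""
    present = set(items)
    return [1 if term in present else 0 for term in vocabulary]


def tanimoto(u, v):
    """Tanimoto coefficient of two equal-length bitvectors."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return both / either if either else 0.0


# Hypothetical rows: 5 strings each, sharing only "elderberry".
row_a = ["apple", "banana", "cherry", "date", "elderberry"]
row_b = ["elderberry", "fig", "grape", "honeydew", "kiwi"]

# The vocabulary is every distinct string seen in either row (9 here).
vocabulary = sorted(set(row_a) | set(row_b))

set_score = jaccard(row_a, row_b)
bit_score = tanimoto(to_bitvector(row_a, vocabulary),
                     to_bitvector(row_b, vocabulary))

print(round(set_score, 3), round(bit_score, 3))  # 0.111 0.111
```

Both numbers come out as 1/9, which matches the hand calculation above.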
JaccardTanimotoExample.knwf (14.1 KB)