document vector > bit vector distance

Hi All,

Apologies for what may be a newbie question, but I’m struggling to get a Document Vector bit vector collection into a bit vector distance measure; the two don’t seem to be very compatible…

I would like to get “similar documents” via the Tanimoto distance.

I notice that the output column is a “List (Collection of: Number (double))” containing 0.0’s and 1.0’s; I suspect those are not the correct input values for a bit vector distance node?

Kind regards,

Herman

Hi Herman,

A bit vector is a special data type in KNIME Analytics Platform, and you need it in order to use the Tanimoto distance in the Similarity Search node.

To create a bit vector you can use the Create Bit Vector node. For this to work, it is important to uncheck the checkbox “As collection cell” in the configuration dialog of the Document Vector node. Once you have created the bit vectors, you can select the Tanimoto distance in the configuration dialog of the Similarity Search node.
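
For anyone wondering what the distance itself computes, here is a minimal Python sketch of the Tanimoto distance on bit vectors, using the standard definition of 1 minus (bits set in both) / (bits set in either). This is an illustration only, not KNIME's implementation; the helper names `to_bits` and `tanimoto_distance`, the sample vectors, and the handling of two empty vectors are all assumptions made for the example.

```python
def to_bits(values):
    """Turn a list of 0.0/1.0 doubles into the set of positions that are set."""
    return {i for i, v in enumerate(values) if v != 0.0}


def tanimoto_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of set-bit positions."""
    union = len(a | b)
    if union == 0:
        # Assumption: two vectors with no bits set are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


# Hypothetical document vectors, in the 0.0/1.0 form of the collection column.
doc_a = [1.0, 0.0, 1.0, 1.0]
doc_b = [1.0, 1.0, 0.0, 1.0]

distance = tanimoto_distance(to_bits(doc_a), to_bits(doc_b))
print(distance)  # 0.5: two shared terms out of four distinct terms
```

The smaller the distance, the more terms two documents share, which is why it works for finding “similar documents”.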

Cheers,
Kathrin

Hi Kathrin,

Thank you very much for the advice, that works like a charm!

I’m just confused, then: what is the purpose of the “bit vector” | “collection cell” configuration in the Document Vector node? Which nodes can use it as input?
Or wouldn’t it be more useful to have those options mimic the behaviour of the Create Bit Vector node?

Cheers!

Herman