# Non-binary Tanimoto Coefficient

Hi All,

I've been working on similarity searching with non-binary data (double and integer values). I have been using the Similarity Search node, and it works well for Euclidean, Manhattan and Cosine, as it is able to process numeric values. I have also tried to find a node that computes the Tanimoto coefficient on non-binary data, but failed.

Is there a similarity search node for the Tanimoto coefficient that can process non-binary data? Any help is appreciated. Thank you.

You should be able to do this using embedded R. Don't forget to download and install the required package in your R-Knime installation.

http://stackoverflow.com/questions/5597305/how-to-use-r-to-compute-tanimoto-jacquard-score-as-distance-matrix

Or use the Byte Vector Distance node.

Steve

fabienc,

I've never used R before, and even Knime is new to me. I'm currently using the Similarity Search node, which accepts 2 inputs (one for my query table and one for the reference table). Can you advise me on how to get the same kind of input in R, with Tanimoto as the coefficient?

LM

Steve,

From my observation, the Byte Vector Distance node does not have a function for the Tanimoto coefficient, right?

Well, if I understood correctly, you want to apply a Tanimoto distance to vectors with non-binary values (long, double, integer). Usually it is used for string values (http://www.planetcalc.com/1664/) or binary values, as in R with the vegan, ade4 or fingerprint packages. Within Knime, the Tanimoto distance is used with bit vectors. I'm not a specialist in bit vectors, so I will answer with a question: would it be nonsense to use the Create Collection Column node to build a collection with all the data required from the two tables, convert this column to a bit vector, and run the similarity search on the bit vector column?

fabienc,

Thank you for your reply. I'm actually looking for the Tanimoto Coefficient (TC) for calculating similarities. Calculating similarity with the non-binary TC involves a different formula from the binary TC, as in:

http://showme.physics.drexel.edu/usefulchem/Software/Drexel/Cheminformatics/Java/cdk/src/org/openscience/cdk/similarity/Tanimoto.java
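For reference, the continuous (non-binary) Tanimoto coefficient is commonly written as below. The symbols $A$, $B$, $a_i$ and $b_i$ are my own notation for the two real-valued vectors and their components, not names taken from the linked file:

$$
T(A, B) = \frac{\sum_i a_i b_i}{\sum_i a_i^2 + \sum_i b_i^2 - \sum_i a_i b_i}
$$

When every $a_i$ and $b_i$ is 0 or 1, this reduces to the familiar binary form $c / (a + b - c)$, where $a$ and $b$ count the bits set in each vector and $c$ counts the bits set in both.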

I appreciate your suggestion, but converting the real-valued data into a bit vector and using the converted data for the similarity search does not serve the purpose, right?

As another option, I find that the Java Snippet node would be useful, as I can code the formula in it. However, unlike the Similarity Search node, which has 2 input ports, the Java Snippet has only a single input port. I need 2 input ports in my implementation (1 port for the target data, 1 port for the reference data). Do you have any idea how I could go about it? Any suggestion would be appreciated.

Thanks.

LM

Hi,

If you want to do that, that is, program your own function, why do you want to use the Java Snippet? You've got the Math Formula node, for example. I think it can be done with the usual Knime nodes, encapsulated in a metanode with two inputs.

LM,

if you don't have a problem with Java, there's always the option to write your own "proper" extension. There's an extension point just for distance functions, and if you already have a class with the algorithm, gluing it all together should be a snap. And afterwards you have something to share or sell, too.

Not that I haven't missed snippets with more ports as well... or options to implement extension points on the fly and similar things. But you can't have everything...

Hi fabienc & Marlin,

Thank you both for your replies. I really appreciate that.

I've managed to code the function myself using a combination of the 'Table to R', 'Add Table to R' and 'R Table' nodes.

Thanks guys! Should have replied to this earlier!
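The R code itself is not included in this thread. As a rough illustration of the same idea (a query table scored against a reference table with the non-binary Tanimoto coefficient), here is a minimal sketch in Python rather than R. The function names `tanimoto` and `similarity_search`, the sample data, and the choice to return 0.0 for two all-zero vectors are all assumptions made for this example, not anything taken from Knime or from the actual solution.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    # The denominator is zero only when both vectors are all zeros.
    # Returning 0.0 in that case is a convention assumed for this sketch.
    return ab / denom if denom != 0 else 0.0


def similarity_search(queries, references):
    """For each query row, return (index of most similar reference row, score)."""
    results = []
    for q in queries:
        scores = [tanimoto(q, r) for r in references]
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((best, scores[best]))
    return results


# Hypothetical data: one query row, three reference rows.
queries = [[1.0, 2.0, 3.0]]
references = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.5, 0.0, 4.0]]
print(similarity_search(queries, references))
```

The two lists play the role of the two input ports of the Similarity Search node: the first is the query table and the second is the reference table, with one numeric vector per row.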

LM