Distance node Improvement

Alongside the Distance Matrix node, would it be possible to have an additional node that is simply a Distance Calculator?

It would have two in-ports (e.g. fingerprints from molecules, or whatever else you want to measure the distance between). It would measure the distance between each row (i.e. molecule) in Port One and all the rows (i.e. all molecules) in Port Two, and report back, for each row in Port One, either 1. the average distance, 2. the nearest distance, or 3. the furthest distance.
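To make the request concrete, here is a minimal Python sketch of the calculation such a node would perform. It is not KNIME code: the function name `summarize_distances`, the example data, and the choice of Euclidean distance (`math.dist`) are all assumptions for illustration, and any other distance function could be passed in.

```python
from math import dist          # Euclidean distance, Python 3.8+
from statistics import mean


def summarize_distances(queries, references, distance=dist):
    """For each query row, return (average, nearest, furthest) distance
    to all reference rows. `distance` is any callable taking two rows."""
    if not references:
        raise ValueError("references must not be empty")
    summary = []
    for q in queries:
        d = [distance(q, r) for r in references]
        summary.append((mean(d), min(d), max(d)))
    return summary


# Hypothetical data standing in for the two in-ports.
port_one = [(0.0, 0.0), (3.0, 4.0)]
port_two = [(0.0, 0.0), (6.0, 8.0)]

for row, (avg, nearest, furthest) in zip(port_one, summarize_distances(port_one, port_two)):
    print(row, avg, nearest, furthest)
```

The output has one row per Port One row, which is the form that is easy to correlate against other columns.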

This would be useful for looking at correlations between data. That is rather difficult with the Distance Matrix Calculator, as its output is not in a usable form for this.

Thanks, Simon.

Dear Simon,

you might be interested in the Tanimoto similarity node from the Tripos Chemistry Extensions (this node is free to use). It calculates the Tanimoto similarity of each source fingerprint (input port 1) compared to a set of reference fingerprints (input port 2). The node can return the Tanimoto similarity to the most similar reference (the maximum Tanimoto similarity) and/or an array of Tanimoto similarities to each of the reference fingerprints.
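For readers unfamiliar with the measure, the following is a rough Python sketch of what is described above, not the Tripos node's actual implementation. It assumes fingerprints are represented as sets of on-bit indices and uses the standard definition, Tanimoto = |A ∩ B| / |A ∪ B|. The function names, the example fingerprints, and the convention of returning 0.0 for two empty fingerprints are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def similarity_to_references(source, references):
    """Return (maximum similarity, list of similarities to every reference).
    Requires at least one reference fingerprint."""
    sims = [tanimoto(source, ref) for ref in references]
    return max(sims), sims


# Hypothetical fingerprints: one source, three references.
references = [{1, 2, 3, 4}, {2, 3}, {7, 8}]
best, all_sims = similarity_to_references({1, 2, 3}, references)
print(best, all_sims)
```

Calling `similarity_to_references` once per source row gives both outputs mentioned above: the similarity to the closest reference and the full array.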

Otherwise you can create such a node using the Java Snippet node. Search this forum for a thread about Tanimoto similarity and you will find some example Java code. To "mimic" the two input ports you will need to work with workflow variables that contain the reference fingerprint(s).

Fabian

+1 for Simon's request; this would be really useful functionality.

Hi all,

combining a control loop with the Tripos similarity calculation node does this quite well. It took some trial and error and a few clicks, and one thing has to be worked around: the UNITY fingerprint calculated with the Tripos node cannot be passed through the variable node. Instead, you need to pass the SMILES as a variable, convert it to a structure with a Babel node, calculate the UNITY fingerprint, and compute the Tanimoto distance with the Tripos node. Done this way, the matrix is symmetrical, i.e. the distances from every compound to every other one are computed.
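As an illustration of why the resulting matrix is symmetrical, and how the per-compound nearest distance can be read off it, here is a small Python sketch. It is not the KNIME workflow itself. It assumes fingerprints as sets of on-bit indices and takes the distance to be 1 minus the Tanimoto similarity; all names and data are hypothetical.

```python
def tanimoto_distance(a, b):
    """Assumed definition: 1 - |A & B| / |A | B| on sets of on-bit indices."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def distance_matrix(fps):
    """All-against-all matrix. Only the upper triangle is computed and then
    mirrored, since d(i, j) == d(j, i). The diagonal is left at 0.0."""
    n = len(fps)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = tanimoto_distance(fps[i], fps[j])
    return m


def nearest_other(matrix):
    """Nearest distance per compound, skipping the compound itself.
    Requires at least two compounds."""
    return [min(d for j, d in enumerate(row) if j != i)
            for i, row in enumerate(matrix)]


# Hypothetical fingerprints for three compounds.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8}]
matrix = distance_matrix(fps)
print(nearest_other(matrix))
```

Skipping the diagonal matters here: when a set is compared against itself, every compound would otherwise report a nearest distance of zero.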

Thanks to Simon for asking and thanks to Fabian for suggesting the tripos option.