I want to calculate the Tanimoto similarity for all possible pairs of molecules in my dataset. I used a loop to feed in each reference structure and was able to get all pairs. For 5 molecules I get 25 pairs and their similarities, but this workflow produces redundant pairs. I just want the 10 unique pairs. How can I do this, or filter the results further, in KNIME?
Some time ago I uploaded a workflow called "Complex SAR" to the KNIME public server, under "Applications", which includes an example of pairwise similarity. This may help.
Thanks for the answer, but I could not locate the pairwise similarity example in the workflow. There are "matched pairs" similarity examples in it, though. Can you please tell me whether I am missing something?
I want all possible unique pairs and their similarity.
If you look at the Similarity Matched Pairs section, at the point of the first "Loop End" node, there is a table of all possible pairs of compounds with their Tanimoto similarity reported. Is this what you are after?
To get only the unique pairs, you could filter the table within the loop, using the exact match (i.e. the row where similarity equals 1) as a marker. For example, remove all rows that come after the exact match in each loop iteration.
Then you would be left with unique pairs only.
Filtering on similarity will not solve the problem, since pairs like 1_0 and 0_1 are the same pair and will not be filtered out based on similarity. In your example, 68 molecules produce 4624 pairs, which is every ordered pair. But there are only 2278 unique pairs, and those are what I want.
Can you help?
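For reference, the counts come from n(n-1)/2 unordered pairs, so 68 x 67 / 2 = 2278. Below is a minimal Python sketch (outside KNIME) of how ordered pairs such as 1_0 and 0_1 collapse to a single key. The function name unique_pairs is made up for illustration, and molecules are assumed to be identified by their row index.

```python
def unique_pairs(n):
    """Return the unordered pairs (i, j) with i < j among n molecules."""
    seen = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-pairs (a molecule compared with itself)
            # 1_0 and 0_1 map to the same canonical key (0, 1)
            seen.add((min(i, j), max(i, j)))
    return sorted(seen)


print(len(unique_pairs(5)))   # 10
print(len(unique_pairs(68)))  # 2278
```

The same canonical key (smaller ID first) could be built as a new column on the full pair table and then de-duplicated, which is one possible way to do this without leaving the workflow.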
Have you looked at the distance matrix calculator node? You can input a table of fingerprints and have the Tanimoto similarity calculated between each molecule and every previous molecule in the column you are looking at.
Attached are screenshots of the nodes/output.
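To illustrate the "each previous molecule" idea, here is a rough Python sketch of a lower-triangle Tanimoto calculation. It is not the node's implementation: fingerprints are assumed to be sets of on-bit positions, the function names are hypothetical, and the toy fingerprints are made up.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def lower_triangle(fingerprints):
    """Compare each molecule only with the ones before it: rows (i, j, sim), j < i."""
    rows = []
    for i in range(len(fingerprints)):
        for j in range(i):
            rows.append((i, j, tanimoto(fingerprints[i], fingerprints[j])))
    return rows


# Toy fingerprints (made-up bit positions, not real molecules)
fps = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}]
for i, j, sim in lower_triangle(fps):
    print(i, j, sim)
```

Because the inner loop stops before i, each pair appears exactly once and no self-pairs are produced.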
Thanks Aaron and Simon,
I found another way around: using the "Reference Row Filter" node against a list of the desired subset (unique pairs) created with Python (outside KNIME). I couldn't find a simple way, though.
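A sketch of how such a reference list might be generated in Python is below. The "i_j" ID format mirrors the 1_0 / 0_1 naming used earlier in the thread; the function name pair_ids and the use of integer compound IDs are assumptions, and the actual script used may have looked different.

```python
from itertools import combinations


def pair_ids(compound_ids):
    """Build one "a_b" ID per unordered pair, in input order (a comes before b)."""
    return [f"{a}_{b}" for a, b in combinations(compound_ids, 2)]


ids = pair_ids(range(5))
print(ids)
# ['0_1', '0_2', '0_3', '0_4', '1_2', '1_3', '1_4', '2_3', '2_4', '3_4']
```

Saved as a one-column table, a list like this could serve as the reference table for the row filter, provided the pair IDs in the KNIME table are built the same way.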
I know, that's what I am explaining. I don't mean simply removing the exact-match compounds where similarity = 1; I mean filtering out all rows that occur AFTER the similarity = 1 row in each loop iteration. This way you end up with your x by x matrix, but with only the data points in one triangular half of the table (the top left half). So you will be left with 2278 unique pairs.
Hope that makes it more clear.
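The filtering described above can be sketched in Python roughly as follows. This is only an illustration of the logic, not a KNIME node configuration. It assumes every loop iteration lists the compounds in the same order, it drops the exact-match row itself along with everything after it, and it detects the match by comparing IDs instead of testing similarity = 1, since two different molecules can share an identical fingerprint. The function name and the toy similarity values are made up.

```python
def keep_before_self(rows_by_iteration):
    """rows_by_iteration maps a reference ID to its [(cpd_id, similarity), ...] rows.

    For each iteration, keep only the rows that come before the reference
    compound's own row; that row and all later ones are discarded.
    """
    kept = []
    for ref_id, rows in rows_by_iteration.items():
        for cpd_id, sim in rows:
            if cpd_id == ref_id:
                break  # exact match reached: drop it and everything after
            kept.append((ref_id, cpd_id, sim))
    return kept


# Toy table: 4 compounds, placeholder similarity values
cpds = ["A", "B", "C", "D"]
table = {ref: [(c, 1.0 if c == ref else 0.5) for c in cpds] for ref in cpds}
kept = keep_before_self(table)
print(len(kept))  # 6
```

For 4 compounds this leaves 6 rows out of 16, which matches 4 x 3 / 2.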
Instead of using the similarity of 1 as the point to filter from, you could also use the current variable in the loop iteration, provided you make the RowID the same as the CpdID.