I am trying to re-create multi-fusion similarity maps (see link below for details).
http://onlinelibrary.wiley.com/doi/10.1111/j.1747-0285.2007.00579.x/abstract
The method requires each column in the query compound library to be compared with every column in a reference compound library before moving on to the next query column. Each library has about 10,000 columns, though the two counts are not exactly the same. Each column holds 1,000 binary fingerprint keys, one per row. A Tanimoto score is calculated for each comparison, so a loop-within-a-loop construct is necessary.
I have tried using "Column List Loop Start" to loop through each column in the query compound library and feed its output into the "Fingerprint Similarity" node (Erl Wood Cheminformatics). In that node I set the flow variable "columnName" to "currentColumnName" and ticked the "Multi-query fusion" option. The reference compound library was connected to the "Fingerprint Similarity" node without any loop node attached to it. As a result, the workflow loops through every column in the query compound library but only compares each one with the first column of the reference compound library. Even then, the loop ended with an error message: Loop End (Column Append) Execute failed: Java heap space.
I have also tried attaching a second "Column List Loop Start" node to the reference compound library, but it only looped in tandem with the first one. The "Multi-query fusion" option did not seem to have any effect either.
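To make the target computation concrete, here is a minimal sketch in plain Python (outside KNIME) of the pairwise comparison I am after. It assumes each library maps a column name to a list of 0/1 fingerprint keys. The function names and the toy data are made up for illustration, and this is not meant to describe what the "Fingerprint Similarity" node does internally.

```python
from itertools import product

def tanimoto(a, b):
    """Tanimoto score of two equal-length 0/1 fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # keys set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # keys set in at least one
    # Assumption: two empty fingerprints score 0 rather than being undefined
    return both / either if either else 0.0

def all_pairs(query, reference):
    """Yield (query name, reference name, score) for every pair of columns.

    The query column is the outer loop, so one query column is compared
    with every reference column before moving on to the next query column.
    """
    for (q_name, q_bits), (r_name, r_bits) in product(query.items(), reference.items()):
        yield q_name, r_name, tanimoto(q_bits, r_bits)

# Toy libraries: column name -> fingerprint (4 keys here instead of 1,000)
query = {"Q1": [1, 1, 0, 0], "Q2": [0, 1, 1, 0]}
reference = {"R1": [1, 1, 0, 0], "R2": [0, 1, 0, 1], "R3": [0, 0, 0, 0]}

rows = list(all_pairs(query, reference))
for row in rows:
    print(*row, sep="\t")  # one line per (query, reference) pair
```

With the real data this would give on the order of 100 million rows, one per pair, which is the single long table of scores I would like the workflow to produce.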
Here are my specific questions:
- How do I create the double-looping structure so that every column in one compound library can be compared in a pairwise manner with every column in the reference compound library?
- How do I collect the Tanimoto scores from all iterations into one output file instead of just getting one line containing a single Tanimoto score?
- How do I use the "Multi-query fusion" option correctly?
Thanks!