Loops over a Tanimoto matrix

Hi all,

 

I tried hard to do this with nodes, but I just couldn't connect everything together.

I have a file from "Fingerprint Similarity" that contains the compound (cmp) name and a matrix of its Tanimoto similarities.

I want to go over each row (= cmp), sum all the Tanimoto values (which equals the column sum in this case) and append a new column named sum. I tried with:

Chunk Loop Start >> Math Formula

but it gives the same value for all rows!

 

So first, how do I go over each row and sum all the columns?

Then I want to filter according to a condition, for each row:

if column[i] (Split Value[i]) < 0.7, continue to the next column

else if sum[cmp1] > sum[cmp2] (comparing the sums of the two compounds)

delete the row of cmp1

else delete the row of cmp2

Should I write a script, or is there a way to do this with nodes?

 

Problem 1:

If you want to sum the values of all (or selected) columns within a row, you can use the Math Formula node, which automatically appends a new column. There's no need to enclose this node in loop nodes. However, it gets a little more complicated if the number of columns varies from case to case.

Problem 2:

I don't really understand what you mean by your filtering. You might take a look at the Rule-based Row Filter node, which is very useful for filtering on multiple conditions. It might also be useful to combine this node with a preceding Math Formula node.

I think for Problem 1 (which might be the transpose of the original) the Column Aggregator node might be a better option, at least for simple cases (though for more complex cases the Math Formula node you mentioned is the one to use). If you want the sum over all rows for each column instead, the Statistics node or the GroupBy node might be a good choice.

Cheers, gabor

Thank you both,

 

But if I don't use a loop, how will it sum for each row? The Math Formula node seems like an easy way, but I have to write out the column names, so if I have 1000 columns, what would the expression look like?

 

My second problem was to compare the compounds and delete the more similar one: if a Tanimoto similarity is > 0.7, delete the compound that has the greater sum (of Tanimoto similarities). That's all!

OK, I used the "Column Aggregator" and managed to sum each column (row). After that I tried to use the "Java Snippet Row Filter", but I don't know how to access the values and compare them.
My code:

for ( row = 0;; row ++ )
{
    for (column = 1;; column++)
    {
        if (($${Split Value}$$[row][column] < 0.7) && ($${Split Value}$$[row][column] == 1)) {
            return true;
        }
        elseif (($${Split Value}$$[row][column] >= 0.7) && ($${Split Value}$$[row][column] < 1) && (sum[row] < sum[column])) {
            return true;
        }
    }
}

* Do I have to define the row and the column here and use the loop, or how do I check every value in each row against my condition? How do I write the syntax here?

 

HELP please :) 

 

You do not need to use the row reference. According to the Java Snippet node description something like this should work (Extract Table Dimension can help make it general):

// c_sum comes from a binding below the editor area.
// Start col above 0 if you want to skip some columns from the beginning.
// colCount: you should know this, probably from a flow variable; it could be
// less than the full column count, as you do not want the sum column.
for (int col = 0; col < colCount; ++col) {
  Double v = getCell(col, DoubleCell.TYPE);
  if (/*...your condition with v and c_sum*/) {
    return true;
  }
}
return false;
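
For reference, here is a rough standalone sketch of the whole pruning rule from the question, written as plain Java rather than as KNIME snippet code. It is only an illustration under several assumptions: the similarity matrix is square and symmetric and held in memory as a double[][], the row sums are computed once on the full matrix, pairs are visited in row order, and a compound that was already removed is not compared again. The class and method names (SimilarityPruner, keptIndices) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of KNIME.
class SimilarityPruner {

    /**
     * Returns the indices of the compounds that survive the rule:
     * for every pair with similarity >= threshold, drop the compound
     * with the greater row sum (on a tie, drop the second one).
     */
    static List<Integer> keptIndices(double[][] sim, double threshold) {
        int n = sim.length;

        // Row sums of the full matrix, computed once up front (assumption).
        double[] sums = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sums[i] += sim[i][j];
            }
        }

        boolean[] removed = new boolean[n];
        for (int i = 0; i < n; i++) {
            // Only look at j > i: the diagonal is always 1 and the matrix is symmetric.
            for (int j = i + 1; j < n; j++) {
                if (removed[i]) {
                    break; // cmp i is already gone, nothing left to compare
                }
                if (removed[j] || sim[i][j] < threshold) {
                    continue; // below the threshold: go on to the next column
                }
                if (sums[i] > sums[j]) {
                    removed[i] = true; // delete the row of cmp1
                } else {
                    removed[j] = true; // else delete the row of cmp2
                }
            }
        }

        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!removed[i]) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] sim = {
            {1.0, 0.9, 0.1},
            {0.9, 1.0, 0.6},
            {0.1, 0.6, 1.0}
        };
        System.out.println(keptIndices(sim, 0.7));
    }
}
```

The result depends on the order in which pairs are visited, so a different traversal (or recomputing the sums after each deletion) can keep a different set of compounds.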

But if I don't use a loop, how will it sum for each row?

That's the way the node works! Here is what you can find in the Math Formula node description:

This node evaluates a (free-form) mathematical expression based on the values in a row.

The Math Formula node seems like an easy way, but I have to write out the column names, so if I have 1000 columns, what would the expression look like?

In this case the Column Aggregator node will be your friend. Make use of the Enforce exclusion/inclusion options to be prepared when it comes to different numbers of columns in different applications.
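
To make the "no loop needed" point concrete outside KNIME, here is a small plain-Java sketch of what a per-row sum amounts to: one sum per row, over however many columns there are, without naming any column. This is not KNIME code and not how the node is implemented; the class and method names (TanimotoRowSums, rowSums) are invented for the illustration, and the matrix is assumed to be available as a double[][].

```java
import java.util.Arrays;

// Hypothetical helper, not part of KNIME.
class TanimotoRowSums {

    /** Returns one sum per row, adding up every column value in that row. */
    static double[] rowSums(double[][] matrix) {
        double[] sums = new double[matrix.length];
        for (int row = 0; row < matrix.length; row++) {
            double total = 0.0;
            // Works for any number of columns; no column names are needed.
            for (double value : matrix[row]) {
                total += value;
            }
            sums[row] = total;
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] tanimoto = {
            {1.0, 0.8, 0.2},
            {0.8, 1.0, 0.5},
            {0.2, 0.5, 1.0}
        };
        // One value per compound; this is the new "sum" column.
        System.out.println(Arrays.toString(rowSums(tanimoto)));
    }
}
```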