Automated Matched Pairs node generates duplicate transformations


Sorry for the duplicate post, but I posted this under the Erlwood topics almost a month ago to no avail, so I am taking it up a notch:

After many attempts with the Automated Matched Pairs node I'm at a bit of a loss here. For some reason it generates duplicate, or even triplicate, entries of transformations, and these are not the inverse transformations, since I have unchecked this option in the node configuration.

Attached is a workflow using chEMBLdb hERG pIC50 data and structures, provided in the input table as SMILES. I convert the SMILES to RDKit format, generate the matched pairs, group on the transformations, filter out transformations that occur fewer than 10 times, and sort on the Count. As you can see, the At-H to Me-At transformation occurs most often, with 265 counts, but then reappears at ranks 3 and 5 with 142 and 84 counts. The At-H to F-At transformation behaves similarly, occurring as many as 4 times.
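
For anyone who wants to check the counting outside KNIME, the group, filter, and sort steps described above can be sketched in plain Python. This is only an illustration under assumptions: the function name `rank_transformations`, the `min_count` parameter, and the example transformation strings are all made up here and do not come from the node or the attached workflow.

```python
from collections import Counter


def rank_transformations(transformations, min_count=10):
    """Count each transformation string, drop those seen fewer than
    min_count times, and return (transformation, count) pairs sorted
    by descending count."""
    counts = Counter(transformations)
    kept = [(t, n) for t, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical transformation strings, not real node output.
pairs = (
    ["[*:1][H]>>[*:1]C"] * 12
    + ["[*:1][H]>>[*:1]F"] * 10
    + ["[*:1][H]>>[*:1]Cl"] * 3
)

ranked = rank_transformations(pairs)
for transformation, count in ranked:
    # repr() makes hidden differences (spaces, atom order) visible.
    print(repr(transformation), count)
```

Grouping of this kind is an exact string match, so two rows that describe the same chemical change but are written differently end up in separate groups. Whether that is what happens in this workflow is not known; printing the raw strings of the apparently duplicated rows is one way to find out.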

Why are these transformations not grouped together? Are there specific requirements for the input structure format?

Grateful for any pointers that may solve this issue.

Many thanks/Evert

Hi Evert,

I believe that someone will look into this, but it will take a while longer, I'm afraid.