Reaction automap bug report

I have recently noticed some strange behavior with the reaction automap node, which seems to depend on the source of the reaction, as follows:

A reaction loaded from a SMIRKS string gives one set of mappings, 'B'.

A reaction loaded from an RXN file with no explicit mappings in the file gives a different set of mappings, 'A'.

So, I investigated a little further....

Converting the Indigo reaction to SMIRKS or RXN format before mapping, and then converting it back to Indigo format, gives the same result as above: the SMIRKS gives mapping B, and the RXN gives mapping A.

Finally, taking the RXN-sourced reaction and running it through an automap node twice (the first configured to clear mappings, the second to use 'Discard' or 'Keep') gives mapping set B, previously seen only for the SMIRKS.

In case this makes no real sense, I've attached a sample workflow showing the behaviour. It looks like the mapping is picking up some sort of default mapping from the unmapped RXN file (perhaps just using the atom numbers in the RXN file?).

A second issue is that in one of these mappings the N of an NO2 group is not mapped to the product, and in the other the N of a cyano group is not mapped. Ideally both would be mapped!

The reaction SMIRKS in this example (from the literature - Bioorg Med Chem Lett 2008, 798) is:

C[C@H](CO)NC1=NC(SCC2=CC=CC=C2)=NC(SC#N)=C1N(=O)=O>>C[C@H](CO)NC1=C2N=C(N)SC2=NC(SCC2=CC=CC=C2)=N1

and the corresponding RXN / .RDF is:

$RXN

1 1

$MOL

R>Mv4.0006111012022D 1 1.00000 0.00000 0

C15 H15 N5 O3 S2

25 26 0 0 1 0 0 0 0 0999 V2000

4.3436 6.5201 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

5.2036 7.0173 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

4.3436 5.4789 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

3.4843 7.0173 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

6.1078 6.5651 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

5.1576 8.0575 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

5.2485 5.0257 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

3.4843 8.0125 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

6.1078 5.5239 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

6.9681 7.0623 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0

4.5255 8.3286 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

6.0628 8.5555 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

5.2485 4.0305 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0

2.6250 8.5555 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

4.3436 8.5555 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

6.9681 8.1025 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

6.1528 3.5325 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

2.6250 9.5496 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

6.9681 9.0987 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

6.1528 2.5363 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

5.3387 1.9941 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

7.0590 2.0391 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

5.3387 0.9980 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

7.1040 0.9980 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

6.2437 0.5000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

1 2 1 0 0 0 0

1 3 2 0 0 0 0

1 4 1 0 0 0 0

2 5 2 0 0 0 0

2 6 1 0 0 0 0

3 7 1 0 0 0 0

4 8 1 0 0 0 0

5 9 1 0 0 0 0

5 10 1 0 0 0 0

6 11 2 0 0 0 0

6 12 2 0 0 0 0

7 9 2 0 0 0 0

7 13 1 0 0 0 0

8 14 1 0 0 0 0

8 15 1 1 0 0 0

10 16 1 0 0 0 0

13 17 1 0 0 0 0

14 18 1 0 0 0 0

16 19 3 0 0 0 0

17 20 1 0 0 0 0

20 21 2 0 0 0 0

20 22 1 0 0 0 0

21 23 1 0 0 0 0

22 24 2 0 0 0 0

23 25 2 0 0 0 0

24 25 1 0 0 0 0

M END

$MOL

R>Mv4.0006111012022D 1 1.00000 0.00000 0

C15 H17 N5 O S2

23 25 0 0 1 0 0 0 0 0999 V2000

10.8521 7.0439 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

9.9551 6.5060 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

11.7043 6.5060 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

10.6282 7.9861 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

10.0011 5.5189 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

9.2381 7.1781 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0

11.7493 5.5638 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

12.6023 6.9990 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

9.6411 8.0761 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

10.8521 5.0249 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

12.6023 8.0311 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

9.1039 8.9733 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

10.8521 4.0388 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0

11.7043 8.5240 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

13.4545 8.5240 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

11.7043 3.5456 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

11.7043 9.5102 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

11.7493 2.5577 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

10.8983 2.0645 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

12.6023 2.0645 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

10.8983 1.0334 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

12.6023 1.0334 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

11.7493 0.5394 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

1 2 1 0 0 0 0

1 3 2 0 0 0 0

1 4 1 0 0 0 0

2 5 2 0 0 0 0

2 6 1 0 0 0 0

3 7 1 0 0 0 0

3 8 1 0 0 0 0

4 9 2 0 0 0 0

5 10 1 0 0 0 0

6 9 1 0 0 0 0

7 10 2 0 0 0 0

8 11 1 0 0 0 0

9 12 1 0 0 0 0

10 13 1 0 0 0 0

11 14 1 0 0 0 0

11 15 1 1 0 0 0

13 16 1 0 0 0 0

14 17 1 0 0 0 0

16 18 1 0 0 0 0

18 19 2 0 0 0 0

18 20 1 0 0 0 0

19 21 1 0 0 0 0

20 22 2 0 0 0 0

21 23 2 0 0 0 0

22 23 1 0 0 0 0

M END

Hello,

1. Yes, you are right. The current AAM algorithm may generate different results depending on the atom order in the input molecules (when there are several matching possibilities). The SMIRKS and the RXN in your example produce different atom orders, which explains all the AAM differences.

I agree that it is very inconvenient for a user to get different results for the same reaction. We have a completely different issue with AAM (a hang during a substructure search) which will be resolved by reordering atom numbers. It is quite possible that the indeterminacy will disappear by itself once we fix that issue.

2. The 'Keep' mode is the same as 'Discard' if there is no input mapping.

3. The second issue, with the NO2 group: the current algorithm ignores one-atom MCS. There was a discussion about it:

http://groups.google.com/group/indigo-bugs/browse_thread/thread/011373837ba65acd

We have added a lone-atom MCS, but for the more difficult examples there are no one-atom matchings. This behavior was implemented because it is very dangerous to map atom by atom. But if you give more examples where a single atom should be matched, I think we could consider a new heuristic.
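Point 1 above can be illustrated with a deliberately simple toy. This is not the Indigo AAM algorithm; it is a hypothetical greedy matcher (the class name, the `element:label` atom strings, and `greedyMap` are all invented for illustration) that only shows how a tie between two equally good candidates can end up being broken by input atom order.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration only: NOT how Indigo performs atom-to-atom mapping.
class OrderDependentMatchDemo {

    // Atoms are written as "element:label", e.g. "N:a".
    static String element(String atom) {
        return atom.substring(0, atom.indexOf(':'));
    }

    // Each reactant atom takes the first unused product atom of the same element.
    // When two reactant atoms compete for one product atom, the earlier one wins.
    static Map<String, String> greedyMap(List<String> reactant, List<String> product) {
        Map<String, String> mapping = new LinkedHashMap<>();
        Set<String> used = new HashSet<>();
        for (String r : reactant) {
            for (String p : product) {
                if (!used.contains(p) && element(r).equals(element(p))) {
                    mapping.put(r, p);
                    used.add(p);
                    break;
                }
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> product = List.of("C:x", "N:y");
        // Same atoms, different input order: a different N is left unmapped.
        System.out.println(greedyMap(List.of("C:1", "N:a", "N:b"), product));
        System.out.println(greedyMap(List.of("C:1", "N:b", "N:a"), product));
    }
}
```

In the first call `N:a` is mapped and `N:b` is left over; in the second call the roles are swapped, although the two inputs describe the same set of atoms.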

With best regards,

Alexander

GGA Software Services

Thanks. That makes sense. Re. 3: I assume that the algorithm looks for an MCS between reactant and product, and then maps the atoms it finds together. Could it then walk out from the MCS along bonds (ignoring bond type) and map atoms of the same type (i.e. element), stopping on a branch once a non-matching atom is found? Maybe I have completely misunderstood how it works, though!
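The idea in the paragraph above can be sketched roughly as follows. This is only an illustration of the proposal, not anything Indigo does: the graph representation (element arrays plus adjacency lists), the class name, and the `extend` method are all assumptions made for the example. Note that when several neighbours have the same element, this sketch simply takes the first one, which is exactly the kind of atom-by-atom guess that can go wrong.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Sketch of "walk out from the MCS": a hypothetical helper, not Indigo code.
class McsExtensionSketch {

    // rElem[i] / pElem[i]: element symbol of atom i in reactant / product.
    // rAdj[i] / pAdj[i]: indices of atoms bonded to atom i (bond order ignored).
    // seed[i]: product atom mapped to reactant atom i by the MCS, or -1.
    static int[] extend(String[] rElem, int[][] rAdj,
                        String[] pElem, int[][] pAdj, int[] seed) {
        int[] map = Arrays.copyOf(seed, seed.length);
        boolean[] pUsed = new boolean[pElem.length];
        Deque<Integer> queue = new ArrayDeque<>();
        for (int r = 0; r < map.length; r++) {
            if (map[r] >= 0) {
                pUsed[map[r]] = true;
                queue.add(r);
            }
        }
        // Breadth-first walk outward from already mapped atom pairs.
        while (!queue.isEmpty()) {
            int r = queue.poll();
            int p = map[r];
            for (int rn : rAdj[r]) {
                if (map[rn] >= 0) continue;          // already mapped
                for (int pn : pAdj[p]) {
                    if (!pUsed[pn] && rElem[rn].equals(pElem[pn])) {
                        map[rn] = pn;                // same element: extend the mapping
                        pUsed[pn] = true;
                        queue.add(rn);
                        break;
                    }
                }
                // No matching neighbour: this branch stops here.
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] rElem = {"C", "C", "N", "O"};       // chain C-C-N-O
        String[] pElem = {"C", "C", "N", "S"};       // chain C-C-N-S
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}};
        int[] seed = {0, 1, -1, -1};                 // MCS covers the two carbons
        System.out.println(Arrays.toString(extend(rElem, chain, pElem, chain, seed)));
    }
}
```

With the toy chains in `main`, the nitrogen is picked up from the seed, and the walk stops at the O/S mismatch, leaving the last atom unmapped.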

Steve

Yes, you are right, the algorithm uses MCS as a basic principle for a mapping.

Ignoring bond type was discussed here:

http://groups.google.com/group/indigo-bugs/browse_thread/thread/011373837ba65acd

I will copy the important messages:

...the current AAM engine supports changes in bond order.
Moreover, a lot of attention was paid to this feature. A user should
set so-called REACTING CENTERS to allow for bond changes.
possibility. There are several types of bonds in a reaction: NONE,
CENTER, UNCHANGE, CHANGE, MAKE OR BREAK, etc. (You can find a
description, for example, in the MDL formats help:
http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php).
In the reaction 'CC=O>>CCO' there should be a 'CHANGE' center between
the carbon and the oxygen (the center can be set only for reactants: the Indigo
AAM engine supports 'lazy' reacting centers, but in a strictly
correct reaction, centers should be set on both sides). If no
centers are set, then the engine treats such bonds as "UNCHANGE".
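As a rough illustration of the bond types listed in the quoted message: in the V2000 bond block, the reacting-center status is, as far as I understand the CTfile documentation, the last of the seven fields on each bond line, with the values combined as bit flags (for example 12 = 4 + 8). The sketch below decodes that field. The class, constant, and method names are my own labels, not an existing API, and splitting on whitespace is a simplification that works for bond lines as pasted in this thread but not for the strict fixed-width format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading reacting-center codes; names are illustrative.
class ReactingCenterDemo {
    static final int NOT_CENTER = -1;
    static final int UNMARKED = 0;
    static final int CENTER = 1;
    static final int UNCHANGED = 2;
    static final int MADE_OR_BROKEN = 4;
    static final int ORDER_CHANGED = 8;

    // Turns a numeric status into the list of flags it contains.
    static List<String> describe(int status) {
        List<String> flags = new ArrayList<>();
        // -1 and 0 are special values, not bit combinations.
        if (status == NOT_CENTER) { flags.add("NOT_CENTER"); return flags; }
        if (status == UNMARKED) { flags.add("UNMARKED"); return flags; }
        if ((status & CENTER) != 0) flags.add("CENTER");
        if ((status & UNCHANGED) != 0) flags.add("UNCHANGED");
        if ((status & MADE_OR_BROKEN) != 0) flags.add("MADE_OR_BROKEN");
        if ((status & ORDER_CHANGED) != 0) flags.add("ORDER_CHANGED");
        return flags;
    }

    // Reads the seventh whitespace-separated field of a bond line (assumption:
    // the fields are separated by spaces, as in the file pasted in this thread).
    static int statusOfBondLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return fields.length >= 7 ? Integer.parseInt(fields[6]) : UNMARKED;
    }

    public static void main(String[] args) {
        // A bond line from the RXN above: no reacting center has been set.
        System.out.println(describe(statusOfBondLine("16 19 3 0 0 0 0")));
        // A hand-edited line marking the bond as "order changes".
        System.out.println(describe(statusOfBondLine("1 2 2 0 0 0 8")));
    }
}
```

All bond lines in the RXN file posted above carry 0 in that field, which is consistent with the reaction having no reacting centers set.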

Reacting centers are additional flags on bonds. They can be saved in the RXN file format (unfortunately, SMILES does not support reacting centers). The Indigo API supports them. For example, if you want to ignore bond types, the following script can be applied:

// Allow every bond to be either unchanged or order-changed, so bond order is ignored.
for (IndigoObject m : rxn.iterateMolecules())
    for (IndigoObject b : m.iterateBonds())
        rxn.setReactingCenter(b, Indigo.RC_UNCHANGED | Indigo.RC_ORDER_CHANGED);

However, the Indigo KNIME nodes do not currently offer a way to set reacting centers. Therefore, the only thing that can be done at the moment is to load your reactions with predefined reacting centers from RXN files (the Marvin sketcher also supports them). I think a node for changing reacting centers would be useful for reactions. The only obstacle to implementing it is the interface: the user should be able to set reacting centers for different bond types (not only change all bonds at once), but maybe I am wrong and that is unnecessary. We will be happy to receive any suggestions.

With best regards,

Alexander