I am trying to use RDKit to standardize structures and am having trouble. For example, if I have a molfile for Me2S+-O-, I want to convert it to Me2S=O. I tried this reaction SMARTS: [O-][S+](-*)-*>>*-S(=O)-*
The output molecule was then R(v1)-S(=O)-R(v1) (after converting the product to a molfile). The methyls (or ethyls, ...) are converted to generic R groups. What am I doing wrong? The RDK Product column says "Painting failed for [*]S([*]=O", but the product still converts to the molfile R(v1)-S(=O)-R(v1).
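A minimal Python sketch of what seems to be happening, assuming the KNIME node behaves like RDKit's own `AllChem.ReactionFromSmarts` / `RunReactants` (an assumption on my part). Reactant atoms that are matched by the template but carry no map number are deleted, and unmapped `*` atoms on the product side are created as brand-new dummy atoms, which is presumably why they appear as R(v1) in the molfile.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Unmapped template from the question: nothing on the reactant side is mapped,
# so every matched atom (including both methyls) is dropped from the product.
rxn = AllChem.ReactionFromSmarts('[O-][S+](-*)-*>>*-S(=O)-*')

zwitterion = Chem.MolFromSmiles('C[S+](C)[O-]')  # Me2S+-O-
product = rxn.RunReactants((zwitterion,))[0][0]
Chem.SanitizeMol(product)  # RunReactants returns unsanitized products

# The unmapped '*' atoms become dummy atoms (atomic number 0), not carbons.
print(Chem.MolToSmiles(product))
print(sorted(atom.GetAtomicNum() for atom in product.GetAtoms()))
```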
(It also looks like RDKit does some automatic conversions, e.g., PhN(=O)=O to PhN+(=O)O-, but I see that this is covered in the Molecular Sanitization section of the RDKit Book.)
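One way to confirm that the nitro rewrite comes from sanitization rather than from any reaction is to parse the same SMILES with and without the `sanitize` flag of `Chem.MolFromSmiles`. The helper `nitro_charges` is just an illustrative name for this sketch.

```python
from rdkit import Chem

smiles = 'c1ccccc1N(=O)=O'  # PhN(=O)=O, written with pentavalent nitrogen

raw = Chem.MolFromSmiles(smiles, sanitize=False)  # skips the cleanup step
clean = Chem.MolFromSmiles(smiles)                # default: sanitized

def nitro_charges(mol):
    """Return (N formal charge, sorted O formal charges) for a mono-nitro molecule."""
    n = next(a for a in mol.GetAtoms() if a.GetSymbol() == 'N')
    o = sorted(a.GetFormalCharge() for a in mol.GetAtoms() if a.GetSymbol() == 'O')
    return n.GetFormalCharge(), o

print(nitro_charges(raw))    # (0, [0, 0])
print(nitro_charges(clean))  # (1, [-1, 0])
print(Chem.MolToSmiles(clean))
```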
I got the RDKit One Component Reaction node to convert RS+(O-)R' to RS(=O)R', with a few caveats.
The reaction SMARTS is: [*:1][S+]([O-])[*:2]>>[*:1][S](=[O])[*:2]. You apparently need the atom mapping (:1, :2) so that the input groups are carried over to the output. (If you only want aliphatic substituents on S, you can use A in place of * on the reactant side, keeping [*:1] and [*:2] on the product side. Note that A matches any aliphatic atom, not only carbon; use C if you want aliphatic carbon only.) The problem is that in the output molecule both O and S have coordinates (0,0,0), so the depiction looks like a mess. The RDKit Generate Coords node can make it look better, but the coordinates are totally different from those in the zwitterion input file. Basically it appears to do a 2D clean, which can be good or a disaster depending on the structure.
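For reference, here is the working mapped reaction in RDKit's Python API. This is a sketch under two assumptions: that the One Component Reaction node is roughly `RunReactants` followed by sanitization, and that the Generate Coords node is comparable to `AllChem.Compute2DCoords`, which lays out the whole molecule from scratch instead of preserving input coordinates. In this template S and O are unmapped product atoms, so they are created new rather than copied, which fits them ending up at the origin.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

rxn = AllChem.ReactionFromSmarts('[*:1][S+]([O-])[*:2]>>[*:1][S](=[O])[*:2]')

zwitterion = Chem.MolFromSmiles('CC[S+]([O-])C')  # EtS+(O-)Me
AllChem.Compute2DCoords(zwitterion)  # stand-in for coordinates read from a molfile

# Each template match yields one product set; take the first product.
# The ethyl CH3 is unmatched but bonded to a mapped atom, so it is carried over.
product = rxn.RunReactants((zwitterion,))[0][0]
Chem.SanitizeMol(product)
print(Chem.MolToSmiles(product))

# Fresh 2D layout of the whole product: the input coordinates are not kept.
AllChem.Compute2DCoords(product)
print(Chem.MolToMolBlock(product))
```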
Oddly, if you also map S and O, using [*:1][S+:3]([O-:4])[*:2]>>[*:1][S:3](=[O:4])[*:2], it fails with a message that a product molecule could not be sanitized.
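A likely explanation, offered with some hedging: in RDKit reactions a mapped product atom inherits properties such as its formal charge from the matching reactant atom unless the product template specifies them. With S and O mapped, the product would keep S+ and O-, and an O- carrying a double bond cannot be sanitized. Writing the charges explicitly as +0 in the product template should avoid this. Because S and O are then mapped, their input coordinates should also be copied instead of being set to (0,0,0), though I have not checked that in the KNIME node itself.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit +0 charges on the mapped S and O override the charges
# that would otherwise be copied from the reactant atoms.
rxn = AllChem.ReactionFromSmarts(
    '[*:1][S+:3]([O-:4])[*:2]>>[*:1][S+0:3](=[O+0:4])[*:2]')

zwitterion = Chem.MolFromSmiles('C[S+](C)[O-]')
AllChem.Compute2DCoords(zwitterion)

product = rxn.RunReactants((zwitterion,))[0][0]
Chem.SanitizeMol(product)  # would raise here if S+ / O- were kept
print(Chem.MolToSmiles(product))
```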
If the molecule contains two zwitterionic sulfoxides, e.g., MeS+(O-)CH2CH2S+(O-)Et, you get two output products, MeS(=O)CH2CH2S+(O-)Et and MeS+(O-)CH2CH2S(=O)Et, but not the doubly converted molecule MeS(=O)CH2CH2S(=O)Et.
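`RunReactants` applies the template at one match per product set, so each product reflects a single converted site, which matches what you see. If a reaction node is not strictly required, one alternative (a different technique, not the reaction approach) is to edit the molecule in place: find every S+/O- pair with a substructure search, zero both charges, and make the bond double. This converts all sites in one pass and never touches the atom coordinates. The helper name `neutralize_sulfoxides` and the SMARTS pattern below are my own choices for this sketch.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Charge-separated sulfoxide: S+ with three neighbours, singly bonded to a terminal O-.
SULFOXIDE_ZWITTERION = Chem.MolFromSmarts('[S+;D3]-[O-;D1]')

def neutralize_sulfoxides(mol):
    """Hypothetical helper: rewrite every R-S+(O-)-R' as R-S(=O)-R'.

    Only formal charges and one bond order per site change, so the
    conformer (e.g. molfile coordinates) is carried over untouched.
    """
    rw = Chem.RWMol(mol)
    # All matches are collected before any edit is made.
    for s_idx, o_idx in rw.GetSubstructMatches(SULFOXIDE_ZWITTERION):
        rw.GetAtomWithIdx(s_idx).SetFormalCharge(0)
        rw.GetAtomWithIdx(o_idx).SetFormalCharge(0)
        rw.GetBondBetweenAtoms(s_idx, o_idx).SetBondType(Chem.BondType.DOUBLE)
    out = rw.GetMol()
    Chem.SanitizeMol(out)
    return out

# MeS+(O-)CH2CH2S+(O-)Et, with 2D coordinates standing in for a molfile.
mol = Chem.MolFromSmiles('C[S+]([O-])CC[S+]([O-])CC')
AllChem.Compute2DCoords(mol)

fixed = neutralize_sulfoxides(mol)
print(Chem.MolToSmiles(fixed))  # both sulfoxides converted
print(Chem.MolToMolBlock(fixed))
```

For a real file, `Chem.MolFromMolFile(path)` would replace the SMILES line, and `Chem.MolToMolBlock(fixed)` writes the result with the original coordinates.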