RDKit Aromatiser not working

Hi,

I am struggling to understand how the Aromatiser node works, as it's not performing the operation I expect.

First, I draw a structure (anything really: a benzene ring, a benzofuran, etc.) in Kekule form, using the Marvin sketcher for instance, and output it to SDF.

Then I take this into RDKit and use the Aromatiser node, followed by conversion back to SDF. However, the structure is still in the unaromatised Kekule form.

 

Help?

Simon.

Hi Simon (and Manuel, and Greg),

I have just reproduced the same issue. I also checked the multi-line SDF rendering, and this clearly shows the bonds as single/double (i.e. 1s and 2s) rather than aromatic (4s)...
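For anyone who wants to check this without reading each record by eye, the bond-type codes in the bond block can simply be counted. This is a minimal sketch using only the Python standard library, assuming a V2000 molblock. The `bond_type_counts` helper and the hand-written benzene record (coordinates zeroed out) are my own illustration, not output from any of the nodes:

```python
from collections import Counter

# Hand-written V2000 record for benzene in Kekule form (illustrative only;
# coordinates are zeroed because only the bond block matters here).
KEKULE_BENZENE = """\
benzene
  hand-written example

  6  6  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  1  1  0
M  END
"""


def bond_type_counts(molblock):
    """Count bond-type codes (1 single, 2 double, 4 aromatic) in a V2000 molblock."""
    lines = molblock.splitlines()
    counts_line = lines[3]              # fourth line holds the atom/bond counts
    n_atoms = int(counts_line[0:3])     # fixed-width field, columns 1-3
    n_bonds = int(counts_line[3:6])     # fixed-width field, columns 4-6
    first_bond = 4 + n_atoms            # bond block follows the atom block
    bond_lines = lines[first_bond:first_bond + n_bonds]
    # Columns 7-9 of each bond line hold the bond-type code.
    return Counter(int(line[6:9]) for line in bond_lines)


counts = bond_type_counts(KEKULE_BENZENE)
print("aromatic bonds:", counts[4], "single:", counts[1], "double:", counts[2])
```

A record that had really been written in aromatic form would show a non-zero count for code 4.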

While I was at it, I thought I would also check the RDKit Kekulizer node. This *does* seem to work, well... almost! I took some kekulised structures, ran them through the ChemAxon MolConverter node with the 'a' option to aromatise, then ran them through RDKit's Kekulizer. Most structures were kekulised OK, but 5-membered heteroaromatics with ambiguous positions for a hydrogen (e.g. pyrazoles) failed with a MolSanitizeException [sanifix.py, anyone!  : )  ]

The worst part was that the failure wasn't graceful: as well as not producing kekulised output for these cases, the RDKit Kekulizer node also modified the *original* incoming molecule container, so that round-tripping back using the MolConverter node (with the '-a' flag) gave empty cells! I'm pretty sure this shouldn't be happening.

The above point aside (for now), it occurred to me that it might be a good idea to support the MRV S-group data that keeps track of implicit hydrogen positions when Kekule forms are aromatised. For example:

    M STY 1 1 DAT
    M SAL 1 1 11
    M SDT 1 MRV_IMPLICIT_H
    M SDD 1 0.0000 0.0000 DR ALL 0 0
    M SED 1 IMPL_H1
    M END
    $$$$

This shows that atom 11 (in a pyrazole ring in this case) should be carrying the hydrogen.
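To make the idea concrete, here is a rough sketch of how such a block could be read back. It assumes the lines can be split on whitespace as quoted above (real molfiles use fixed-width columns, so a production parser should slice by column instead), and the `implicit_h_atoms` helper is a hypothetical name of mine, not part of RDKit or Marvin:

```python
def implicit_h_atoms(sgroup_lines):
    """Map atom index -> data string for data S-groups named MRV_IMPLICIT_H.

    Assumption: whitespace-separated tokens, as in the snippet quoted in the post.
    """
    atoms = {}        # S-group id -> list of atom indices (from "M SAL")
    field_names = {}  # S-group id -> field name (from "M SDT")
    data = {}         # S-group id -> data string (from "M SED")
    for line in sgroup_lines:
        tokens = line.split()
        if len(tokens) < 4 or tokens[0] != "M":
            continue
        tag, sgroup = tokens[1], tokens[2]
        if tag == "SAL":
            # tokens[3] is the number of atoms, followed by the atom indices
            n_atoms = int(tokens[3])
            atoms.setdefault(sgroup, []).extend(int(t) for t in tokens[4:4 + n_atoms])
        elif tag == "SDT":
            field_names[sgroup] = tokens[3]
        elif tag == "SED":
            data[sgroup] = tokens[3]
    return {
        atom: data.get(sgroup)
        for sgroup, name in field_names.items()
        if name == "MRV_IMPLICIT_H"
        for atom in atoms.get(sgroup, [])
    }


example = [
    "M STY 1 1 DAT",
    "M SAL 1 1 11",
    "M SDT 1 MRV_IMPLICIT_H",
    "M SDD 1 0.0000 0.0000 DR ALL 0 0",
    "M SED 1 IMPL_H1",
    "M END",
]
print(implicit_h_atoms(example))
```

The sketch deliberately returns the raw data string rather than interpreting it, since how `IMPL_H1` should be decoded is for the format's documentation to say.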

Cheers

James

There are a couple of things going on here. I'll try and get them all.

The attached workflow is what I used for testing/confirmation.

1) Pyrazoles with ambiguous H positions fail, and the failure somehow corrupts the input cell. The parse failure is expected (and correct); the corruption of the input cell is definitely a bug.

2) Molecules that have been run through the Aromatizer node are drawn in Kekule form and produce kekulized SDF. To confirm that the molecule has actually been aromatized, switch the rendering of the column to "String" or use the RDKit->Molecule node to convert to SMILES. You'll see that you get aromatic SMILES. Similarly, molecules that have been through the Kekulizer node produce Kekule SMILES. The problem comes with SDF. The RDKit, by default, generates SDF in kekulized form regardless of whether the input molecule is kekulized or aromatic (the CTAB spec says that aromatic form is for use only in queries). There is an option to generate an SDF in aromatic form, but it is not available in Knime. We can look into some way of adding this option, and see whether it's possible to set it automatically when a molecule has been explicitly aromatized.

3) The S-group thing. There is a standard CTAB mechanism for specifying the number of Hs on an atom. It looks like the RDKit is not using this properly when aromatic CTABs are generated. This is a fixable bug in the backend (https://github.com/rdkit/rdkit/issues/340)
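The behaviour described in point 2 can be reproduced outside Knime with the RDKit Python API. This is only a sketch: `Chem.MolFromSmiles` and `Chem.MolToMolBlock` with its `kekulize` argument are real RDKit calls, but the `molblock_bond_codes` helper is an illustrative name, and this is not how the Knime nodes are implemented:

```python
def molblock_bond_codes(smiles, kekulize=True):
    """Return the sorted bond-type codes RDKit writes to a V2000 molblock."""
    # Imported here so that the module still loads where RDKit is not installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # kekulize=True is the default; kekulize=False keeps aromatic bonds (code 4).
    block = Chem.MolToMolBlock(mol, kekulize=kekulize)
    lines = block.splitlines()
    n_atoms = int(lines[3][0:3])   # counts line, columns 1-3
    n_bonds = int(lines[3][3:6])   # counts line, columns 4-6
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    return sorted(int(line[6:9]) for line in bond_lines)
```

Calling `molblock_bond_codes("c1ccccc1")` should give only 1s and 2s even though the molecule is aromatic in memory, while `molblock_bond_codes("c1ccccc1", kekulize=False)` should give 4s.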

I think this covers everything...

-greg

 

 

 

I had confused myself on point 3) above: there is a standard CTAB mechanism for specifying the valence on an atom, but that doesn't help here.

Handling this is going to require an extension: either the S GRP extension James proposes or using a subset of the ZBO extensions that Alex Clark proposed. The RDKit already supports these, but they are only applied when zero-order bonds are present in molecules.

It's going to take some thought. In the meantime, you can convert these aromatic CTABs into RDKit molecules that are usable as queries by using the RDKit From Molecule node, selecting "partial sanitization" in the advanced tab, and turning off "reperceive aromaticity".