Molecule to CDK node bug

Hi, when I take a structure as an SDF in which the aromatic rings are represented in aromatic form rather than Kekule form, the double bonds are lost in the Molecule to CDK node. It simply gives a cyclohexyl ring.

Simon.

Yep. I have also encountered this while filtering a library by molecular properties and converting back from CDK to Mol. As a workaround, I create a new CDK column so that the original SDF/SMILES column is retained until the end.

CDK team, kindly address this bug ASAP, if it has not already been fixed in the updated CDK release.

Thanks in advance.

Hi,

that's a well-reported problem in CDK and is currently being discussed on the developers' mailing list. I would love to see a good solution for this. Technically, bond type values 4 through 8 are for substructure search (SSS) queries only and should not occur in SDFiles to indicate aromaticity.

Whenever CDK encounters a bond of type 4, it sets the aromaticity flag but treats the bond as a single bond. Hence the behaviour you have observed. Of course, the bond order should be either single, double, or unknown. In any case, to represent the molecule and work with it, it is necessary to deduce the bond orders.
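To check whether an input file is affected, the bond block of each record can be scanned for these query bond types. The following is a minimal Python sketch, not part of CDK or KNIME. It assumes a V2000 molblock with fixed-width columns (atom and bond counts in the first six characters of the fourth line, bond type in characters 7 to 9 of each bond line). The function name find_query_bonds is hypothetical.

```python
# Meaning of the query-only bond types in a V2000 bond block.
QUERY_BOND_TYPES = {
    4: "aromatic",
    5: "single or double",
    6: "single or aromatic",
    7: "double or aromatic",
    8: "any",
}


def find_query_bonds(molblock):
    """Return (atom1, atom2, bond_type) for every bond of type 4 to 8.

    Assumes a V2000 molblock: three header lines, a counts line,
    then the atom block followed by the bond block.
    """
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])
    first_bond = 4 + n_atoms  # bond block starts right after the atom block
    hits = []
    for line in lines[first_bond:first_bond + n_bonds]:
        atom1 = int(line[0:3])
        atom2 = int(line[3:6])
        bond_type = int(line[6:9])
        if bond_type in QUERY_BOND_TYPES:
            hits.append((atom1, atom2, bond_type))
    return hits
```

An empty result means the record uses only bond orders 1 to 3, so no bond order deduction is needed for it.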

I have added a method to the Molecule to CDK node, with a corresponding option in the node dialog, that deals with rings of up to seven members, i.e., it makes sure the aromaticity doesn't get lost. Since the method adds some overhead to the process, I would advise using this option only if you know that your input contains molecules with bond type 4.

Another 'hack' would be to use the OpenBabel node to convert the non-standard SDFiles into standard SDFiles. OpenBabel automatically transforms those SDFs into a standard Kekule representation.
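Outside KNIME, the same conversion can be run with the Open Babel command-line tool. The sketch below only wraps the obabel executable from Python. It assumes obabel is installed and on the PATH, and that your Open Babel version writes a Kekule representation for this kind of input, as described above. The function names and any file names passed in are hypothetical.

```python
import subprocess


def build_obabel_command(src, dst):
    """Build the command line that rewrites an SD file with Open Babel."""
    # -isdf / -osdf select the input and output formats, -O names the output file.
    return ["obabel", "-isdf", src, "-osdf", "-O", dst]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if obabel fails."""
    subprocess.run(build_obabel_command(src, dst), check=True)
```

Calling convert("aromatic.sdf", "kekule.sdf") would then produce a file that can be fed to the Molecule to CDK node without the experimental option.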

Please let me know how the extended Molecule to CDK node performs.

Best regards,

Stephan

Hi Stephan,

Many thanks for this. Six-membered rings are converted well, as are fused 6,6 systems. However, I have noticed that 5-membered rings and 6,5 fused systems are not converted and are left with single bonds. Can these be accommodated, please?

Simon.

Hi Simon,

I tried to improve the situation and have modified the library. It is far from perfect! I still recommend either avoiding "aromatized SDFs" or converting the input SDFs with OpenBabel.

The problem has to do with the CDK core library. For the "bond deduction tools" to work, the molecular structures need to be fully and correctly configured. However, configuring the molecules requires a correct structure with clearly defined bonds, which we don't have ... I really hope that at some point someone will have the time to come up with a Kekulization implementation that doesn't rely on implicit valence and hybridisation.
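To illustrate why this is circular, bond order deduction can be viewed as pairing up aromatic atoms: every atom that needs a double bond must be paired with exactly one aromatic neighbour. The Python sketch below is a simple backtracking search written for this thread. It is not the CDK implementation, and the function name kekulize and its parameters are hypothetical. Note that the caller has to say which atoms need a double bond, and that is exactly the information that depends on hydrogen counts and valence.

```python
def kekulize(bonds, needs_double):
    """Return the set of bonds to draw as double, or None if impossible.

    bonds: list of (a, b) atom-index pairs flagged as aromatic.
    needs_double: atoms that must end up with exactly one double bond.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    def solve(unmatched, doubles):
        if not unmatched:
            return doubles  # every atom that needed a double bond has one
        atom = min(unmatched)
        for other in neighbours.get(atom, []):
            if other in unmatched:
                pair = frozenset((atom, other))
                result = solve(unmatched - pair, doubles | {pair})
                if result is not None:
                    return result
        return None  # dead end: backtrack

    return solve(frozenset(needs_double), frozenset())


# Five-membered aromatic ring, atom 0 is the nitrogen (as in pyrrole).
ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(kekulize(ring5, range(5)))  # None: five atoms cannot all be paired
print(kekulize(ring5, [1, 2, 3, 4]) is not None)  # True: N-H takes no double bond
```

In the five-membered ring, a valid assignment exists only once the nitrogen is known to carry a hydrogen or substituent and is excluded from the pairing. An aromatic SDF with implicit hydrogens does not state this, which is consistent with N-unsubstituted heterocycles being the hard cases.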

Anyway, I will upload the code tonight and I would appreciate feedback. :)

Stephan

Hi Stephan,

As OpenBabel is now bundled with KNIME, converting the aromatised SDFs will be more straightforward. But many thanks for the further modifications; the node is certainly more accurate than in the last build, which is useful. In terms of what still doesn't work:

- triazoles, tetrazoles, benzimidazole, benzopyrazole, benzotriazole, pyrrolopyridine and pyrrolopyridazine that are unsubstituted on the nitrogen are still not parsed correctly, and N-methylated pyrrolopyrazine is not parsed correctly either.

Also, even with the experimental Kekule box left unticked, the node now seems to process molecules much more slowly than before.

Thanks,

Simon.

Hi Simon,

excellent, thanks for the detailed feedback. That's very useful.

Hmm, I am a bit concerned that the conversion appears to be slower than normal. I will monitor that.

Stephan

Also, a 6:7:6 compound has its double bonds removed.

I was only using CDK to calculate formal charge, but I have now had to delete this whole section due to this bug.

Thanks,

Louisa

Hi Louisa,

can you please share the molfile/SMILES in question? This speeds up the debugging process.

Many thanks,

Stephan
