Compound sanitization differs between KNIME node and RDKit in Python

Hi everyone,

I am trying to standardize my compounds with KNIME and RDKit. I am using a Python node and the MolVS library.
First I generate RDKit molecules from SDF. In the node (RDKit from Molecule) I ticked everything in the Advanced tab apart from keep hydrogens (so partial sanitization, reperceive aromaticity, and correct stereochemistry). This runs without error for all but one molecule. The molecules are then sent to the Python node, where the MolVS library is called. The first call the library makes when standardising is Chem.SanitizeMol(mol), and here I get a failure for some molecules.
I would expect that sanitization in the node and the base library should be the same, hence if I already sanitized the molecules before with RDKit they should not fail later on.
Most errors relate to kekulization; some relate to valences (but to be honest these are quite weird molecules with Ru, Os or similar, so I would expect that they probably do not work).

Or is there something I am missing?
Also it would be good to know which sanitization I should trust more. Can I just ignore the errors related to aromaticity for standardisation of the library?

It would be great if someone could help me out here. I can also provide the failed molecules if that helps.

Best,
Jennifer

I am using Knime 3.6.2,
And Python 3 with conda:
conda version : 4.5.11
conda-build version : 3.16.2
python version : 3.7.0.final.0
RDKit 2018.09.1.0
molVS 0.1.1

on Ubuntu 16.04 LTS

Hi,
It’s probably best to not do the “partial sanitization” here and just take the default behavior from the node.

I am still a bit surprised by this behavior, is it possible to share a molecule that is causing the problem?

-greg

Hi Greg,

sorry for taking so long, but I was quite busy and of course I did not find the examples quickly. I had a look and again found the issue.
I have attached a workflow (I need the SDF conversion since I still have the bug where KNIME crashes when I open a table with an RDKit Mol column). One is a compound from the COSMOS DB and the other example occurs when reading in DrugBank.
I am aware that many of the molecules where this error occurs are quite uncommon complexes and metals, but a few seem to be quite normal.

It would be great to know if I maybe misinterpreted something?

The specs from above apply, the RDKit integration is: 3.4.0.v201807311105

Thanks!

Jennifer

PS: the workflow and data can be found here:
https://drive.google.com/drive/folders/1Iuco8b4Idk5klVJSazMPCYcDXY7o7x9v?usp=sharing

Hi Jennifer,

Thanks for the additional information. I will try and take a look at this over the next day or so and let you know what I find.

-greg

Hi Jennifer,

I took a look and here’s what’s going on:
The general problem is that the molecule has a Ca2+ ion with 6 bonds to it. Here’s the sketch of that from Marvin:

[Image: Marvin sketch of the Ca2+ complex]
The RDKit doesn’t like this non-chemical valence for Ca and will, by default, reject the molecule:

In [17]: m = Chem.MolFromMolFile('C-827.sdf',sanitize=False,removeHs=False)

In [18]: Chem.SanitizeMol(m)
--------------------------------------------------------------------------- 
ValueError                                Traceback (most recent call last)
<ipython-input-18-8aabfab76642> in <module>
----> 1 Chem.SanitizeMol(m)

ValueError: Sanitization error: Explicit valence for atom # 0 Ca, 8, is greater than permitted

The failure happens during the sanitization step that calculates atomic valences. You can see this here:

In [19]: m = Chem.MolFromMolFile('C-827.sdf',sanitize=False,removeHs=False)

In [20]: m.UpdatePropertyCache()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-52730fcc012b> in <module>
----> 1 m.UpdatePropertyCache()

ValueError: Sanitization error: Explicit valence for atom # 0 Ca, 8, is greater than permitted

You get the same problem when standardizing the molecule with MolVS because the first step that it does by default when validating the molecule is to call Chem.SanitizeMol().
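
To find out which sanitization step a molecule fails at before it reaches MolVS, one option is to call Chem.SanitizeMol() with catchErrors=True, which returns a flag instead of raising. The following is only a minimal sketch: the helper name first_failing_step and the SMILES strings are made up for illustration and are not taken from the shared files.

```python
from rdkit import Chem


def first_failing_step(smiles):
    """Return the SanitizeFlags value of the first sanitization step that fails.

    Returns SANITIZE_NONE when every step succeeds.
    (Hypothetical helper, written for this illustration only.)
    """
    # Parse without sanitizing so that problem molecules are not rejected here.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # catchErrors=True makes SanitizeMol report the failing operation
    # instead of raising an exception.
    return Chem.SanitizeMol(mol, catchErrors=True)


# Illustrative inputs: a normal molecule, a ring that cannot be kekulized,
# and a carbon atom with five bonds.
for smi in ("CCO", "c1cccc1", "C(C)(C)(C)(C)C"):
    print(smi, first_failing_step(smi))
```

A result of SANITIZE_KEKULIZE points to an aromaticity problem, while SANITIZE_PROPERTIES points to a valence problem like the Ca case above.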

If you process the SDF normally in KNIME (with the default options for the RDKit from Molecule node), you see the same error. If, however, you enable partial sanitization in the RDKit from Molecule node, the UpdatePropertyCache() method is called with an argument telling it to ignore valence errors; that allows the rest of the sanitization steps to continue and the molecule is successfully processed. Calling MolVS with the molecule, however, will still lead to an error since it still calls the normal version of SanitizeMol().
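
The difference between the two modes can be approximated in plain Python. This is a sketch based only on the description above, not the node's actual implementation: the helper names strict_sanitize_ok and partial_sanitize are hypothetical, and the Ca SMILES is a stand-in for the real structure in C-827.sdf.

```python
from rdkit import Chem


def strict_sanitize_ok(mol):
    """Return True if default (strict) sanitization succeeds on a copy of mol."""
    try:
        Chem.SanitizeMol(Chem.Mol(mol))  # work on a copy, leave mol untouched
        return True
    except ValueError:
        # RDKit sanitization errors are raised as ValueError (or a subclass).
        return False


def partial_sanitize(mol):
    """Sanitize mol in place while tolerating valence errors.

    Returns the flag of the first remaining step that fails,
    or SANITIZE_NONE if all remaining steps succeed.
    """
    # Compute valences without raising on chemically unreasonable values.
    mol.UpdatePropertyCache(strict=False)
    # Run every sanitization step except the strict valence check.
    ops = Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES
    return Chem.SanitizeMol(mol, sanitizeOps=ops, catchErrors=True)


# Assumed stand-in for the problem compound: a Ca atom with six bonds.
mol = Chem.MolFromSmiles("[Ca](O)(O)(O)(O)(O)O", sanitize=False)
print("strict sanitization ok:", strict_sanitize_ok(mol))
print("partial sanitization result:", partial_sanitize(mol))
```

A molecule that only passes partial_sanitize will still fail in MolVS with its default settings, because MolVS runs the strict SanitizeMol() call again.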

In essence, if you’re planning on using MolVS in its default mode, you might as well go ahead and use the default options for the RDKit from Molecule node.

You mention in the earlier message that you also encountered some problems with aromaticity. If you can share one or two of those I may be able to diagnose those as well.

-greg

Thank you for taking time to look into this.
I guess I might have misinterpreted the partial sanitization option. From your explanation it sounds like ticking this option allows the sanitization process to skip failed steps? I always thought enabling it led to a stricter sanitization.

I will see if I find another Molecule.

Best,
Jennifer

Yeah, it allows you to skip steps in the sanitization. The standard (default) behavior is to be maximally strict.