Is there any mechanism to detect and correct formal charges of an input molecule in SD format?
To clarify my question, I have prepared a toy example. Attached to this post you will find an SD file with two molecules. They are the _same_ molecule, in which an N that belongs to a ring carries a +1 charge. My problem is that in the first molecule the charge is correctly specified at the end of the ctab, with the line:
M CHG 1 2 1
but in the second molecule this information is not specified.
Summary:
Molecule1:
The charge information is correctly set with the annotation: M CHG 1 2 1
Molecule2:
The charge information is not set.
When I read these molecules, their InChI identifiers differ, so they are recognised as different structures.
Is it possible to detect such situations and fix them?
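For reference, the charge line quoted above can be read with a few lines of plain Python. This is only a sketch: the function name `parse_chg_line` is made up for illustration, and it assumes the usual V2000 layout of the line (a count followed by atom-number/charge pairs, with 1-based atom numbers) and that the fields are separated by whitespace.

```python
def parse_chg_line(line):
    """Return {atom_number: charge} for a molfile 'M  CHG' line.

    Atom numbers are 1-based, as in the molfile itself.
    Any other line yields an empty dict.
    """
    tokens = line.split()
    if tokens[:2] != ["M", "CHG"]:
        return {}
    count = int(tokens[2])  # number of (atom, charge) pairs on this line
    values = [int(t) for t in tokens[3:3 + 2 * count]]
    if len(values) != 2 * count:
        raise ValueError("truncated M CHG line: %r" % line)
    # Even positions are atom numbers, odd positions are charges.
    return dict(zip(values[0::2], values[1::2]))


print(parse_chg_line("M  CHG  1   2   1"))
```

For the line from the first molecule this gives atom 2 with charge +1. The second molecule has no such line, so the same call on each of its property lines would return an empty dict.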
There is currently no mechanism in KNIME-CDK to detect incorrectly assigned charges and correct them. In principle it should be possible to infer the correct charge by looking at the number of covalent bonds of the atom in question, but there are other factors that complicate the picture (tautomerism, pi-bonds, etc.).
If your input molecule is erroneous, your only way out is to use the sketcher to correct it manually.
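To illustrate the bond-counting idea, here is a minimal sketch in plain Python. It is not a KNIME-CDK feature. The names `NEUTRAL_VALENCE` and `suggest_charges` are hypothetical, the valence table is an assumption covering only N and O, and bond orders are assumed to be integers (no aromatic bond type). It only catches atoms with too many bonds for their stated charge, and none of the complications mentioned above are handled.

```python
# Usual valence of the neutral atom (assumption of this sketch; N and O only).
NEUTRAL_VALENCE = {"N": 3, "O": 2}


def suggest_charges(atoms, bonds):
    """Flag atoms whose explicit bonds exceed what their stated charge allows.

    atoms: list of (element, formal_charge)
    bonds: list of (i, j, order) with 0-based atom indices
    Returns {atom_index: suggested_formal_charge}.
    """
    bond_sum = [0] * len(atoms)
    for i, j, order in bonds:
        bond_sum[i] += order
        bond_sum[j] += order

    suggestions = {}
    for idx, (element, charge) in enumerate(atoms):
        neutral = NEUTRAL_VALENCE.get(element)
        if neutral is None:
            continue  # element not covered by the table
        # Each extra bond beyond the neutral valence needs one positive charge.
        if bond_sum[idx] > neutral + charge:
            suggestions[idx] = bond_sum[idx] - neutral
    return suggestions


# Hypothetical input: an N with four single bonds and no charge recorded.
atoms = [("C", 0), ("N", 0), ("C", 0), ("C", 0), ("C", 0)]
bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 1)]
print(suggest_charges(atoms, bonds))
```

Here the N (index 1, i.e. atom 2 in molfile numbering) is flagged with a suggested charge of +1. An atom with too few bonds is not flagged, because the missing bonds could just as well be implicit hydrogens.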
Thanks for your message. I'll try your idea of implementing a check that detects and fixes wrong formal charges according to the number of covalent bonds.
Just out of curiosity, do you know of any implementation of this mechanism (formal charge correction) in other software? I may be wrong, but my impression is that it is not an extremely complicated piece of functionality, yet it is not usually covered by chemical libraries. Is that correct, or have I misunderstood something?
It's not extremely complicated, but one doesn't generally have sufficient information to be able to do it correctly.
Assuming that the octet rule applies to the molecule, you need three of the following four pieces of information, and then you can calculate the fourth:
formal charge
number of attached hydrogens
heavy-atom valence (sum of all bond orders to non-hydrogen atoms)
number of radical electrons
Typical chemistry files (e.g. SMILES and MOL) assume that everything is specified except the number of attached hydrogens.
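As a rough illustration of that bookkeeping, here is a small Python sketch. It writes the octet rule as charge = v - 8 + heavy_valence + hydrogens + radicals, where v is the number of valence electrons of the neutral atom. That formula, the `VALENCE_ELECTRONS` table (C, N, O and F only) and the name `solve_octet` are assumptions of the sketch, not taken from any library, and species that break the octet rule (carbocations, for instance) are not covered.

```python
# Valence electrons of the neutral atom; only elements for which the
# octet rule is a reasonable assumption.
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}


def solve_octet(element, charge=None, hydrogens=None,
                heavy_valence=None, radicals=None):
    """Fill in the one quantity passed as None, using
    charge = v - 8 + heavy_valence + hydrogens + radicals.
    """
    v = VALENCE_ELECTRONS[element]
    values = {"charge": charge, "hydrogens": hydrogens,
              "heavy_valence": heavy_valence, "radicals": radicals}
    missing = [name for name, val in values.items() if val is None]
    if len(missing) != 1:
        raise ValueError("exactly one quantity must be left as None")
    name = missing[0]
    if name == "charge":
        values[name] = v - 8 + heavy_valence + hydrogens + radicals
    else:
        # Sum of the two known quantities other than the charge.
        known = sum(val for key, val in values.items()
                    if key not in ("charge", name))
        values[name] = charge - (v - 8) - known
        if values[name] < 0:
            # A negative count means the three inputs contradict each other.
            raise ValueError("inconsistent input: %s would be %d"
                             % (name, values[name]))
    return values


# A file reader's view: charge given, hydrogens to be inferred.
print(solve_octet("N", charge=1, heavy_valence=3, radicals=0))
# The reverse: hydrogens known, charge to be inferred.
print(solve_octet("N", hydrogens=0, heavy_valence=4, radicals=0))
```

The first call infers one attached hydrogen and the second infers a charge of +1. An N with heavy-atom valence 4, no radicals and a charge of 0 raises the error, which is one way such a record could at least be detected.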