I noticed strange behavior when feeding the same molecules from the Aromatizer and Kekulizer nodes into the Descriptor Calculation node (TPSA). Most of the time you get exactly the same result, but for some molecules the difference can be quite large (here’s a random molecule from MolPort as an example):
The molecule looks identical in KNIME. I have attached an example KNIME workflow that reproduces the result: KekuleVsAromatic.knwf (14.5 KB)
If you do the same in Python with RDKit, you will get the same result in both cases:
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
smiles = "O=C(CSC1=NN=C(O1)C1=CNC2=CC=CC=C12)NCC1=CC=CO1"
mol = Chem.MolFromSmiles(smiles)
print(rdMolDescriptors.CalcTPSA(mol))  # TPSA of the parsed molecule
The code to calculate TPSA uses bond order in the calculation. So the TPSA for kekulized structures will be incorrect.
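To see the effect outside KNIME, here is a minimal sketch using the SMILES from the first post. It assumes that the Kekulizer node behaves like `Chem.Kekulize` with `clearAromaticFlags=True`, which is a guess and not something confirmed in this thread. Note that `Chem.MolFromSmiles` perceives aromaticity on parsing even when the input is a Kekulé SMILES, which would explain why the plain Python snippet gives the same value in both cases.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

smiles = "O=C(CSC1=NN=C(O1)C1=CNC2=CC=CC=C12)NCC1=CC=CO1"

# Default parsing sanitizes the molecule, so aromaticity is perceived
# even though the input SMILES is written in Kekule form.
aromatic = Chem.MolFromSmiles(smiles)

# Work on a copy and turn aromatic bonds into explicit single/double bonds,
# clearing the aromatic flags (assumed here to mimic the Kekulizer node).
kekulized = Chem.Mol(aromatic)
Chem.Kekulize(kekulized, clearAromaticFlags=True)

tpsa_aromatic = rdMolDescriptors.CalcTPSA(aromatic)
tpsa_kekulized = rdMolDescriptors.CalcTPSA(kekulized)

print(f"aromatic:  {tpsa_aromatic:.2f}")
print(f"kekulized: {tpsa_kekulized:.2f}")
```

The kekulized copy should come out lower, because the aromatic N and O atoms in the oxadiazole, indole and furan rings are then matched as ordinary single- and double-bonded atoms.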
Thanks for the explanation and for clarifying the correct TPSA value.
I would like to point out that this is a bit hazardous feature of the RDKit Nodes in KNIME that perhaps could be improved somehow, especially as the structures look identical in the GUI and the TPSA values differ only in some cases. Luckily the kekulized form gives an incorrect TPSA that is lower than the real one: in my case I was filtering on the TPSA value using kekulized structures and only noticed this later, when somebody commented on the high TPSA values of some compounds that had passed my workflow. I often rename the molecule column to just “Molecule” without any “Aromatized” or “Kekulized” suffix, so it took me a moment to figure out what was going on here.
Is there any particular reason you’re using the kekulized form at all?
To this point:
I would like to point out that this is a bit hazardous feature of the RDKit Nodes in KNIME that perhaps could be improved somehow
It’s worth explaining that the nodes generally don’t make any attempt to protect the user from themselves. If you choose to do something that isn’t necessarily sensible (like generating descriptors or fingerprints for kekulized molecules or generating conformations for molecules that haven’t had Hs added), the nodes will not prevent you from doing so. I think that’s a feature (because sometimes you want to do something even if it doesn’t make sense), but it does place some burden on the user to think about what they’re doing.
The original motivation for using the Kekulizer was that I have the pleasure of working with a chemical-registration-system-that-shall-not-be-named that does not like aromatic SMILES such as “c1ccccc1” and instead wants them written as “C1=CC=CC=C1”. Although, I admit, for the workflow where I noticed this there was no point in doing that.
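If the only reason to kekulize is to hand Kekulé SMILES to a downstream system, one option in Python is to keep the aromatic molecule for all calculations and convert only when writing the string. A minimal sketch; the helper name `to_kekule_smiles` is made up for illustration and is not part of RDKit:

```python
from rdkit import Chem


def to_kekule_smiles(mol):
    """Return a Kekule SMILES without modifying the input molecule."""
    tmp = Chem.Mol(mol)  # copy, so the original keeps its aromatic flags
    Chem.Kekulize(tmp, clearAromaticFlags=True)
    return Chem.MolToSmiles(tmp, kekuleSmiles=True)


benzene = Chem.MolFromSmiles("c1ccccc1")
kekule = to_kekule_smiles(benzene)
print(kekule)
```

Because the conversion happens on a copy, `benzene` itself stays aromatic and can still be passed to descriptor or fingerprint calculations.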
Fair enough and I agree that the nodes should be flexible. Sharp knives are good tools, but they can be a bit dangerous as well. Anyhow, at least I learned something here today i.e. make sure that my molecules are always Aromatized when doing any calculations on them or otherwise funny things might happen to them sometimes.
Anyhow, at least I learned something here today i.e. make sure that my molecules are always Aromatized when doing any calculations on them or otherwise funny things might happen to them sometimes.
Fortunately, having molecules in the aromatic form is the default behavior of all the RDKit nodes, so you have to actively decide to deviate from that default in order to get into trouble.
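In a Python script, where a molecule may arrive already kekulized, a small guard before descriptor calculation can help. The sketch below re-runs sanitization on a copy so that aromaticity is perceived again; `with_perceived_aromaticity` is a hypothetical helper name, not an RDKit function.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def with_perceived_aromaticity(mol):
    """Return a sanitized copy of mol; sanitization re-perceives aromaticity."""
    tmp = Chem.Mol(mol)
    Chem.SanitizeMol(tmp)
    return tmp


aromatic = Chem.MolFromSmiles("c1ccoc1")  # furan
kekulized = Chem.Mol(aromatic)
Chem.Kekulize(kekulized, clearAromaticFlags=True)  # simulate kekulized input

restored = with_perceived_aromaticity(kekulized)
tpsa_aromatic = rdMolDescriptors.CalcTPSA(aromatic)
tpsa_restored = rdMolDescriptors.CalcTPSA(restored)
print(tpsa_aromatic, tpsa_restored)
```

The two printed values should agree, since the restored copy carries the same aromatic flags as the molecule parsed directly from SMILES.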