RDKit Descriptor Node - Rings bug

Hi,

The RDKit Descriptor node says there are 8 rings in this molecule: C1C2=CC=C(CC3=CC(CC4=CC=C(CC5=CC=CC1=C5)C=C4)=CC=C3)C=C2

However, I only see 5.

The same result occurs with RDKit in Python, using this code:

from rdkit import Chem

smiles = 'C1C2=CC=C(CC3=CC(CC4=CC=C(CC5=CC=CC1=C5)C=C4)=CC=C3)C=C2'
mol = Chem.MolFromSmiles(smiles)
ring_info = mol.GetRingInfo()

print("Mol as smiles: " + smiles)
print("Number of rings: " + str(ring_info.NumRings()))
print("Number of atoms: " + str(mol.GetNumAtoms()))

# AtomRings() returns one tuple of atom indices per perceived ring
for atom_ring in ring_info.AtomRings():
    print(str(atom_ring) + "  ring length: " + str(len(atom_ring)))

And it gives this output:

Mol as smiles: C1C2=CC=C(CC3=CC(CC4=CC=C(CC5=CC=CC1=C5)C=C4)=CC=C3)C=C2
Number of rings: 8
Number of atoms: 28
(2, 3, 4, 26, 27, 1)  ring length: 6
(7, 8, 23, 24, 25, 6)  ring length: 6
(11, 12, 13, 21, 22, 10)  ring length: 6
(16, 17, 18, 19, 20, 15)  ring length: 6
(0, 19, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)  ring length: 18
(2, 3, 4, 5, 6, 7, 8, 9, 10, 22, 21, 13, 14, 15, 20, 19, 0, 1)  ring length: 18
(26, 27, 1, 0, 19, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4)  ring length: 18
(26, 27, 1, 0, 19, 20, 15, 14, 13, 21, 22, 10, 9, 8, 7, 6, 5, 4)  ring length: 18

Thanks in advance,

Ed.

Hi Ed,

This molecule is a classic example of a case where smallest-set-of-smallest-rings (SSSR) algorithms generate unexpected (but not wrong) results.

The "extra" rings come about because of the symmetry of the structure.

I've attached two images demonstrating this. The first has the fifth ring that I guess you expected to see highlighted. The second has one of the rings that you probably didn't expect to see highlighted. There are three more of these unexpected rings possible by symmetry.

Does that help explain what's going on?

-greg

Hi,

Yeah, after speaking with a few others about this, I've come to see that ring perception is actually subjective, which is probably why the word "perception" is used.

At least it's not a bug, though I'm searching for rings in a different sense: I'm trying to count the number of rings from the SMILES format's perspective. Although I could simply read the highest digit present in the SMILES string, I often don't have the SMILES to hand (due to an issue I found in CDK, some molecules with more than 9 rings don't yield SMILES). So I needed a way of detecting the number of rings up front, to avoid this SMILES formatting issue.

 

TBH my issue has been resolved anyway but thanks for clearing that up :)

 

Ed.
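
For the ring count Ed describes (rings "from the SMILES format's perspective"), one reading is the number of ring-closure bonds, which for any molecular graph equals bonds - atoms + connected components. The sketch below is a minimal illustration of that formula, not something taken from the thread: the function names are hypothetical, and the test graph only mimics the molecule's shape (four six-membered rings joined by four one-atom bridges, with simplified attachment positions).

```python
def ring_closure_count(num_atoms, bonds):
    """Return bonds - atoms + connected components for an undirected graph."""
    parent = list(range(num_atoms))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


def ring_closure_count_from_mol(mol):
    """Hypothetical adapter: assumes an RDKit Mol passed in by the caller."""
    bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    return ring_closure_count(mol.GetNumAtoms(), bonds)


def build_macrocycle_graph():
    """Toy graph: 4 six-membered rings linked in a loop by 4 one-atom bridges."""
    bonds = []
    for unit in range(4):
        base = unit * 7  # atoms base..base+5 form the ring, base+6 is the bridge
        for i in range(6):
            bonds.append((base + i, base + (i + 1) % 6))
        bridge = base + 6
        next_base = ((unit + 1) % 4) * 7
        bonds.append((base + 3, bridge))
        bonds.append((bridge, next_base))
    return 28, bonds


num_atoms, bonds = build_macrocycle_graph()
print(ring_closure_count(num_atoms, bonds))  # prints 5
```

With 28 atoms and 32 bonds in one fragment, the count is 32 - 28 + 1 = 5, which matches the 5 rings Ed expected. Unlike ring perception, this number does not depend on which rings an algorithm chooses to report.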