Strange Substructure Behaviour

Hi,

I am experiencing some strange behaviour with the RDKit Substructure node (and the Indigo Substructure node, for that matter). Or maybe (more likely) I am doing something wrong.

If I draw a pyrimidine with an A atom coming off the 4 position and another A atom coming off the 5 position, convert this to SMARTS (giving the string *-c1cncnc1-*) and feed it into the "RDKit Substructure Filter", why does the node filter out molecules containing a quinazoline (i.e. a pyrimidine ring with a phenyl ring fused at the 4,5 positions), but filter in molecules containing a tetrahydroquinazoline (i.e. a pyrimidine ring with a cyclohexyl ring fused at the 4,5 positions)?

I would expect BOTH the quinazoline and the tetrahydroquinazoline to be filtered in (i.e. a match).

Any ideas?

Simon.

Hi Simon,

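This behaviour occurs because your substructure insists on single bonds (-). If you removed these, the query would implicitly match single OR aromatic bonds. Aromatic ring bonds (even if drawn in Kekulé form) will not be matched by a single-bond query. In quinazoline, the bonds from the pyrimidine carbons into the fused benzene ring are aromatic, so the '-' bonds cannot match them; in tetrahydroquinazoline they are ordinary single bonds to sp3 carbons, so they do.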
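To reproduce this outside KNIME, here is a minimal sketch using the RDKit Python package. It assumes `rdkit` is installed and that the KNIME RDKit nodes apply the same SMARTS semantics as the Python library; the two target SMILES are illustrative structures chosen for this sketch, not molecules from Simon's dataset.

```python
from rdkit import Chem

# The query from the original post: '-' demands a single bond to each A atom.
explicit_single = Chem.MolFromSmarts("*-c1cncnc1-*")
# The same query with the bond symbols removed: an unspecified SMARTS bond
# is read as "single or aromatic".
unspecified = Chem.MolFromSmarts("*c1cncnc1*")

targets = {
    "quinazoline": Chem.MolFromSmiles("c1ccc2ncncc2c1"),
    "5,6,7,8-tetrahydroquinazoline": Chem.MolFromSmiles("C1CCc2ncncc2C1"),
}

# For each target: (matches '-' query, matches unspecified-bond query)
results = {
    name: (mol.HasSubstructMatch(explicit_single), mol.HasSubstructMatch(unspecified))
    for name, mol in targets.items()
}
for name, (single_hit, loose_hit) in results.items():
    print(f"{name}: '-' query={single_hit}, unspecified-bond query={loose_hit}")
```

The explicit-single query should hit only the tetrahydroquinazoline, while the unspecified-bond query hits both.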

 

Kind regards

James

Thanks for that.

That makes sense from looking at the SMARTS string.

What is puzzling is that this SMARTS string was generated by the ChemAxon sketcher, so I don't understand why it forces a single bond.

If the same substructure is drawn using MDL ISIS/Symyx and searched against an ISIS/Isentris database, it retrieves both single and double bonds to an A atom.

Maybe it's just different implementations, and something my fellow chemists and I will need to be aware of. From your pointer, I have now worked out that changing the bond from the aryl ring to the A atom to an "Any" bond type solves the problem.
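The "Any" bond fix can be checked outside KNIME with a minimal RDKit sketch. It assumes the `rdkit` Python package is installed and that the sketcher writes an "Any" bond as the SMARTS `~` primitive (worth confirming by looking at the generated string); the SMILES are illustrative structures.

```python
from rdkit import Chem

# '~' is the SMARTS "any bond" primitive: single, double, triple or aromatic.
any_bond = Chem.MolFromSmarts("*~c1cncnc1~*")

quinazoline = Chem.MolFromSmiles("c1ccc2ncncc2c1")
tetrahydroquinazoline = Chem.MolFromSmiles("C1CCc2ncncc2C1")

quin_hit = quinazoline.HasSubstructMatch(any_bond)
thq_hit = tetrahydroquinazoline.HasSubstructMatch(any_bond)
print("quinazoline:", quin_hit)
print("5,6,7,8-tetrahydroquinazoline:", thq_hit)
```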

Thank you!

Simon.

Hi Simon,

First of all, apologies for the duplicate post (sent from my mobile with a very patchy phone signal!)

I think this is a very common problem when using a sketcher to generate SMARTS patterns. Personally, I would say that 'explicit' is best: if you draw a single bond, then a single bond is generated in the pattern. This behaviour doesn't rely on learning any rules, but rather on unlearning some, because, as you said, most chemists are more familiar with the MDL querying behaviour (which does not query by SMARTS patterns).

However, if you extend the 'treat a drawn single bond as single or aromatic' behaviour to its extreme, you would retrieve phenyl rings when drawing cyclohexyls, which I am sure most people would not intend! That is why I think explicit is best when generating SMARTS from MarvinSketch, although it does mean that we (chemists) need some retraining to understand the behaviour (and its benefits!).
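A minimal RDKit sketch of the phenyl-versus-cyclohexyl point, assuming the `rdkit` Python package is installed. The atom primitive `[#6]` (any carbon, aromatic or aliphatic) is used deliberately so that only the bond primitives decide the outcome; with aliphatic `C` atoms the ring query would not reach benzene regardless of the bonds.

```python
from rdkit import Chem

queries = {
    # Unspecified bonds: each is read as "single or aromatic".
    "unspecified": Chem.MolFromSmarts("[#6]1[#6][#6][#6][#6][#6]1"),
    # Explicit '-' bonds, including the ring-closure bond: single only.
    "explicit_single": Chem.MolFromSmarts("[#6]1-[#6]-[#6]-[#6]-[#6]-[#6]-1"),
}
targets = {
    "benzene": Chem.MolFromSmiles("c1ccccc1"),
    "cyclohexane": Chem.MolFromSmiles("C1CCCCC1"),
}

# Keyed by (query name, target name) -> bool
results = {
    (q_name, t_name): target.HasSubstructMatch(query)
    for q_name, query in queries.items()
    for t_name, target in targets.items()
}
for key, matched in results.items():
    print(key, matched)
```

The "drawn single means single or aromatic" reading pulls in benzene as well as cyclohexane, whereas the explicit reading keeps only cyclohexane.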

Your solution of specifying the ANY bond type (~ in SMARTS) is exactly what I would recommend, as it fully represents the query you had in mind. I would go further and say that if you only want fused bicyclics, you should also set these bonds as ring bonds in the structure (the topology option in MarvinSketch). This should generate an @ in the pattern, representing an ANY ring bond.
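A minimal RDKit sketch comparing the ANY bond and ring-bond versions of the query, assuming the `rdkit` Python package is installed. 4,5-Dimethylpyrimidine is an illustrative non-fused control chosen for this sketch: it satisfies the `~` query, but its methyl bonds are not ring bonds, so the `@` query should reject it.

```python
from rdkit import Chem

any_bond = Chem.MolFromSmarts("*~c1cncnc1~*")   # '~': any bond
ring_bond = Chem.MolFromSmarts("*@c1cncnc1@*")  # '@': any ring bond

smiles = {
    "quinazoline": "c1ccc2ncncc2c1",
    "5,6,7,8-tetrahydroquinazoline": "C1CCc2ncncc2C1",
    "4,5-dimethylpyrimidine": "Cc1cncnc1C",  # non-fused control
}

# For each target: (matches '~' query, matches '@' query)
results = {}
for name, smi in smiles.items():
    mol = Chem.MolFromSmiles(smi)
    results[name] = (mol.HasSubstructMatch(any_bond), mol.HasSubstructMatch(ring_bond))
    print(f"{name}: ~ query={results[name][0]}, @ query={results[name][1]}")
```

Both fused systems pass either query; only the ring-bond version screens out the simple 4,5-disubstituted pyrimidine.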

 

Cheers

James 

Thanks, James, for your infinite wisdom at the moment; you have certainly helped me out in a few posts recently!

This particular post relates to a question I received from a fellow chemist, so I am now well informed to pass on the information, and I learned something myself in the process.

I totally agree with your comments that being explicit is best, rather than relying on the assumed behaviour that MDL searches apply. At least by being explicit, you know exactly what you are getting.

Many thanks,

Simon.

If I might add two useful resources to this discussion for learning and finding one's way around SMARTS:

The excellent SMARTSviewer from the University of Hamburg:

http://smartsview.zbh.uni-hamburg.de/

and the 'theory manual' for SMARTS from Daylight (which covers the Daylight interpretation of SMARTS, but not some of the third-party extensions, such as ^ for hybridisation state):

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

Steve

Many thanks, Steve, for looking these up.

These will be useful for understanding what a SMARTS string actually represents, and are now bookmarked for night-time reading!

Thanks,

Simon.