Murcko Scaffold Generation

Hi Dmitry,

Is it possible to have a "Murcko Scaffolds" node with some advanced features? Murcko scaffolds are a good way of grouping together common cores or fragments in an SAR dataset and can be really useful for checking diversity and for clustering.

Is it possible to have some advanced options such as:

- Discard any unsubstituted rings, i.e. if a ring is only monosubstituted then it must be terminal, and therefore may well not be part of the key fragment. An example would be a cyclopropyl ring on an aromatic core; in this case it really is a substituent rather than an integral part of the scaffold.

- Discard unsubstituted rings of size x (this is useful for removing cyclopropyl and cyclobutyl rings, which are often used as substituents rather than as spacers within the scaffold). By default, a Murcko scaffold includes these 3- and 4-membered rings, which can skew the diversity of the fragments.


Thanks,

Simon.

Simon,

Murcko scaffolds seem very doable, but could you please point me to the exact definition of what a Murcko scaffold really is? The only one I was able to work out by playing with the RDKit node is:

Keep only the rings and the chains that connect them. Also keep double bonds that are attached to the chains that connect the rings. Remove everything else.

Does it look complete to you?

Also, I am not sure I understood the "Discard any unsubstituted rings" option properly. For example, if we have biphenyl (c1ccccc1-c1ccccc1), will we discard it entirely, as both its rings are monosubstituted? If we have terphenyl (c1ccccc1-c1ccccc1-c1ccccc1), will we discard both side rings as monosubstituted, and then the remaining ring as well, since it has become unsubstituted?
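To make the two possible readings of this option concrete, here is a minimal sketch on a hypothetical ring-level graph (each node is a ring, each edge a direct connection between two rings; non-ring substituents are ignored). A single pass drops only rings that are terminal in the original molecule, while the iterative version keeps dropping rings as they become terminal or unsubstituted. The function names are illustrative only, not part of Indigo or the KNIME node.

```python
from typing import Dict, Set

# Hypothetical ring-level graph: ring id -> ids of the rings it is bonded to.
RingGraph = Dict[str, Set[str]]


def drop_terminal_once(g: RingGraph) -> Set[str]:
    """Single pass: drop rings that have exactly one ring neighbour."""
    return {r for r, nbrs in g.items() if len(nbrs) != 1}


def drop_terminal_iteratively(g: RingGraph) -> Set[str]:
    """Repeatedly drop rings with at most one remaining neighbour."""
    kept = set(g)
    while True:
        doomed = {r for r in kept if len(g[r] & kept) <= 1}
        if not doomed:
            return kept
        kept -= doomed


biphenyl = {"A": {"B"}, "B": {"A"}}
terphenyl = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(drop_terminal_once(terphenyl), drop_terminal_iteratively(terphenyl))
```

Under either reading biphenyl disappears completely, and the iterative reading also removes the middle ring of terphenyl, leaving nothing.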

 

Best regards,

Dmitry

Hi Dmitry,

There is a good definition of Murcko scaffolds in their paper:

"Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39 (15), 2887-2893."

As far as I understand it, a Murcko scaffold retains every ring system within the molecule. All non-ring atoms are removed unless they are required to connect two ring systems together. Any sp3 branches off the retained connecting chain(s) are also removed, so you are left with straight connecting chains. If a branch off a connecting chain is attached via a double bond (sp2), the first atom of that branch is kept, so a carbonyl within the chain is kept, as is the first carbon atom of a double bond. Any features within the chains that connect the rings, such as double and triple bonds, are also kept. Pretty much as you describe.

There are versions out there that only keep an all-carbon version of the framework; however, I think it would be best to keep all the heteroatoms within the framework and to have this "all heteroatoms to carbon" option in the "Feature Remover" node instead, which you may have seen in my feature remover request :-).
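The definition above can be sketched as two steps on a hydrogen-suppressed molecular graph: (1) repeatedly strip terminal atoms (degree 1 or 0), which leaves exactly the ring atoms plus the atoms on chains linking rings; (2) add back any stripped atom that is double-bonded to a surviving atom, which keeps carbonyls, S=O groups and the first atom of an exocyclic double bond. This is a minimal illustration assuming a plain representation (atom indices plus a bond-order dictionary, aromatic rings in Kekulé form), not the Indigo or RDKit implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: heavy atoms are 0..n-1; bonds map (i, j)
# with i < j to a bond order (1, 2 or 3; aromatic rings in Kekule form).
Bonds = Dict[Tuple[int, int], int]


def murcko_atoms(n_atoms: int, bonds: Bonds) -> Set[int]:
    """Return the indices of the atoms that belong to the Murcko framework."""
    kept = set(range(n_atoms))

    # Step 1: strip terminal atoms until none are left. Ring atoms and atoms
    # on chains joining two rings never become terminal, so they survive.
    while True:
        degree = {a: 0 for a in kept}
        for i, j in bonds:
            if i in kept and j in kept:
                degree[i] += 1
                degree[j] += 1
        leaves = {a for a, d in degree.items() if d <= 1}
        if not leaves:
            break
        kept -= leaves

    # Step 2: restore atoms that are double-bonded to the framework.
    restored = set()
    for (i, j), order in bonds.items():
        if order == 2 and (i in kept) != (j in kept):
            restored.add(j if i in kept else i)
    return kept | restored
```

With this sketch, benzene carrying an SO2Me group reduces to plain benzene (the oxygens and the methyl are stripped first, then the sulphur becomes terminal), while an SO2 linking two rings stays in the framework together with both of its oxygens. An acyclic molecule gives an empty framework.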

In terms of the "Discard any unsubstituted rings" option, having thought about it some more after your comments, I think this option would be too brutal and would remove too much of the molecule. I believe the second option, "discard any unsubstituted rings of ring size x", is still very useful, particularly when x = 3 or 4. The reason is that the standard Murcko scaffold is really useful for small fragments and hits in an early-phase project, when the molecules are small (Mw < 200) and relatively unfunctionalised. As a project matures, however, the molecules increase in size (Mw around 400) and become highly functionalised, and groups like cyclopropyl and cyclobutyl start to appear: chemists use them, for instance, to replace an aryl methyl with an aryl cyclopropyl, or an OMe group with an O-cyclopropyl group, to reduce metabolism. Most chemists would see these groups simply as functionalities rather than an integral part of the scaffold/framework, so an option to remove these cyclopropyl and cyclobutyl rings when they are only monosubstituted would be a very useful addition.
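The filter proposed here can be sketched at the ring level: drop a ring only if its size is in a chosen set (3 and 4 by default) and it has exactly one connection to the rest of the framework. The Ring class, its fields and the function name are hypothetical, chosen for illustration; each Ring is assumed to be a single unfused ring.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Ring:
    size: int             # number of atoms in the (unfused) ring
    neighbours: Set[str]  # rings this ring is connected to, directly or via a linker


def drop_small_terminal_rings(rings: Dict[str, Ring],
                              sizes: Iterable[int] = (3, 4)) -> Set[str]:
    """Keep every ring except small rings that are only monosubstituted."""
    small = set(sizes)
    return {name for name, r in rings.items()
            if not (r.size in small and len(r.neighbours) == 1)}


aryl_cpr = {"aryl": Ring(6, {"cpr"}), "cpr": Ring(3, {"aryl"})}
print(drop_small_terminal_rings(aryl_cpr))  # the cyclopropyl is dropped
```

A cyclopropyl ring that links two aryl rings has two connections, so it is kept as part of the scaffold. In a real molecule, any linker atom that joined a dropped ring (for example the O of an O-cyclopropyl group) becomes terminal and would then need to be stripped as well.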

I hope this makes sense.

Thanks,

Simon.

Simon,

You can try the Murcko Scaffold node in today's build. It has the options you requested to discard terminal 3- and 4-membered rings. Please tell us if the results are not what you would expect.

 

Best regards,

Dmitry

Wow, that is amazing, Dmitry. Thanks for delivering this so quickly; it works absolutely perfectly.

I am so happy you have added the options to remove terminal 3- and 4-membered rings; this also works exactly as expected. These options really improve the usability of Murcko scaffold analysis on highly functionalised datasets of molecules.

This is really, really useful for looking at the number of scaffold variations in a project, how well exemplified each of these scaffolds is, and the best activity achieved with it.

The way I am using it: I take a dataset of Indigo molecules with SAR data and run it through Murcko Scaffolds with terminal 3- and 4-membered rings removed, then through Indigo to Canonicalised Smiles. Next comes a GroupBy node on the canonicalised SMILES column; in the aggregation section I count by MoleculeID and select the Activity column with the Minimum aggregation option. Finally, a Sorter node sorts in descending order on the MoleculeID count. This lists the most exemplified Murcko scaffolds at the top, along with their best activities and the number of compounds made with each scaffold. This is so powerful.

The only extra feature that would add another layer of power to the analysis is the one I mentioned in a different post (Feature Remover Request): converting all the heavy atoms into the same atom type.
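Outside KNIME, the same GroupBy and Sorter steps can be sketched in plain Python. The sketch assumes the scaffolds have already been converted to canonical SMILES (so identical scaffolds compare equal as strings) and that a lower activity value is better (for example an IC50), which is why the minimum is taken; the row layout and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input rows: (molecule_id, canonical scaffold SMILES, activity)
Row = Tuple[str, str, float]


def summarise_scaffolds(rows: List[Row]) -> List[Tuple[str, int, float]]:
    """Group by scaffold, count rows, keep the minimum activity, and sort
    so that the most exemplified scaffolds come first."""
    counts: Counter = Counter()
    best: Dict[str, float] = {}
    for _mol_id, scaffold, activity in rows:
        counts[scaffold] += 1
        best[scaffold] = min(activity, best.get(scaffold, activity))
    summary = [(s, counts[s], best[s]) for s in counts]
    summary.sort(key=lambda t: t[1], reverse=True)  # descending by count
    return summary


rows = [("M1", "c1ccccc1", 5.0), ("M2", "c1ccncc1", 2.0),
        ("M3", "c1ccccc1", 0.5), ("M4", "c1ccccc1", 3.0)]
print(summarise_scaffolds(rows))
```

Like a plain "Count" aggregation, this counts rows per scaffold; if the same MoleculeID can appear more than once, a distinct count would be needed instead.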

Thanks so much for this, and doing it so quickly!

Simon.

Hi Dmitry,

I noticed a slight error in the Murcko Scaffold implementation.

If you have an SO2 group attached to an aromatic ring, it is not removed in the Murcko scaffold when it should be. For example, Benzene-SO2Me becomes Benzene-SO2H when it should in fact be Benzene. The same applies to SO3, OSO2R and NO2 groups.

Also, pyridine N-oxides should become plain pyridines in the Murcko scaffolds.

Is it possible to fix these minor errors? Other than this, everything is working perfectly :-)

Simon.

Simon,

Thank you for the report. That will be fixed. Another small question: what do you think, should O=S1(=O)O[Mn]O1 be reduced to [SH4]1O[Mn]O1 or left as is?

Best regards,

Dmitry

Hi,

I would have said leave it as it is, because the =O's are on a sulphur that is part of a four-membered ring. Likewise, in something like O=S1(=O)CCCCN1 the =O's should be kept, as the sulphur is within a ring.

If there is an SO2 in a chain, the whole chain should be removed unless it connects two rings together. If it does connect two rings, the =O's on the sulphur are kept, just as in an amide or with the first carbon of an alkene, because they are double bonds.

Hope this is all making sense!

 

Simon.

Simon,

Yes, it does make sense, thank you.

You can try the corrected version tomorrow morning.

Best regards,

Dmitry

Thanks Dmitry for the quick changes; the sulphone groups are now handled correctly.

There is a slight oddity with the pyridine N-oxide treatment. If the pyridine N-oxide is drawn in its charged form (where the bond is represented as N+---O-), the oxygen atom is correctly removed, but the N is left positively charged. Is it possible for the N+ to be converted to an uncharged nitrogen? Otherwise it is represented as a four-valent nitrogen carrying a hydrogen.

Apologies for another change request, but otherwise I believe everything is working as expected!
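The charge bookkeeping requested here can be sketched as follows: when a terminal O- is removed from an N+, the +1 on the nitrogen only balanced that oxygen, so it should be removed too; for the neutral N=O drawing, dropping the oxygen alone is enough. The Mol class and function name are hypothetical. The sketch strips any oxygen whose only heavy neighbour is a nitrogen (so it would also hit N-OH or nitro oxygens), which is why a real implementation would presumably restrict it to ring nitrogens; hydrogen counts and aromaticity are left to the toolkit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Mol:
    elements: List[str]
    charges: List[int]
    bonds: Dict[Tuple[int, int], int]  # (i, j) with i < j -> bond order


def strip_n_oxides(mol: Mol) -> Mol:
    """Remove terminal oxygens on nitrogen, neutralising N+/O- pairs."""
    degree = [0] * len(mol.elements)
    for i, j in mol.bonds:
        degree[i] += 1
        degree[j] += 1

    charges = list(mol.charges)
    drop = set()
    for i, j in mol.bonds:
        for n, o in ((i, j), (j, i)):
            if mol.elements[n] == "N" and mol.elements[o] == "O" and degree[o] == 1:
                drop.add(o)
                if charges[n] == 1 and charges[o] == -1:
                    charges[n] = 0  # the +1 only balanced the removed O-

    # Re-index the remaining atoms and keep the bonds between them.
    keep = [k for k in range(len(mol.elements)) if k not in drop]
    new_index = {old: new for new, old in enumerate(keep)}
    bonds = {(new_index[i], new_index[j]): order
             for (i, j), order in mol.bonds.items()
             if i in new_index and j in new_index}
    return Mol([mol.elements[k] for k in keep], [charges[k] for k in keep], bonds)
```

For a Kekulé pyridine N-oxide drawn as N+ bonded to O-, this returns a six-atom ring with every formal charge at zero, so the nitrogen ends up trivalent with no added hydrogen.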

Thanks

Simon.

Hi Dmitry,

Thanks, I noticed that you updated Indigo yesterday, and now the positive charge on the pyridine is removed in the Murcko Scaffolds output (when it was a pyridine N-oxide before the node), so all the output of Murcko Scaffolds appears correct. Visually the resulting molecule looks correct (the pyridine N-oxide becomes just pyridine), but I think something is still not quite right underneath: if you then use the "Indigo to Molecule" node, it returns the error "ERROR IndigoMoleculeSaverNodeModel Could not convert molecule: SMILES saver: unsure hydrogen count on atom #18" for this structure, which came from a pyridine N-oxide. If I run it through the "Valence Checker" node, I get this error for the same structure: "element: can not calculate implicit hydrogens on aromatic N, charge 0, degree 2, 0 radical electrons".

I hope this is not too difficult to fix :-)

 

Simon.

Simon,

Thank you for the observation. That is an internal Indigo problem related to aromaticity. We will try to fix it, but in the meantime you can work around it by passing the structures through the Dearomatizer node before the Murcko node (and then through the Aromatizer if you want to restore aromaticity).

Best regards,

Dmitry

Simon,

The bug is fixed in today's build (1.0.0.0000965).

Best regards,

Dmitry

Brilliant, I was about to send you a post to say thanks; I noticed you'd fixed it in this morning's build. You'll be glad to hear the Murcko Scaffolds node is working brilliantly and is now completely bug free.

This is really useful for quickly understanding a new SAR and seeing what the common scaffolds and features are; it saves me a lot of time. This is great.

 

Thanks

Simon.