Common Scaffold Detection

Hi,

A node I'd really like to see is a scaffold detection node that goes beyond what an MCS algorithm can achieve.

The trouble with MCS is that it often identifies only tiny fragments, which are fairly useless. Even splitting the SAR set into clusters and running the MCS on each cluster often gives little improvement in the scaffolds detected. The reason is that a scaffold only needs to change by one heavy atom (e.g. pyridine versus phenyl) to give a compound that is still similar enough to land in the same cluster, and that single compound shrinks the MCS right down in heavy-atom count.

What I would like is a node that can analyse a large dataset, identify commonly exemplified fragments/scaffolds (of at least 6 heavy atoms) and report these at the output. The key here is "commonly exemplified", not "always exemplified".

I appreciate this is probably not an easy task, but I'll request it anyway!

Simon.

Hi Simon,

if you have access to a node that is able to calculate a MCS for a couple of molecules, couldn't you emulate the desired behaviour by splitting your dataset into many random subsets, calculating an MCS per subset and then ranking the detected substructures by their frequency?
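Nils's suggestion can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a KNIME workflow: `rank_subset_mcs` and `toy_mcs` are hypothetical helpers, and the `mcs` argument stands in for whatever MCS implementation is available (for example the Indigo MCS node). It is assumed to return a canonical key for the common substructure (such as a canonical SMILES or SMARTS string), or `None` if there is none. In the toy usage, "molecules" are sets of fragment labels and set intersection is only a placeholder for a real MCS.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, TypeVar

M = TypeVar("M")


def rank_subset_mcs(
    molecules: Sequence[M],
    mcs: Callable[[Sequence[M]], Optional[Hashable]],
    n_rounds: int = 50,
    subset_size: int = 5,
    seed: int = 0,
) -> list[tuple[Hashable, int]]:
    """Run an MCS on many random subsets and rank the results by frequency.

    `mcs` is a hypothetical callable: it takes a list of molecules and returns
    a canonical key for their common substructure, or None if there is none.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_rounds):
        # Draw a random subset of distinct molecules (sampling without replacement).
        subset = rng.sample(list(molecules), min(subset_size, len(molecules)))
        key = mcs(subset)
        if key is not None:
            counts[key] += 1
    # Most frequently recovered substructures first.
    return counts.most_common()


def toy_mcs(subset):
    """Toy stand-in for a real MCS: each "molecule" is a frozenset of
    fragment labels, and the common part is simply their intersection."""
    common = frozenset.intersection(*subset)
    return common or None
```

For example, with 19 hypothetical phenyl amides (each with its own R-group label) and one pyridine analog, only the subsets that happen to draw the aza compound collapse to the bare amide, so the full core should dominate the ranking:

```python
core = frozenset({"c1ccccc1", "C(=O)N"})
aza = frozenset({"c1ccncc1", "C(=O)N"})
mols = [core | {f"R{i}"} for i in range(19)] + [aza | {"R19"}]
print(rank_subset_mcs(mols, toy_mcs, n_rounds=200)[0])
```

`subset_size` is a trade-off: larger subsets are more representative, but they are also more likely to contain an outlier such as the aza analog.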

Also, it might be interesting to look at the MoSS node. It generates all substructures that occur with a certain minimum frequency in your dataset.

Best regards,

Nils

Hi Nils,

I did try what you suggest with the Indigo MCS, but I haven't had any success in generating reasonable substructures. The likely reason is the one I explained in my original post: if a scaffold system is highly represented in a cluster, it only takes one molecule that is an aza version of that scaffold system, and the resulting MCS becomes essentially meaningless, consisting of just a couple of atoms. Ideally, we need a node that recognises these common patterns across an SAR set of molecules.

I have never used the MoSS node so I will take a look at this.

Thanks

Simon.

See http://tripod.nih.gov/ws/rgroup/rgroup.jnlp. 

I find this very useful. It gives scaffolds, R-groups, and the average and deviation of properties for each scaffold, so one can choose the scaffold where a particular property has the lowest mean and variation.
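The per-scaffold summary described above amounts to a group-by with a mean and a spread. The sketch below is a generic illustration, not the tripod tool's actual method: `scaffold_property_summary` is a hypothetical helper, it assumes each compound has already been assigned a scaffold key, it uses the population standard deviation (so single-compound scaffolds get a spread of 0), and it assumes lower values of the property are better.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scaffold_property_summary(records):
    """Summarise a property per scaffold.

    `records` is an iterable of (scaffold_key, value) pairs. Returns a list of
    (scaffold_key, n, mean, population_sd) tuples sorted by mean, then by
    spread, so the scaffold with the lowest and most consistent values is first.
    """
    groups = defaultdict(list)
    for scaffold, value in records:
        groups[scaffold].append(value)
    summary = [
        (scaffold, len(vals), mean(vals), pstdev(vals))
        for scaffold, vals in groups.items()
    ]
    # Lower mean wins; ties are broken by lower spread.
    summary.sort(key=lambda row: (row[2], row[3]))
    return summary
```

With hypothetical data where two scaffolds share the same mean, the one with the tighter spread is ranked first:

```python
rows = scaffold_property_summary([("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 2.0)])
print([r[0] for r in rows])
```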

As Nils mentioned the MoSS node (contained in KNIME Chemistry Add-Ons) is worth a look. It searches for frequent fragments in a set of molecules. You can tune how frequent a fragment should be (e.g. 100% gives you the MCS of all molecules) and you can even search for discriminative fragments that occur "at least x%" in class A and "at most y%" in class B.
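To make the two threshold modes concrete, the sketch below applies them to precomputed fragments. It does not reproduce MoSS, which mines the substructures directly from the molecular graphs; here each molecule is assumed to be already reduced to a set of hypothetical fragment keys, and `fragment_support`, `frequent_fragments` and `discriminative_fragments` are illustrative helpers, not MoSS parameters.

```python
from collections import Counter
from typing import Hashable, Sequence


def fragment_support(molecules: Sequence[frozenset]) -> dict[Hashable, float]:
    """Fraction of molecules that contain each fragment key."""
    counts: Counter = Counter()
    for frags in molecules:
        counts.update(set(frags))  # count each fragment at most once per molecule
    n = len(molecules)
    return {frag: c / n for frag, c in counts.items()}


def frequent_fragments(molecules, min_support):
    """Fragments present in at least `min_support` (0..1) of the molecules."""
    return {f for f, s in fragment_support(molecules).items() if s >= min_support}


def discriminative_fragments(class_a, class_b, min_a, max_b):
    """Fragments in at least `min_a` of class A and at most `max_b` of class B."""
    sup_a = fragment_support(class_a)
    sup_b = fragment_support(class_b)
    return {
        f for f, s in sup_a.items()
        if s >= min_a and sup_b.get(f, 0.0) <= max_b
    }
```

For instance, with class A = `[{"x","y"}, {"x"}, {"x","z"}]` and class B = `[{"y"}, {"z"}]`, a support of 100% keeps only `"x"` (the part shared by every molecule), and "at least 50% in A, at most 0% in B" also returns only `"x"`.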

Hi Simon,

the important point here is to split the input dataset into multiple subsets. The single aza compound will end up in only one of those chunks, and all other chunks should contain only the unmodified scaffold system. The full scaffold should therefore be found in every other chunk and come out as frequent when you count at the end.
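The argument is a simple counting bound: if m outlier compounds are dealt into k chunks, at most m chunks can contain one, so at least k - m chunks still see only the unmodified scaffold. A quick check of this, assuming chunks are formed by shuffling the molecules and dealing them round-robin (`clean_chunk_count` is a hypothetical helper, not a KNIME node):

```python
import random


def clean_chunk_count(n_molecules, outlier_indices, n_chunks, seed=0):
    """Shuffle molecule indices, deal them into n_chunks, and count the
    non-empty chunks that contain none of the outlier molecules."""
    idx = list(range(n_molecules))
    random.Random(seed).shuffle(idx)
    # Round-robin dealing: chunk i gets positions i, i + n_chunks, ...
    chunks = [idx[i::n_chunks] for i in range(n_chunks)]
    outliers = set(outlier_indices)
    return sum(1 for chunk in chunks if chunk and outliers.isdisjoint(chunk))
```

With 100 molecules, one aza analog and 10 chunks, exactly 9 chunks stay clean; with 3 outliers, at least 7 do.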

Best regards,

Nils

Many thanks all for your responses.

Amazingly, I had somehow overlooked the MoSS node. It does exactly what I want to do and is really useful.

Many thanks for pointing me to this.

Simon.