R-Group Decomposition

InsilicoConsulting · June 28, 2011, 8:05am

Hi,

For a certain use case, I am getting the "R-site numbers higher than 7 are not supported" error.

Can you increase the number of output R-group cells, or set it dynamically? Even as it stands, it would be good if the node ran to completion and output the maximum number of R-groups permitted (7) instead of throwing an error.

A related issue is that the scaffold finder node breaks a ring and shows the resulting c-c=c fragment as part of the scaffold. The ring should not be broken.

InsilicoConsulting · June 28, 2011, 10:20am

The R-groups issue above may be solved if you kindly modify the node to do the following. Simon and others, please pitch in.

1. Give the option of removing trivial ones like CH-R

2. Report R-groups in new rows rather than columns. One can then use GroupBy to get the unique ones

The error quoted earlier occurs when a large file is searched with a scaffold and the hits are then passed to R-group decomposition.

richards99 · June 28, 2011, 5:53pm

For the scaffold finder, is the ring broken because not all the rings are the same? For example, if some molecules have phenyl rings and others have pyridine rings, the ring is broken because only part of it is common throughout your dataset.

Personally, I find the Murcko scaffolds node much better at identifying scaffolds in a project dataset. Use the Murcko scaffolds, convert to Canonicalised Smiles, and then use GroupBy node on the Canonicalised Smiles column, and count the number. This tells you which Murcko scaffolds are most represented in the data.
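
Outside KNIME, the counting step can be sketched in a few lines of Python. This is only an illustration of the GroupBy-and-count idea, assuming the Murcko scaffolds have already been generated and canonicalised elsewhere; the molecule IDs, SMILES strings, and the `count_scaffolds` helper are made up for the example.

```python
from collections import Counter

# Hypothetical input: (molecule id, canonical Murcko scaffold SMILES) pairs.
# The scaffolds are assumed to be precomputed; the values are invented.
rows = [
    ("mol1", "c1ccccc1"),
    ("mol2", "c1ccncc1"),
    ("mol3", "c1ccccc1"),
    ("mol4", "c1ccc2ccccc2c1"),
    ("mol5", "c1ccccc1"),
]


def count_scaffolds(rows):
    """Group by scaffold SMILES and return (scaffold, count), most frequent first."""
    counts = Counter(scaffold for _, scaffold in rows)
    return counts.most_common()


scaffold_counts = count_scaffolds(rows)
for scaffold, n in scaffold_counts:
    print(f"{scaffold}\t{n}")
```

Grouping on the canonical string only works if every scaffold was canonicalised the same way, which is why the canonicalisation step comes first.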

So rather than getting the Maximum Common Scaffold (MCS), you get the Common Scaffolds (CS's).

As for R-sites above 7, I've not seen that myself, but that's because I haven't had more than 7. Personally, I really like the way it outputs the data now; it's very versatile for further processing, such as converting the R-groups to Canonicalised Smiles and using the GroupBy node to look at average activities (which gives you the unique functionalities at each position). Having the R-groups in columns and canonicalising them also allows you to use the dataset in the excellent Erlwood node Matched Pairs Finder. This is very powerful.
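
As a rough illustration of the average-activity idea, here is a minimal Python sketch. It assumes a table with one row per molecule, one column per R-site holding canonical SMILES, and an activity column; the column names, the values, and the `mean_activity_by_group` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical decomposition output: one row per molecule, one column per
# R-site, plus an invented activity value.
table = [
    {"id": "mol1", "R1": "[*]C", "R2": "[*]Cl", "activity": 6.0},
    {"id": "mol2", "R1": "[*]C", "R2": "[*]F", "activity": 7.0},
    {"id": "mol3", "R1": "[*]CC", "R2": "[*]Cl", "activity": 5.0},
]


def mean_activity_by_group(table, column):
    """Group rows by the R-group in `column` and average their activities."""
    grouped = defaultdict(list)
    for row in table:
        group = row.get(column)
        if group is not None:  # skip molecules with no substituent at this site
            grouped[group].append(row["activity"])
    return {group: mean(values) for group, values in grouped.items()}


r1_means = mean_activity_by_group(table, "R1")
r2_means = mean_activity_by_group(table, "R2")
print(r1_means)
print(r2_means)
```

Each key in the result is one unique functionality at that position, which is the same view the GroupBy node gives on a single R-group column.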

Simon.

InsilicoConsulting · June 29, 2011, 5:35am

Well, what I was trying to do was generate all potential R-groups, along with attachment points, from PubChem data for a given scaffold. That's when I ran into the error.

Yes, the node is very powerful, but putting R-groups into columns means that if the number of R-sites varies between molecules, many columns contain many missing values. This problem would be avoided if each hit had as many rows as it has R-groups. The GroupBy and similar nodes could then work easily.
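
To make the row-based layout concrete, here is a small Python sketch of the reshaping. It is not how the node works, just an illustration under assumed inputs: the column names, the example values, and the `to_long` helper are hypothetical, and a missing R-group is represented as `None`.

```python
# Hypothetical column-based output: molecules with fewer R-sites
# leave the remaining columns empty (None).
wide = [
    {"id": "mol1", "R1": "[*]C", "R2": "[*]Cl", "R3": None},
    {"id": "mol2", "R1": "[*]C", "R2": None, "R3": None},
    {"id": "mol3", "R1": "[*]CC", "R2": "[*]Cl", "R3": "[*]O"},
]


def to_long(table, r_columns):
    """Emit one row per (molecule, R-site) pair, skipping missing R-groups."""
    long_rows = []
    for row in table:
        for site in r_columns:
            group = row.get(site)
            if group is None:
                continue  # no row is written, so no missing cells appear
            long_rows.append({"id": row["id"], "site": site, "r_group": group})
    return long_rows


long_rows = to_long(wide, ["R1", "R2", "R3"])
unique_groups = sorted({row["r_group"] for row in long_rows})
print(len(long_rows), unique_groups)
```

With this layout the number of R-sites per molecule no longer fixes the number of columns, and a group-by on the `r_group` field gives the unique substituents directly.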

richards99 · June 29, 2011, 7:34am

One way around the columns with lots of empty cells would be to apply more specific substructure searches before the R Group Decomposition node, giving a more uniform output across the structures analysed.

Any remaining missing value cells can easily be dealt with. You can apply the GroupBy node and then simply remove the aggregated missing cells with a Row Filter.

Simon.
