Automatic attachment of A/R sites

Is there a way to use the Indigo nodes to add attachment (A) and R sites to molecules? The need to add R and A sites arises frequently when constructing combinatorial libraries.

For a fragment database containing several thousand entries, it is simply not feasible to do this by hand.

Software that generates combinatorial libraries, however, requires SMILES strings with A and R defined.

Hi,

This is something I have requested for the Indigo nodes on a few occasions; there is a post about it somewhere. I believe it was put down as a possible future addition, and I agree it is something that would be very useful.

The RDKit nodes cannot do this either.

All I can tell you is that it is possible to do this with the MOE nodes using an R-Group Clipper and then an enumerator node. http://www.knime.org/files/09_CCG.pdf

It's also possible to do this with the Symyx/Accelrys nodes, using the Enumeration node. http://accelrys.com/products/informatics/cheminformatics/

Hope this helps.

Thanks,

Simon.

InsilicoConsulting,

In what way do you want to automate adding A/R sites to a molecule? You need some rules to define where the attachment points should appear. How do you think such rules should be specified?
In the near future we will add a Transformation node that generalizes the Atom Replacer node. This node will take a transformation reaction as a parameter and apply it to the input set of molecules. For example, in your case it will be possible to define the following reaction: N1C=CC=C1>>[*]N1C=CC=C1 to attach * to a specific atom. Is this what you need?

With best regards,
Mikhail Rybalkin
GGA Software Services LLC

Hi Guys,

One brute-force method that seems to work is to do a substructure search using the fragments without the attachment points, followed by an R-group decomposition, on a VERY large file of bioactivity/drug-like molecules.

If there is a substructure match (as is very likely), the scaffold column in the Indigo nodes gives the fragments with attachment points [R]. Thus "naturally occurring" handles/attachments can be found for the fragments.

However in some cases there is a substructure match, but no R attachment points, which is fine.

The problem is that for some fragments the R-group decomposition node throws an error reporting that no scaffold/core embeddings were obtained! It would be nice if the workflow could ignore this error and go on to the next fragment.

Note that the above logic is running in a loop. Mikhail, could you please take note of the above paragraph?

Best regards

The error you mention is something I have brought up previously as well. I agree; I just wish the node would ignore it and continue to the next molecule.

The only way around this is to apply a substructure match node before the decomposition node, so that only molecules containing the scaffold are passed on to it.

Simon.

Hi InsilicoConsulting,

Your approach of attaching "naturally occurring" R-sites is interesting. One problem I see is that the R-group decomposition node finds only the first possible decomposition, but there might be several different "naturally occurring" R-site positions. Could you explain your ideal view of how these "naturally occurring" R-sites should be added? If the only problem is ignoring the exception, then we will add such an option. Also, thanks to richards99 for proposing a workaround for that.

Maybe you can propose a better way, using nodes that still need to be implemented? Do you need to get all possible R-group decompositions? Do you need to merge these different R-group decompositions in order to get all "naturally occurring" R-sites within one molecule?

With best regards,
Mikhail Rybalkin
GGA Software Services LLC

Hi Mikhail,

My guess is that ignoring the error should be enough, perhaps using a "null" value instead of skipping, so that one can later identify which queries failed. That's why I would also love to have the ID of the query table in the output at port 1.
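
To make the "null instead of skipping" idea concrete, here is a minimal Python sketch outside KNIME. The `decompose` callable, the query IDs and the use of `ValueError` for the failure are all assumptions for illustration; this is not the Indigo node's API.

```python
from typing import Callable, Dict, Optional


def decompose_all(queries: Dict[str, str],
                  decompose: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Run `decompose` on every query and keep a None entry where it fails."""
    results: Dict[str, Optional[str]] = {}
    for query_id, fragment in queries.items():
        try:
            results[query_id] = decompose(fragment)
        except ValueError:
            # Assumption: "no scaffold embeddings" surfaces as a ValueError.
            # The query ID is kept so failed queries can be identified later.
            results[query_id] = None
    return results


def toy_decompose(fragment: str) -> str:
    """Hypothetical stand-in for the decomposition step (text check only)."""
    if "c1" not in fragment:
        raise ValueError("no scaffold embeddings obtained")
    return "[R]" + fragment


results = decompose_all({"q1": "c1ccccc1", "q2": "CCO"}, toy_decompose)
failed = [query_id for query_id, value in results.items() if value is None]
```

Here `failed` lists the query IDs that produced no scaffold, which is the information a null cell plus a query ID column would carry.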

Currently one can obtain R, R1, R2 on the queries from the scaffold port [port 1] of the R-group decomposition node. I don't use the scaffold column in port 0.

I wonder if it is desirable to output multiple rows, one for each molecule/attachment point combination? So a scaffold would not have S-R, S-R1, S-R2 in one cell but would have three cells, with S-R, S-R1 and S-R2 respectively? This may give more flexibility and control when constructing a library with a tool like http://gecco.org.chemie.uni-frankfurt.de/smilib/. There could be an option in the preferences to do either or both of the above decompositions.

Others with more chemistry experience may be able to comment on the above.

Hi, InsilicoConsulting

OK, we will add an option for skipping molecules that do not contain the specified scaffold.

I didn't catch the point of multiple rows per molecule. Could you explain what you would be able to do with multiple rows per molecule that you cannot do with the current output of multiple columns, one per R-site, for each molecule? Where would you pass such output?

With best regards,
Mikhail 

I don't know if it makes sense, but if I want to generate a combinatorial library using SmiLib etc., then I may want to restrict the number of combinations that SmiLib makes in the final library by starting with fragments that carry only particular R-sites. So one may want, say, only the R-site at the ortho position and not the others.

So if there are multiple R-sites on a fragment/ring, these can be further decomposed into individual rows. I can then choose the ones I want using the Row Filter before generating the combinatorial library.
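
As a rough illustration of the one-row-per-R-site idea, here is a small Python sketch that works purely on text. It assumes every R-site is written as a parenthesised branch such as `([R1])` with a unique label, so that deleting the branch leaves an implicit hydrogen. The `[R1]` notation and the function name are placeholders, not the output format of any node.

```python
import re
from typing import List, Tuple

# Assumption: each R-site appears as a branch like "([R1])" with a unique label.
R_SITE = re.compile(r"\(\[R\d*\]\)")


def one_row_per_site(scaffold: str) -> List[Tuple[str, str]]:
    """Return (label, scaffold keeping only that R-site) for each R-site."""
    sites = R_SITE.findall(scaffold)
    rows = []
    for keep in sites:
        single = scaffold
        for site in sites:
            if site != keep:
                # Dropping the branch leaves an implicit hydrogen at that atom.
                single = single.replace(site, "")
        rows.append((keep[1:-1], single))  # strip the branch parentheses
    return rows


rows = one_row_per_site("Cc1c([R1])cc([R2])cc1")
# Equivalent of a row filter: keep only the site next to the methyl group.
ortho_only = [smiles for label, smiles in rows if label == "[R1]"]
```

The filtered list could then be passed on as the only scaffold variant used for library generation.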

Perhaps others can weigh in on this?

Hi,

Below is how I would like to see an enumerator node work, which would be really great:

Essentially I would just prefer an "Enumerator" node which can take the output from the Decomposer node and enumerate.

So one or more scaffolds with Rx positions defined (i.e. R1, R2, R3, R4) are input into one port of an Enumerator node. (Basically, either the scaffolds output from port 1 of the decomposer node or the scaffolds from port 0 would be good templates for this.)

Then a list of groups with attachment points, spread across multiple columns, is input into a second port of the enumerator node (the R-group columns as currently output by the decomposer node would be ideal). Within the Enumerator node you then specify which R-group column to match up to R1 on the scaffold, which R-group column to match up to R2 on the scaffold, etc.

By going this way, you will be able to restrict a list of R groups to only one position of the scaffold (i.e. the ortho position of a ring, as InSilico mentions). If you have R1 defined at the ortho position of a ring on the scaffold, you choose within the node which column of R groups is assigned to it. This sounds like an ideal implementation of an "Enumerator" node to me. Also, if you don't assign a list of R groups to one of the Rx positions of the scaffold, that position remains as is (i.e. non-enumerated) and can therefore be enumerated at a later point.
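
The behaviour described above can be sketched in a few lines of Python. This is only a text-level illustration under stated assumptions: R positions are written as `[R1]`, `[R2]` placeholders, each R group is a SMILES fragment written starting from its attaching atom, and no ring-closure digits clash. The function name and data are hypothetical, and a real node would work on molecule objects rather than strings.

```python
from itertools import product
from typing import Dict, List


def enumerate_library(scaffold: str,
                      assignments: Dict[str, List[str]]) -> List[str]:
    """Substitute every combination of the assigned R groups into the scaffold.

    Positions without an assignment are left untouched, so they can be
    enumerated in a later pass.
    """
    labels = list(assignments)
    library = []
    for combo in product(*(assignments[label] for label in labels)):
        smiles = scaffold
        for label, group in zip(labels, combo):
            smiles = smiles.replace(label, group)
        library.append(smiles)
    return library


scaffold = "c1ccc([R1])c([R2])c1"
# Only R1 is assigned, so R2 stays as a placeholder.
partial = enumerate_library(scaffold, {"[R1]": ["C", "OC"]})
# Both positions assigned: 2 x 2 = 4 products.
full = enumerate_library(scaffold, {"[R1]": ["C", "OC"], "[R2]": ["F", "Cl"]})
```

Assigning a separate list to each label is what restricts a set of R groups to a single position on the scaffold.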

What is also needed is an easy way to generate attachment points within a molecule, as not all groups will come from a Decomposer node. For example, you may have a list of alkyl bromides from Aldrich and want to use them for enumeration, so you need a way of converting the bromo group to an attachment point. What is needed is a Clipper node, where you can choose where the reaction is to take place and which outputs the molecules with attachment points. The Transformation node that Mikhail describes sounds ideal. I hope it can be used to define Rx groups for a scaffold too, for example taking toluene and specifying the introduction of an R1 group at the position ortho to the methyl.
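
For the alkyl bromide case, a very rough text-level sketch in Python might look like the following. It assumes plain SMILES with exactly one unbracketed `Br`, and it returns `None` for anything else rather than guessing. The function name and the `[*]` marker are illustrative choices; a proper Clipper or Transformation node would match the substructure on the molecule graph instead.

```python
from typing import Optional


def clip_halide(smiles: str, leaving_group: str = "Br",
                marker: str = "[*]") -> Optional[str]:
    """Swap a single terminal halide for an attachment-point marker.

    Returns None when the halide is bracketed (e.g. an ion) or does not
    occur exactly once, because the clipping site would be ambiguous.
    """
    if "[" + leaving_group in smiles or smiles.count(leaving_group) != 1:
        return None
    return smiles.replace(leaving_group, marker)


bromides = ["CCCBr", "CC(C)CBr", "BrCCBr", "CCO"]
clipped = [clip_halide(smiles) for smiles in bromides]
```

The dibromide and the alcohol come back as `None`, so they can be reviewed by hand instead of being clipped at an arbitrary position.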

On the other point, about how the Decomposer node manages non-matching scaffolds: the option to skip molecules that do not contain the scaffold would be ideal. Rather than generating a null value as InSilico suggests, I would prefer an extra outport on the Decomposer node to which all non-matches are passed. I think generating a null cell in the standard output complicates manipulations further on, as these would need to be filtered out. But I do agree with InSilico that it would be good to capture the list of non-matches somehow, so you know how many were not matched.
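
The extra-outport idea amounts to splitting the input into two tables. A minimal Python sketch, assuming a hypothetical `contains_scaffold` predicate (here a plain substring test standing in for a real substructure search):

```python
from typing import Callable, Iterable, List, Tuple


def split_by_match(molecules: Iterable[str],
                   contains_scaffold: Callable[[str], bool]
                   ) -> Tuple[List[str], List[str]]:
    """Return (matches, non_matches), like a node with two output ports."""
    matched: List[str] = []
    unmatched: List[str] = []
    for molecule in molecules:
        if contains_scaffold(molecule):
            matched.append(molecule)
        else:
            unmatched.append(molecule)
    return matched, unmatched


def has_benzene_text(smiles: str) -> bool:
    # Toy predicate: a text check only, not a chemical substructure match.
    return "c1ccccc1" in smiles


matched, unmatched = split_by_match(
    ["Cc1ccccc1", "CCO", "Oc1ccccc1"], has_benzene_text)
```

Only `matched` would go on to the decomposition step, while `len(unmatched)` gives the count of molecules that were not captured, with no null cells to filter out downstream.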

I'd be interested to hear further comments.

Thanks,

Simon.