I have just run a set of compounds through the RGroup Decomposition node, and first impressions are that it works well. However, I ran into problems when the substituents on the core can be different stereoisomers. For example, with (R)-phenylalanine, (S)-phenylalanine and rac-phenylalanine and the core SMARTS 'NCC(=O)O', the R-group decomposition is the same for all three, i.e. the stereochemistry is not indicated.
It gets worse if you add alpha-methylphenylalanine, in which case only one of the R-groups on the alpha-carbon makes it into the output table at all.
Is there a way around this (simpler than splitting my data into unsubstituted, R-, S- and rac- tables and handling separately), or is it unexpected?
There are two parts to the answer:
The fact that you get the same answer from the R-group decomposition for the R-, S- and rac- molecules is expected. How would you expect the sidechains to be labelled to indicate that they were attached to a center with a particular stereochemistry? One important point to keep in mind when looking for a workaround: the RDKit does not currently use stereochemistry when doing a substructure search. This makes it difficult to use the RDKit nodes to split your data into multiple tables.
The behavior of the R-group decomposition with alpha-methylphenylalanine is a bug. The RDKit is actually generating the right results, but the KNIME node isn't putting them in the table correctly. This is definitely fixable.
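On the first point, if scripting against the RDKit directly is an option, one possible way to split the data without a stereo-aware substructure search is to match the core as usual and then read the CIP label of the matched alpha-carbon. The following is only a rough sketch, not what the node does: it assumes the RDKit Python API is available and that the second atom of the core SMARTS 'NCC(=O)O' is the alpha-carbon. The helper names `alpha_carbon_label` and `split_by_stereo` are made up for illustration.

```python
from rdkit import Chem

# Core query from the question; atom 1 of the pattern is assumed to be the alpha-carbon.
CORE = Chem.MolFromSmarts('NCC(=O)O')

def alpha_carbon_label(smiles):
    """Return 'R', 'S', '?' (unassigned stereocenter), 'achiral', or None if no core match."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    match = mol.GetSubstructMatch(CORE)  # stereochemistry plays no role in this match
    if not match:
        return None
    alpha_idx = match[1]
    # Stereocenters as (atom index, label); unassigned ones are reported as '?'.
    centers = dict(Chem.FindMolChiralCenters(mol, includeUnassigned=True))
    return centers.get(alpha_idx, 'achiral')

def split_by_stereo(smiles_list):
    """Group SMILES by the label of the alpha-carbon (hypothetical helper)."""
    tables = {}
    for smi in smiles_list:
        tables.setdefault(alpha_carbon_label(smi), []).append(smi)
    return tables

if __name__ == '__main__':
    compounds = [
        'N[C@@H](Cc1ccccc1)C(=O)O',  # one enantiomer of phenylalanine
        'N[C@H](Cc1ccccc1)C(=O)O',   # the other enantiomer
        'NC(Cc1ccccc1)C(=O)O',       # no stereo specified (rac or unknown)
        'NCC(=O)O',                  # glycine, no stereocenter
    ]
    for label, members in split_by_stereo(compounds).items():
        print(label, members)
```

Each resulting group could then be run through the decomposition separately. Note that a '?' label only says that no stereochemistry was drawn, so it cannot distinguish a true racemate from an unknown configuration.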
Thanks. I had realised that the substructure search was ignoring stereochemistry. Some suggestions - the Mol (and related) format has bond stereochemistry as part of the bond attribute - can this information be included in the output structure for the attachment bond?
Also, and this might help with the second part of the problem too, if each proton position on the core query is treated as a separate possible R-group (instead of, as I think happens at present, each atom in the SMARTS being treated as an R-group source), then the stereochemistry would be captured. In my example, R- would have R2a as -H and R2b as -CH2Ph, and S- would have R2a as -CH2Ph and R2b as -H (and the quaternary example would then have R2a as Me and R2b as CH2Ph, or vice versa, depending on the stereochemistry). Of course, that leaves a question about the rac- example and how it would then be handled... maybe an option in the R-Group decomposition to use the current method (thereby ignoring stereochemistry) or this method (thereby including stereochemistry), with a 'fail' output port for those which could not be assigned definitively?