Is it possible to create an RDKit node that takes two inputs: a column of RDKit structures, and a set of scaffolds represented as SMARTS, drawn with Rx labels (R1, R2, etc.) at the substitution positions? The node would analyse each molecule in the RDKit column, identify the group at each Rx position, and put it into a new RDKit column named after that position (R1, R2, etc.). A final RDKit column would hold the scaffold without the Rx labels on it.
This would be very powerful for spotting trends, for example finding the average activity for each group at a given position. Ideally the groups in each Rx column would be canonicalised. If the groups are stored as RDKit columns, they can also be used for substructure searching.
I appreciate this has already been done by MOE, but it's very slow to process and does not allow you to specify the positions of substitution with a name (i.e. R1 etc.). This is an issue because slightly different scaffolds can end up with equivalent groups in completely different Rx columns, which makes it impossible to compare between scaffolds.
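
To make the request concrete, here is a rough Python sketch of the behaviour I have in mind, using RDKit's `rdRGroupDecomposition` module directly rather than a node. The pyridine scaffold, the two input molecules, the helper names `strip_labels` and `canonical`, and the output column name `Scaffold` are all made up for illustration. I am also assuming that atom-map numbers in the SMARTS (`[*:1]`, `[*:2]`) are an acceptable way to carry the R1/R2 labels, and that substituents should only be accepted at labelled positions.

```python
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition as rgd

# Hypothetical labelled scaffold: atom-map numbers 1 and 2 stand for R1 and R2.
scaffold = Chem.MolFromSmarts("[*:1]c1ccc([*:2])cn1")

# Hypothetical input molecules standing in for the RDKit structure column.
input_smiles = ["Cc1ccc(O)cn1", "CCc1ccc(Cl)cn1"]
mols = [Chem.MolFromSmiles(smi) for smi in input_smiles]

# Assumption: only accept substituents at the named Rx positions.
params = rgd.RGroupDecompositionParameters()
params.onlyMatchAtRGroups = True

# One dict per matched molecule with keys "Core", "R1", "R2";
# `unmatched` holds the indices of molecules that did not fit the scaffold.
rows, unmatched = rgd.RGroupDecompose(
    [scaffold], mols, asSmiles=True, options=params
)

dummy = Chem.MolFromSmarts("[#0]")  # matches the Rx attachment atoms


def strip_labels(core_smiles):
    """Return the scaffold as canonical SMILES with the Rx atoms removed."""
    core = Chem.MolFromSmiles(core_smiles)
    bare = Chem.DeleteSubstructs(core, dummy)
    Chem.SanitizeMol(bare)
    return Chem.MolToSmiles(bare)


def canonical(smiles):
    """Re-canonicalise an R-group SMILES so identical groups compare equal."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


# One output row per molecule: unlabelled scaffold plus one column per Rx.
table = [
    {
        "Scaffold": strip_labels(row["Core"]),
        "R1": canonical(row["R1"]),
        "R2": canonical(row["R2"]),
    }
    for row in rows
]

for row in table:
    print(row)
```

Because the R1 and R2 names come from the scaffold definition itself, the same label can be placed at the equivalent position on a related scaffold, which is what would make the Rx columns comparable between scaffolds.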