Hi,
Below is how I see an Enumerator node working, which would be really great:
Essentially, I would prefer an "Enumerator" node that takes the output from the Decomposer node and enumerates from it.
One or more scaffolds with Rx positions defined (e.g. R1, R2, R3, R4) go into one port of the Enumerator node. Either the scaffolds output from port 1 of the Decomposer node or those from port 0 would be good templates for this.
A second port of the Enumerator node then takes a list of R groups with attachment points spread across multiple columns (the R group columns the Decomposer node currently outputs would be ideal). Within the Enumerator node you then specify which R group column maps to R1 on the scaffold, which maps to R2, and so on.
This way, you can restrict a list of R groups to a single position on the scaffold (e.g. the ortho position of a ring, as InSilico mentions): if R1 is defined at the ortho position of a ring on the scaffold, you choose within the node which column of R groups is assigned to it. This sounds like an ideal implementation of an "Enumerator" node to me. Also, if you don't assign a list of R groups to one of the scaffold's Rx positions, that position stays as-is (i.e. non-enumerated) and can be enumerated at a later stage.
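To make the column-to-position mapping concrete, here is a minimal Python sketch of the combinatorics only, under stated assumptions: the function name `enumerate_scaffold`, its arguments and the column names are hypothetical (not an existing KNIME node or API), and R groups are plain labels rather than structures. It produces one assignment per virtual product; actually joining the fragments onto the scaffold would be left to a cheminformatics toolkit.

```python
from itertools import product
from typing import Dict, List, Optional


def enumerate_scaffold(
    scaffold_positions: List[str],
    rgroup_columns: Dict[str, List[str]],
    column_for_position: Dict[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Return one R-group assignment per virtual product (hypothetical sketch).

    scaffold_positions:  Rx labels defined on the scaffold, e.g. ["R1", "R2"].
    rgroup_columns:      R-group lists keyed by column name (assumed input table).
    column_for_position: the user's choice of which column feeds which Rx position.
    Positions with no assigned column map to None, i.e. they are left as-is.
    """
    unknown = set(column_for_position) - set(scaffold_positions)
    if unknown:
        raise ValueError(f"scaffold has no position(s) {sorted(unknown)}")

    # Only positions the user mapped to a column are enumerated.
    enumerated = [p for p in scaffold_positions if p in column_for_position]
    choices = [rgroup_columns[column_for_position[p]] for p in enumerated]

    products = []
    # Each column's groups only ever land on the position it was mapped to.
    for combo in product(*choices):
        assignment: Dict[str, Optional[str]] = {p: None for p in scaffold_positions}
        assignment.update(zip(enumerated, combo))
        products.append(assignment)
    return products


if __name__ == "__main__":
    products = enumerate_scaffold(
        ["R1", "R2", "R3"],
        {"ortho groups": ["Cl", "F"], "para groups": ["OMe", "NH2", "Me"]},
        {"R1": "ortho groups", "R2": "para groups"},  # R3 deliberately unassigned
    )
    print(len(products))  # 6
    print(products[0])    # {'R1': 'Cl', 'R2': 'OMe', 'R3': None}
```

Keeping unassigned positions as `None` rather than dropping them mirrors the idea that such a position stays open for a later enumeration step.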
What is also needed is an easy way to generate attachment points within a molecule, as not all groups will come from a Decomposer node. For example, you may have a list of alkyl bromides from Aldrich that you want to enumerate with, so you need a way of converting the bromo group into an attachment point. What is needed is a Clipper node, where you choose where the reaction is to take place and which outputs the molecules with attachment points. The Transformation node that Mikhail describes sounds ideal for this. I hope it can also be used to define Rx groups on a scaffold, for example taking toluene and specifying the introduction of an R1 group ortho to the methyl.
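As a rough illustration of what a Clipper step would output, the sketch below turns the single bromine of an alkyl bromide SMILES into an attachment point written as `[*:1]`, a common SMILES notation for a labelled attachment point. This is a naive string-level stand-in, not the Transformation node Mikhail describes: the function name `clip_bromide` is hypothetical, it only handles a bromine written as plain `Br` outside brackets, and a real implementation would use a toolkit with reaction SMARTS. Molecules with no bromine or more than one are returned as `None` so they are flagged rather than guessed.

```python
from typing import Optional


def clip_bromide(smiles: str, label: int = 1) -> Optional[str]:
    """Replace the single organic-subset Br with an [*:label] attachment point.

    Naive sketch: bracket atoms such as [Br-] are ignored, and molecules with
    zero or several plain Br atoms return None (no unambiguous clip site).
    """
    positions = []
    depth = 0  # > 0 while inside a bracket atom like [Na+]
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0 and smiles.startswith("Br", i):
            positions.append(i)
            i += 2  # skip both characters of the Br token
            continue
        i += 1

    if len(positions) != 1:
        return None

    # Br has valence 1, so the dummy atom inherits exactly one bond.
    i = positions[0]
    return smiles[:i] + f"[*:{label}]" + smiles[i + 2:]


if __name__ == "__main__":
    for smi in ["CCCBr", "BrCc1ccccc1", "BrCCBr"]:
        print(smi, "->", clip_bromide(smi))
```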
On the other point, about how the Decomposer node handles non-matching scaffolds: the option to skip molecules that do not contain the matched scaffold would be ideal. Rather than generating a null value as InSilico suggests, I would prefer an extra outport on the Decomposer node to which all non-matches are passed. Generating null cells in the standard output complicates downstream manipulations, since they would need to be filtered out. But I do agree with InSilico that it would be good to capture the non-matches somehow, so you know how many were not captured.
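The difference between null cells and a second outport can be sketched as a simple partition: every input row goes to exactly one of two outputs, so nothing downstream needs filtering and the non-match count is just the length of the second list. The function `split_by_match` and the substring-based benzene check are hypothetical stand-ins; a real Decomposer would use a proper substructure search.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_by_match(
    molecules: Iterable[T], matches: Callable[[T], bool]
) -> Tuple[List[T], List[T]]:
    """Route each molecule to a 'matched' port or a 'non-matched' port."""
    matched: List[T] = []
    unmatched: List[T] = []
    for mol in molecules:
        (matched if matches(mol) else unmatched).append(mol)
    return matched, unmatched


if __name__ == "__main__":
    # Toy stand-in for a real substructure search: plain substring test on SMILES.
    def has_benzene(smiles: str) -> bool:
        return "c1ccccc1" in smiles

    hits, misses = split_by_match(["Cc1ccccc1", "CCO", "Clc1ccccc1"], has_benzene)
    print(hits)         # ['Cc1ccccc1', 'Clc1ccccc1']
    print(len(misses))  # 1
```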
I'd be interested to hear any further comments.
Thanks,
Simon.