For a certain use case, I am getting the "R-site numbers higher than 7 are not supported" error.
Can you increase, or set dynamically, the number of output R-group cells? Even as things stand, it would be better if the node ran to completion and output the maximum number of R-groups permitted (7) instead of throwing an error.
A related issue is that the scaffold finder node breaks a ring and shows the resulting c-c=c fragment as part of the scaffold. The ring should not be broken.
For the scaffold finder, is the ring broken because not all the rings are the same? For example, if some compounds have phenyl rings and others have pyridine rings, the ring gets broken because only part of it is common throughout your dataset.
Personally, I find the Murcko scaffolds node much better at identifying scaffolds in a project dataset. Generate the Murcko scaffolds, convert them to canonicalised SMILES, then use a GroupBy node on the canonicalised SMILES column and count the rows in each group. This tells you which Murcko scaffolds are most represented in the data.
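For anyone who wants to see the counting step outside a workflow, here is a minimal Python sketch of the GroupBy-and-count idea. It assumes the scaffolds have already been generated and canonicalised upstream; the SMILES strings and the function name count_scaffolds are made up for illustration and do not come from any node.

```python
from collections import Counter

# Hypothetical input: one canonicalised Murcko scaffold SMILES per compound,
# as it might look after the scaffold and canonicalisation steps.
scaffold_smiles = [
    "c1ccccc1",
    "c1ccncc1",
    "c1ccccc1",
    "c1ccc(-c2ccccn2)cc1",
    "c1ccccc1",
    "c1ccncc1",
]


def count_scaffolds(smiles_list):
    """Return (scaffold, count) pairs, most frequent first."""
    return Counter(smiles_list).most_common()


for scaffold, n in count_scaffolds(scaffold_smiles):
    print(f"{scaffold}\t{n}")
```

Grouping on the string only works if every scaffold is written in the same canonical form, which is why the canonicalisation step comes first.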
So rather than getting the Maximum Common Scaffold (MCS), you get the Common Scaffolds (CS's).
As for more than 7 R-sites, I've not seen that error myself, but that's because I haven't had more than 7. Personally, I really like the way the node outputs the data now. It's very versatile: for example, you can convert the R-groups to canonicalised SMILES and use GroupBy nodes to look at average activities, which gives you the unique functionalities at each position. Having the R-groups in columns and canonicalised also lets you use the dataset in the excellent Erlwood node Matched Pairs Finder. This is very powerful.
Well, what I was trying to do was generate all potential R-groups, along with their attachment points, in PubChem data for a given scaffold. That's when I ran into the error.
Yes, the node is very powerful, but putting R-groups into columns means that if the number of R-sites varies between molecules, there are many missing values in many columns. This problem would be avoided if there were as many rows for a given hit as it has R-groups. The GroupBy and similar nodes could then work easily.
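To make the one-row-per-R-group layout concrete, here is a rough Python sketch of the reshaping. It is not how the node works; the column names, molecule ids, SMILES, and the helper wide_to_long are all hypothetical.

```python
def wide_to_long(rows, id_key="id"):
    """Turn one row per molecule (R-groups in columns) into one row per R-group.

    Missing R-groups (None or empty string) produce no output row.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key or not value:
                continue  # skip the id column and any missing R-group
            long_rows.append(
                {id_key: row[id_key], "r_site": column, "r_group": value}
            )
    return long_rows


# Hypothetical decomposition output; ids, columns and SMILES are made up.
wide = [
    {"id": "mol1", "R1": "[*:1]C", "R2": "[*:2]Cl", "R3": None},
    {"id": "mol2", "R1": "[*:1]CC", "R2": None, "R3": None},
    {"id": "mol3", "R1": "[*:1]C", "R2": "[*:2]F", "R3": "[*:3]O"},
]

long_table = wide_to_long(wide)
for r in long_table:
    print(r["id"], r["r_site"], r["r_group"])
```

In this layout the number of R-sites no longer fixes the number of columns, and no cell is ever empty.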
One way to get around the columns with lots of empty rows would be to apply more specific substructure searches before the R Group Decomposition node, which would give a more uniform output across the structures analysed.
Any remaining missing value cells can easily be dealt with. You can apply the GroupBy node and then simply remove the aggregated missing-value row with a Row Filter.
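The group-then-filter step can be sketched in plain Python as below. This is only an illustration under made-up data: the column names R2 and pIC50, the values, and both helper functions are hypothetical, and None stands in for a missing cell.

```python
from collections import defaultdict
from statistics import mean


def group_mean(rows, group_key, value_key):
    """Group rows on one column and average another (the GroupBy step)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


def drop_missing_group(aggregated):
    """Remove the aggregated row whose key is missing (the Row Filter step)."""
    return {key: value for key, value in aggregated.items() if key is not None}


# Hypothetical table: R-group at site 2 and an activity value per compound.
rows = [
    {"R2": "[*:2]Cl", "pIC50": 6.0},
    {"R2": "[*:2]Cl", "pIC50": 7.0},
    {"R2": None, "pIC50": 5.0},
    {"R2": "[*:2]F", "pIC50": 8.0},
]

aggregated = group_mean(rows, "R2", "pIC50")
cleaned = drop_missing_group(aggregated)
print(cleaned)
```

All compounds with no R-group at that site collapse into a single aggregated row, so one filter removes them.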