Scaffold Finder node bug


I am using the Scaffold Finder node, which generates the MCS in IndigoQueryMol format. When I convert this to SMILES, the resulting strings seem to be non-compliant, which causes problems downstream.

For instance, if I take these SMILES, convert them back to Indigo with "Molecule to Indigo", and then try to convert back to SMILES again with "Indigo to Molecule", all the conversions fail with the following error:

ERROR IndigoSaverNodeModel Could not convert molecule with RowId=Row1#9: element: can not calculate implicit hydrogens on aromatic N, charge 0, degree 2, 0 radical electrons

When I look at the SMILES string before this error, many of the atoms are separated by a colon (:).

This doesn't seem to be handled very well. Attempts to use the "Hydrogen Adder" node also fail with these SMILES strings.

I have been using the node as you showed in the UGM Tutorial to create a list of common scaffolds; it is a most useful and neat little way of doing it. I just wish the SMILES behaved themselves for further manipulation!

Any thoughts?


Dear Simon,

I have reproduced the bug. The Scaffold Finder node generates query molecules. For example, given two input structures, 'c1ccc2[nH]ccc2c1' and 'c1cc[nH]c1', the scaffold is 'C1:N:C:C:C:1' (a colon : means an aromatic bond). You can use the query scaffold in other nodes (RGroup Decomposer, Substructure matcher, etc.). However, the query structure 'C1:N:C:C:C:1' cannot be converted directly to a molecule, so you cannot, for example, calculate canonical SMILES from it. An exception is raised for the structure: 'element: can not calculate implicit hydrogens on aromatic N, charge 0, degree 2, 0 radical electrons'.
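
The ambiguity behind this error message can be illustrated with a toy electron count. A neutral aromatic N with two ring bonds may carry no hydrogen (pyridine-like) or one hydrogen (pyrrole-like), and the atom on its own does not say which. The sketch below is not how Indigo works internally; the function names are hypothetical, and it only handles neutral rings of C and N in which every atom has exactly two ring bonds.

```python
from typing import List

# Toy Hückel (4n+2) electron count. This is NOT Indigo's algorithm,
# only an illustration of why the hydrogen count on N is ambiguous.

def pi_electrons(ring_elements: str, n_has_h: bool) -> int:
    """Count pi electrons for a neutral aromatic ring given as element letters."""
    total = 0
    for element in ring_elements:
        if element == "C":
            total += 1  # each aromatic carbon contributes one electron
        elif element == "N":
            # pyrrole-like N (with H) donates its lone pair (2 electrons),
            # pyridine-like N (no H) contributes a single electron
            total += 2 if n_has_h else 1
        else:
            raise ValueError(f"unsupported element: {element}")
    return total

def satisfies_huckel(count: int) -> bool:
    """True if count has the form 4n+2 for some n >= 0."""
    return count >= 2 and (count - 2) % 4 == 0

def plausible_n_hydrogens(ring_elements: str) -> List[int]:
    """Hydrogen counts on N (0 or 1) that give a 4n+2 electron count."""
    return [h for h in (0, 1)
            if satisfies_huckel(pi_electrons(ring_elements, bool(h)))]

print(plausible_n_hydrogens("CNCCC"))   # five-membered ring, as in the scaffold
print(plausible_n_hydrogens("CNCCCC"))  # six-membered ring, for comparison
```

In this toy model the same kind of atom needs one hydrogen in the five-membered ring and none in the six-membered ring, so the answer depends on the whole ring rather than on the N atom alone.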

A first thought is to specify the implicit hydrogens explicitly in the query, taking them from the input structures. For the given example, 'C1:[NH]:C:C:C:1' solves the valence issue. But this is not the best way: if you have three input structures, 'c1ccc2[nH]ccc2c1', 'c1cc[nH]c1' and 'Cn1cccc1', you want 'C1:N:C:C:C:1' as the scaffold, because 'C1:[NH]:C:C:C:1' does not match 'Cn1cccc1'.
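
The trade-off can be sketched with a toy model that looks only at the ring nitrogen, since the rest of the five-membered ring is the same in all three inputs. This is an illustration only and does not use the Indigo API; the class and function names are hypothetical, and the hydrogen counts are read off the SMILES by hand.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class RingN:
    """Ring nitrogen of an input structure (toy model, not an Indigo object)."""
    smiles: str
    h_count: int  # hydrogens attached to the ring N

@dataclass(frozen=True)
class QueryN:
    """Ring nitrogen of a scaffold query; h_count=None means 'no constraint'."""
    scaffold: str
    h_count: Optional[int]

def matches(query: QueryN, atom: RingN) -> bool:
    """A query N matches if it has no H constraint or the H counts agree."""
    return query.h_count is None or query.h_count == atom.h_count

INPUTS = [
    RingN("c1ccc2[nH]ccc2c1", 1),  # indole: N-H
    RingN("c1cc[nH]c1", 1),        # pyrrole: N-H
    RingN("Cn1cccc1", 0),          # N-methylpyrrole: no H on N
]

def unmatched(query: QueryN) -> List[str]:
    """Input SMILES that the query fails to match."""
    return [atom.smiles for atom in INPUTS if not matches(query, atom)]

print(unmatched(QueryN("C1:N:C:C:C:1", None)))   # unconstrained N
print(unmatched(QueryN("C1:[NH]:C:C:C:1", 1)))   # N fixed to one hydrogen
```

With the hydrogen fixed on N, the query loses 'Cn1cccc1', which is why the unconstrained scaffold is preferable even though it cannot be converted directly to a molecule.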

The second solution is more accurate. You can add the 'Dearomatizer' node. The following workflow should generate canonical smiles for the given examples without errors:

'MCS Scaffold Finder'->'Indigo To Query'->'Mol To Indigo'->'Dearomatizer'->'Indigo To Mol'

The problem of incorrect-valence errors is known for other nodes as well. The 'Molecule To Indigo' node does not verify valence correctness. We are thinking about adding a 'consider molecules with valence errors as incorrect molecules' option to the 'Molecule To Indigo' node. We are also going to add a 'Standardizer' node with options for repairing valence and for common types of functional group transformations. Users will then be able to attach the 'Standardizer' node to the second output port (the port with incorrect molecules) and fix the molecules' valence errors with the default methods.

Best regards,