Scaffold Finder node bug

richards99 · February 5, 2012, 3:22pm

Hi,

I am using the Scaffold Finder node, which generates the MCS in IndigoQueryMol format. When I convert this to SMILES, the SMILES seem to be non-compliant, which causes problems.

For instance, if I take these SMILES, convert them back to Indigo with "Molecule to Indigo", and then try to convert them to SMILES again with "Indigo to Molecule", all the conversions fail with the following error:

ERROR IndigoSaverNodeModel Could not convert molecule with RowId=Row1#9: element: can not calculate implicit hydrogens on aromatic N, charge 0, degree 2, 0 radical electrons

When I look at the SMILES string before this error, many of the atoms are separated by a colon (:).

This doesn't seem to be handled very well. Attempts to use the "Hydrogen Adder" node also fail with these SMILES strings.

I have been using the node as you showed in the UGM Tutorial to create a list of common scaffolds; it is a most useful and neat little way of doing it. I just wish the SMILES behaved themselves for further manipulation!

Any thoughts?

Simon.

asavelyev · February 20, 2012, 11:25am

Dear Simon,

I have reproduced the bug. The Scaffold Finder node generates query molecules. For example, given the two input structures 'c1ccc2[nH]ccc2c1' and 'c1cc[nH]c1', the scaffold is 'C1:N:C:C:C:1' (a colon ':' denotes an aromatic bond). You can use the query scaffold in other nodes (RGroup Decomposer, Substructure Matcher, etc.). However, the query structure 'C1:N:C:C:C:1' cannot be converted directly to a molecule, so you cannot, for example, calculate canonical SMILES from it. An exception is raised for the structure: 'element: can not calculate implicit hydrogens on aromatic N, charge 0, degree 2, 0 radical electrons'.
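To illustrate why the hydrogen count on such a nitrogen cannot be read off the atom alone, here is a minimal sketch in plain Python. It is a toy model, not Indigo's algorithm: the function name nitrogen_h_counts and the ring encoding (a string of element symbols in ring order, every atom having only its two ring neighbours) are assumptions made for this example. It enumerates Kekule structures in which each carbon carries exactly one double bond and the nitrogen carries at most one, then reports how many hydrogens the nitrogen needs in each case.

```python
from itertools import combinations


def nitrogen_h_counts(ring):
    """Return the set of possible hydrogen counts for the N atoms of a ring.

    `ring` is a string of element symbols in ring order, e.g. "CNCCC".
    Toy assumptions: single ring, only C and N, no substituents, no charges.
    """
    n = len(ring)
    bonds = [(i, (i + 1) % n) for i in range(n)]
    results = set()
    # Try every subset of ring bonds as the set of double bonds.
    for k in range(n // 2 + 1):
        for doubles in combinations(bonds, k):
            used = [0] * n
            for a, b in doubles:
                used[a] += 1
                used[b] += 1
            # No atom may carry two double bonds.
            if any(u > 1 for u in used):
                continue
            # Every carbon must carry exactly one double bond.
            if any(ring[i] == "C" and used[i] != 1 for i in range(n)):
                continue
            # A neutral N with two ring bonds needs one H unless it has a double bond.
            results.add(tuple(1 - used[i] for i in range(n) if ring[i] == "N"))
    return results


print(nitrogen_h_counts("CNCCC"))   # five-membered ring, as in C1:N:C:C:C:1
print(nitrogen_h_counts("CNCCCC"))  # six-membered ring with one N
```

Under these assumptions the five-membered ring only admits a nitrogen with one hydrogen, while the six-membered ring only admits a nitrogen with none, even though both nitrogens are aromatic, neutral and of degree 2. The answer depends on the whole ring, which is the information missing from a purely per-atom calculation.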

The first idea is to specify the implicit hydrogens explicitly in the query (taking them from the input structures). For the given example, 'C1:[NH]:C:C:C:1' resolves the valence issue. But this is not the best approach: if you have three input structures, 'c1ccc2[nH]ccc2c1', 'c1cc[nH]c1' and 'Cn1cccc1', you want 'C1:N:C:C:C:1' as the scaffold, because 'C1:[NH]:C:C:C:1' does not match 'Cn1cccc1'.
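The trade-off can be sketched with a toy matcher in plain Python. This is only an illustration of the hydrogen-count constraint, not how Indigo performs substructure matching: the names RING_NITROGENS, atom_matches and structures_matched are hypothetical, and each input structure is reduced by hand to its ring nitrogen and the number of hydrogens on it.

```python
# Ring nitrogen of each input structure, reduced by hand to a toy record.
# "h" is the number of hydrogens attached to that nitrogen.
RING_NITROGENS = {
    "c1ccc2[nH]ccc2c1": {"element": "N", "h": 1},  # indole
    "c1cc[nH]c1": {"element": "N", "h": 1},        # pyrrole
    "Cn1cccc1": {"element": "N", "h": 0},          # 1-methylpyrrole
}


def atom_matches(query, target):
    """A query atom matches if the element agrees and, when the query
    specifies a hydrogen count, that count agrees as well."""
    if query["element"] != target["element"]:
        return False
    required_h = query.get("h")
    return required_h is None or required_h == target["h"]


def structures_matched(query):
    """List the input structures whose ring nitrogen satisfies the query atom."""
    return [s for s, atom in RING_NITROGENS.items() if atom_matches(query, atom)]


print(structures_matched({"element": "N"}))          # plain N: all three inputs
print(structures_matched({"element": "N", "h": 1}))  # [NH]: 'Cn1cccc1' drops out
```

Fixing the hydrogen count makes the scaffold convertible, but it narrows the query so that the N-substituted structure is no longer covered.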

The second solution is more accurate: add the 'Dearomatizer' node. The following workflow should generate canonical SMILES for the given examples without errors:

'MCS Scaffold Finder'->'Indigo To Query'->'Mol To Indigo'->'Dearomatizer'->'Indigo To Mol'

Incorrect-valence errors are a known problem in other nodes as well. The 'Molecule To Indigo' node does not verify valence correctness. We are thinking about adding a 'consider molecules with valence errors as incorrect molecules' option to the 'Molecule To Indigo' node. We are also going to add a 'Standardizer' node with options for repairing valence and for applying common types of functional group transformations (http://tech.knime.org/forum/indigo/common-group-transformations). Users will then be able to connect the 'Standardizer' node to the second output port (the port with incorrect molecules) and fix the molecules' valence errors using the default methods.

Best regards,

Alexander
