SMARTS query features in SDF cell -> Query Molecule to Indigo

Dear All,

I have been spending a bit more time with the Indigo nodes, and have come across some possible issues (or at least some suggestions for possible improvements) with retaining query features across into Indigo query mols.

Of course, there is every possibility that I am going about this the wrong way, and that what I am trying to do is already catered for!  : )

 

What I am trying to do is the following:

1.  Sketch a scaffold into ChemAxon's MarvinSketch node - including some recursive SMARTS atoms

2.  convert the scaffold to SDF type with MRV->SDF (the SMARTS atom info gets retained in MRV property blocks)

3.  convert the SDF to an Indigo query mol using the QueryMol->Indigo node

4.  use the resulting query mol in the R-Group Decomposer

 

Currently the SMARTS atoms get 'dumbed-down' to just their parent element (I think) - which would make sense if the QueryMol->Indigo node knows nothing about the MRV blocks, and just takes info from the CTAB.

At the moment in KNIME, the MarvinSketch node seems to be the most available/useful way of sketching inputs for the purposes of structure querying.  In principle, would it be possible to get the Indigo nodes to understand the MRV blocks in SDF (and if so, would the SMARTS atoms be handled ok in the Indigo query mol)?

 

Any help / thoughts / advice greatly appreciated!

Kind regards

James

James,

Thank you for this feedback, it is quite interesting. You are absolutely right: Indigo currently skips the "M  MRV" Molfile blocks. In your case, these blocks contain recursive SMARTS expressions. Supporting these (non-standard) SMARTS blocks should not be very hard; we can plan it for one of the future releases of Indigo.
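
To illustrate what skipping those blocks means, here is a minimal Python sketch (not Indigo's actual parser) that separates Molfile lines starting with "M  MRV" from everything else. The function name `split_mrv_lines` and the sample lines are hypothetical. In particular, the exact layout of the Marvin property line is an assumption made for illustration.

```python
def split_mrv_lines(molblock):
    """Split Molfile text into (standard_lines, mrv_lines).

    A line counts as Marvin-specific if it starts with "M  MRV".
    """
    standard, mrv = [], []
    for line in molblock.splitlines():
        (mrv if line.startswith("M  MRV") else standard).append(line)
    return standard, mrv


# Illustrative fragment only, not a complete or valid Molfile.
# The layout of the "M  MRV" line below is an assumption.
sample = "\n".join([
    "  (atom and bond lines would go here)",
    "M  MRV SMA   2 [N;$([N]c)]",
    "M  END",
])

standard, mrv = split_mrv_lines(sample)
print(mrv)
```

A reader that keeps only the first list never sees the recursive SMARTS, so the atom falls back to its plain element, which is consistent with what was observed above.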

In the meantime, why not try saving your query molecule as SMARTS? I am not sure about the MarvinSketch node, but the desktop Marvin Sketch app has that option -- I just checked. Also, you can read a SMARTS file from disk with the File Reader node, then pass the resulting table through the Molecule Type Cast node to convert it to SMARTS type, and then feed the SMARTS table to the "Query Molecule to Indigo" node. That should do the trick.

 

Best regards,

Dmitry

Thanks Dmitry,

I was keen to keep the whole process inside of the KNIME workflow if at all possible - to make it easier for chemists here to use as an interface - hence not wanting to manually save the .sma out of the MarvinSketch node to read back in.  Having said that, chances are that most 'general users' won't be defining recursive SMARTS atoms in their substructure sketches!  Also, I guess a better solution would be for ChemAxon to have a MrvToSmarts node added...

Anyway, I took your advice and tried saving the query molecule as .sma and reading it back in.  The Molecule Type Cast node sets the column to SMARTS ok; but when I then try to feed it into the 'Query Molecule to Indigo' node I see one of the following errors:

ERROR Query Molecule to Indigo Execute failed: SMILES loader: '#' is allowed only within SMARTS queries

or

ERROR Query Molecule to Indigo Execute failed: SMILES loader: '$' fragments are allowed only in SMARTS queries

 - seemingly dependent on what character is found first.  An example SMARTS string (straight out of MarvinSketch) is:

[#6]!@-[N;$([N]c)]!@-C=O
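
A minimal Python sketch of why the message could vary with the input: a loader that reads the string as SMILES stops at the first character it cannot accept. This is not Indigo's loader. The function name `first_smarts_only_char` is hypothetical, and it looks only for the two characters named in the error messages above ('#' and '$' inside square brackets).

```python
def first_smarts_only_char(pattern):
    """Return (index, char) of the first '#' or '$' found inside
    square brackets, or None if there is none.

    Bracket depth is counted so that nested brackets inside a
    recursive $(...) expression are handled.
    """
    depth = 0
    for i, ch in enumerate(pattern):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth > 0 and ch in "#$":
            return i, ch
    return None


print(first_smarts_only_char("[#6]!@-[N;$([N]c)]!@-C=O"))
print(first_smarts_only_char("[N;$([N]c)]!@-C=O"))
print(first_smarts_only_char("[C]!@-[N]!@-C=O"))
```

For the example string, '#' comes first. With the leading [#6] removed, '$' comes first. A '#' outside brackets (as in C#N) is an ordinary triple bond and is not flagged.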

Any ideas what I am doing wrong?

Kind regards

 

James

James,

Hmm, that should not happen. Could you please check that:

  1. You have the latest version of the Indigo nodes (1.0.0.0000937)
  2. You have your Molecule Type Cast node set up to produce SMARTS and not SMILES

and get back to me? I just checked -- it works without errors on my installation. What is more, it should not print errors to the console: it normally sends the erroneous input to the second output port instead...

 

Best regards,

Dmitry

Hi Dmitry,

I was on version 935, but have now updated to 937.  The Molecule Type Cast node is definitely set to SMARTS.  I still see the same issue.

If I simplify the SMARTS input to [C]!@-[N]!@-C=O things proceed ok.

I have just tried the original SMARTS on a colleague's machine - and it runs as expected!  I am going to trouble-shoot the problem like a true IT professional, and "turn it off and back on again" (!)

I will post back in a bit to let you know how things resolve.

 

Kind regards

James

Well the good news is that the problem WASN'T solved by re-booting my PC (I hate it when that happens!)

Having explored a bit further, I can reproduce the error with the original SMARTS if I select the 'Append Column' option in the 'Query Molecule to Indigo' node.  If I un-select this, the conversion proceeds as expected.

 

The bad news is that if I feed the Indigo Query molecule into the R-group decomposer scaffold input, I now see the following error:

ERROR R-Group Decomposer Execute failed: molecule substructure matcher: unexpected 'fragment' constraint

Not sure what this means...

 

Kind regards

James

James,

Ahhh, I should have figured that. Thank you very much for finding the bug with the "Append Column" option. This has been fixed, and the new version (1.0.0.0000939) is available now.

 

The "unexpected 'fragment' constraint" error was a sign that the core Indigo R-Group decomposer has not been through much testing with SMARTS patterns (we primarily focused on Molfile patterns). I fixed that too, thank you.

So as not to update core Indigo twice, I also added support for Marvin's "SMARTS in Molfile" extension, so you should now be able to pass the molecule through the MarvinSketch->SDF->Query Molecule to Indigo chain without losing the SMARTS notation.

 

Best regards,

Dmitry

Hi Dmitry,

Thanks so much for your rapid action on this - above and beyond the call of duty for a weekend!

I have looked a bit more at the SMARTS handling, and have found that everything almost(!) works as I would expect...  To test, I am using Table Creator -> Molecule Type Cast -> Query Molecule to Indigo -> R-Group Decomposer (template port).

Strangely, the following is fine for defining a template:

[$([C;H2][#6])]!@-[$([N]a)]!@-[$([C]a)]=O

However, the minute I try to specify that the carbonyl C must be attached to a para-substituted aryl - I get an error:

[$([C;H2][#6])]!@-[$([N]a)]!@-[$([C]aaaaA)]=O

ERROR R-Group Decomposer Execute failed: R-Group deconvolution: no embeddings obtained

For comparison, I put the same SMARTS through the RDKit R group decomposition node, and things behave exactly as I would expect.

Thanks for adding the MRV SMARTS block parsing - this is really cool (for completeness, I should say that I reproduce the error above going via MarvinSketch as well...)

 

Kind regards

James

James,

The "no embeddings obtained" error means that the given scaffold was not found in at least one of the given structures. To avoid this error, you can filter the structures with the "Substructure Matcher" node before feeding them to the "R-Group Decomposer" node. Give your scaffold as the query for the Substructure Matcher node (just by drawing another arrow). In the second output port of the Substructure Matcher, you will see the structures that do not contain the scaffold.
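
The filter-then-decompose idea can be sketched in a few lines of Python. This is only an outline of the workflow logic, not the Indigo API: `partition_by_scaffold` is a hypothetical helper, and the substring check stands in for a real substructure match purely so that the example is self-contained.

```python
def partition_by_scaffold(structures, contains_scaffold):
    """Split structures into (matched, unmatched), mirroring the two
    output ports of a substructure filter."""
    matched, unmatched = [], []
    for s in structures:
        (matched if contains_scaffold(s) else unmatched).append(s)
    return matched, unmatched


# Toy stand-in for a real substructure test: plain substring search.
scaffold = "NC(=O)"
structures = ["CCNC(=O)c1ccccc1", "CCO", "CNC(=O)C"]

matched, unmatched = partition_by_scaffold(
    structures, lambda s: scaffold in s
)
print(matched)    # only these would go on to the decomposition step
print(unmatched)  # these would have caused the error
```

The important point is that the same scaffold is used for the filter and for the decomposition. If the two differ, structures can pass the filter and still fail to embed.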

 

Best regards,

Dmitry

James,

Here are two links that may be relevant to the problem:

http://tech.knime.org/forum/indigo/r-group-decomposition
http://tech.knime.org/forum/indigo/bug-with-substructure-matcher-of-aromatised-molecules

 

Best regards,

Dmitry

Hi Dmitry,

Sorry - I should have realised this...  I was actually using the Substructure Matcher prior to the R-Group node; but had forgotten that I was passing a slightly different substructure in than the one describing my template.

Passing the same substructure in means everything now works as expected - either via manually-typed SMARTS; or via MarvinSketch->SDF !!

Once again, thanks for all the help!

 

Kind regards

James

Hi James,

Just wanted to check that you realised that in the Marvin Sketcher node, there is a tab called "Output Options" at the top of the Marvin Sketcher window. There you can change the structure output by selecting "SMARTScell" from the list.

This means the output from the sketcher is in SMARTS format straightaway.

Hope this saves you using extra translator/converter nodes.

Simon.

FYI:
If you need to change the default output type of MarvinSketch node, you can change it in File > Preference > KNIME > Marvin.

Best,
Taka
 

Brilliant, it is very useful to know how to change the default output format.

Simon.

@Simon - thanks for the heads-up about the 'Output' tab.  At the time I originally posted I was unaware of this useful addition to the Marvin Sketch node, but I stumbled across it a few days later!

@Taka - Thanks for the info about the preferences setting - this is a nice extra to a very nice node!

 

Kind regards

James