If I input SMARTS into the Indigo Substructure Matcher node, I get different results depending on whether the SMARTS are in aromatic or Kekulé form.
For example, pyridine in aromatic form (c1ccncc1) versus non-aromatic form (C1=CC=NC=C1).
If I use aromatic SMARTS, I get the correct results.
However, if I use non-aromatic SMARTS, I get no results. I have tried the Resonance options, but I still get no results when the SMARTS are non-aromatic. If I select Tautomer, I get just 2 results (instead of 1300), and these are 2-pyridinones.
Would it be possible to allow the node to accept non-aromatic SMARTS?
In fact, if I take the non-aromatic and aromatic forms of the scaffold, convert them with Query to Indigo and run them through the Substructure node, I get the following results. Everything seems to be handled correctly except non-aromatic SMARTS.
MarvinSketch output in non-aromatic (Kekulé) form: SDF 884 hits, SMILES 884 hits, SMARTS 0 hits, MOL 884 hits.
MarvinSketch output in aromatic form: SDF 884 hits, SMILES 884 hits, SMARTS 884 hits, MOL 884 hits.
The SMARTS format description clearly states that "C" is an aliphatic carbon. We cannot match "C1C=CC=CC=1" to "c1ccccc1".
On the other hand, if you pass the same string as SMILES, the match will be OK. You will not be able to use $(...) fragments in that case, though. We prohibited them in SMILES because our aromaticity matcher cannot handle them.
Yes, that just confirms what I wrote above: a Kekulé SMARTS does not match an aromatic target, while a Kekulé SMILES does. That is a rule, not a mistake. The documentation also says:
The SMARTS "C1=CC=CC=C1" makes a pattern ("six aliphatic carbons in a ring with alternating single and double bonds") which will not match benzene.
Thanks for the quick reply. Unfortunately I'm no expert on these matters, as I'm a synthetic chemist rather than a computational chemist. So is ChemAxon's MarvinSketch writing the SMARTS incorrectly when the structure is drawn in Kekulé form? Or, if the form is correct, could the Indigo translators automatically convert these strings back into an aromatic-carbon form? The trouble is that chemists get confused when they get no results. I'm unsure what the best fix is.
Another problem with choosing SMILES format is that for an indole ring, for instance, it writes an explicit hydrogen on the nitrogen, so N-1-alkylated indoles are not returned by the substructure search.
Converting a SMARTS string to "aromatic" form is not the right approach, because it would change the meaning of the SMARTS expression. That would confuse chemists and all other kinds of users. I would go for passing query SMILES to the Indigo nodes.
MarvinSketch saves aromatic indole SMILES as c1cc2ccccc2[nH]1. It does so because it assumes you are saving a normal (non-query) molecule; it lacks the concept of query SMILES. You can work around this by saving to SMARTS (which gives c1cc2ccccc2n1) and then casting it to SMILES with Molecule Type Cast. This workaround does not make things worse for Kekulé-drawn (aliphatic) indole: it is saved as the SMARTS N1C=CC2=CC=CC=C12, and once you cast it to SMILES, all is good.
Alternatively, you can export a Molfile from Marvin and pass it, without converting it to anything, straight to "Indigo to Query Molecule". That works equally well for aromatic and aliphatic queries.
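Why the explicit [nH] hurts N-1-alkylated indoles can be sketched with another toy Python model (hypothetical helper names, not part of Indigo or KNIME). It assumes, as the SMARTS-then-cast workaround relies on, that a bare "n" or "N" in a query leaves the hydrogen count open, while a bracket atom such as [nH] demands exactly one hydrogen. Only these token shapes are handled.

```python
import re

# Bracket atoms such as [nH], or bare organic-subset atoms such as n, c, N, C.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def nitrogen_h_constraint(query):
    """Return the H-count constraint on the first nitrogen in a query string.

    None -> bare n/N: hydrogen count left open (query semantics assumed).
    int  -> bracket atom with an explicit H count, e.g. [nH] -> 1.
    Any other nitrogen notation is unsupported and raises ValueError.
    """
    for token in ATOM_TOKEN.findall(query):
        if token in ("n", "N"):
            return None
        match = re.fullmatch(r"\[[nN]H(\d?)\]", token)
        if match:
            return int(match.group(1) or 1)  # "[nH]" has no digit, meaning 1 H
    raise ValueError(f"no supported nitrogen atom in {query!r}")


def nitrogen_matches(query, target_nitrogen_h):
    """True if the query nitrogen accepts a target nitrogen with this many H."""
    required = nitrogen_h_constraint(query)
    return required is None or required == target_nitrogen_h


# Indole N carries one H; in an N-1-alkylated indole it carries none.
for query in ("c1cc2ccccc2[nH]1", "c1cc2ccccc2n1", "N1C=CC2=CC=CC=C12"):
    print(query, nitrogen_matches(query, 1), nitrogen_matches(query, 0))
```

Only the [nH] form rejects the nitrogen with zero hydrogens, which is why the SMILES saved directly by MarvinSketch misses N-1-alkylated indoles.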
I've found that what Dmitry suggests (i.e. passing the MarvinSketch output in MOL format through QueryMol->Indigo and into the bottom input of the Substructure Match node) works very well.
This definitely has the benefit of behaving as most chemists would expect a substructure search (SSS) to behave.
Another benefit of this approach is that, if you find you are in love with SMARTS after a good browse of the Daylight docs(!), Dmitry has recently added support for "ChemAxon extensions for SMARTS atoms" (in MOL2000). That gives you the best of both worlds: you can sketch a query molecule in Marvin and, e.g., add some recursive SMARTS definitions where necessary!
Thanks for the detailed feedback, that has certainly helped.
I had a good read of the Daylight docs, but I wouldn't say they cleared things up; I was left with a headache!
Following the advice from both Dmitry and James, I will try to stick to MOL format and suggest the same to others to avoid further confusion.