If I input SMARTS into the RDKit Substructure Search node, I get different results depending on whether the SMARTS are aromatised or in Kekulé form.
Take pyridine as an example: aromatised (c1ccncc1) versus non-aromatised (C1=CC=NC=C1).
If I run it using non-aromatised SMARTS, I get no results.
However, if I use aromatised SMARTS, I get the correct results.
Would it be possible to allow the node to accept non-aromatised SMARTS?
I'm afraid it's not quite that simple. SMARTS is a query language, so the concept of applying aromatization to it isn't chemically reasonable.
Capital letters in SMARTS mean "aliphatic", so the "non-aromatised" SMARTS C1=CC=NC=C1 is actually querying for six aliphatic atoms connected by alternating single and double bonds. Pyridine's ring atoms are aromatic, so that query does not match it.
For an overview, it's worth reading section 4.7 of the SMARTS theory manual: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
One solution would be to provide a checkbox you could use to tell the node to treat the input as SMILES instead of SMARTS. This would provide the behavior you're looking for, but would mean you could no longer use query features.
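To make the difference concrete, here is a minimal sketch using the RDKit Python API directly rather than the KNIME node. It assumes the node's matching behaves like RDKit's HasSubstructMatch, and the variable names are only illustrative. It runs the two pyridine SMARTS from above against pyridine, then parses the Kekulé string as SMILES instead, which is roughly what the proposed checkbox would do.

```python
from rdkit import Chem

# Target molecule: pyridine. RDKit perceives aromaticity when parsing SMILES.
pyridine = Chem.MolFromSmiles("c1ccncc1")

# The same string means different things as SMARTS and as SMILES.
aromatic_smarts = Chem.MolFromSmarts("c1ccncc1")    # aromatic atoms
kekule_smarts = Chem.MolFromSmarts("C1=CC=NC=C1")   # aliphatic atoms, explicit bonds
kekule_smiles = Chem.MolFromSmiles("C1=CC=NC=C1")   # a molecule; gets aromatised

results = {
    "aromatic SMARTS": pyridine.HasSubstructMatch(aromatic_smarts),
    "Kekule SMARTS": pyridine.HasSubstructMatch(kekule_smarts),
    "Kekule string parsed as SMILES": pyridine.HasSubstructMatch(kekule_smiles),
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

Parsing the input as SMILES lets the toolkit aromatise the query before matching, but SMARTS-only query features (for example atom lists or connectivity primitives) are then no longer available.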
Thanks for the information. It appears the concepts of SMILES and SMARTS are not totally intuitive. In discussions with others on this forum, it was mentioned that having the query molecule in MOL format eliminates this confusion. Is there any possibility of allowing the RDKit Substructure node to accept MOL format for an SSS query?
Is it possible for the substructure node to have two input ports instead of one? That is, one for the dataset and one for the substructure query.
The trouble with the existing setup is that having to use variables to supply a substructure is not user-friendly for the novice KNIME user. Using variables also removes the "column type" of the data, such as "smiles" or "smarts".
Problems are occurring where a chemist passes SMILES (instead of SMARTS) into the RDKit substructure node via variable import. This is a problem when the structure is, for instance, an unsubstituted indole, because the aromatic SMILES forces an explicit hydrogen on the nitrogen ([nH]). The substructure output then contains only indoles that are unsubstituted at the nitrogen, which confuses the chemist.
An ideal scenario would be for the second in-port of the node to NOT accept smiles so as to avoid this confusion, and only accept SMARTS, MOL, and SDF as input.
Is this possible?
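The indole problem described above can be reproduced with a short sketch in the RDKit Python API. It assumes the node interprets the variable's text as SMARTS, as discussed earlier in the thread; the helper name `smarts_matches` and the N-methylindole test molecule are only illustrative choices.

```python
from rdkit import Chem

def smarts_matches(query_text: str, target_smiles: str) -> bool:
    """Interpret query_text as SMARTS and test it against a target SMILES."""
    query = Chem.MolFromSmarts(query_text)
    target = Chem.MolFromSmiles(target_smiles)
    return target.HasSubstructMatch(query)

indole = "c1ccc2[nH]ccc2c1"
n_methylindole = "Cn1ccc2ccccc12"

# The aromatic SMILES of indole, read as SMARTS: [nH] demands exactly one H on N.
smiles_as_smarts = "c1ccc2[nH]ccc2c1"
# A SMARTS without the H constraint on the ring nitrogen.
relaxed_smarts = "c1ccc2nccc2c1"

print(smarts_matches(smiles_as_smarts, indole))          # unsubstituted indole
print(smarts_matches(smiles_as_smarts, n_methylindole))  # N-substituted indole
print(smarts_matches(relaxed_smarts, n_methylindole))    # N-substituted indole
```

In SMARTS, the bracketed [nH] is an explicit constraint (aromatic nitrogen with one hydrogen), which is why N-substituted indoles drop out of the results when a SMILES string is passed where SMARTS is expected.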
Sorry I'm so slow to reply. I'm better at staying on top of mailing lists than forums.
Having a substructure search node that accepts queries from an input port is a great idea. We can also think about adding the possibility to provide a mol file to the standard substructure search node (the same way we added rxn file support to the reaction nodes).
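As a rough illustration of why a mol file query avoids the aromatic versus Kekulé confusion, here is a sketch in the RDKit Python API. It is not the KNIME node's implementation, and the mol block is generated in the script itself only to keep the example self-contained; in practice it would come from a file or a sketcher.

```python
from rdkit import Chem

# Write pyridine as a mol block. MolToMolBlock kekulizes by default, so the
# block contains alternating single and double bonds.
mol_block = Chem.MolToMolBlock(Chem.MolFromSmiles("c1ccncc1"))

# Reading the block back sanitizes the molecule, which re-perceives aromaticity.
query = Chem.MolFromMolBlock(mol_block)

targets = {
    "pyridine": "c1ccncc1",
    "3-methylpyridine": "Cc1cccnc1",
    "benzene": "c1ccccc1",
}

hits = {
    name: Chem.MolFromSmiles(smiles).HasSubstructMatch(query)
    for name, smiles in targets.items()
}

for name, matched in hits.items():
    print(f"{name}: {matched}")
```

Because the query is parsed as a molecule and aromatised by the toolkit, the Kekulé bond orders in the file do not prevent a match against aromatic targets.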