The substructure search works great. However, sometimes I just want to run an exact structure search, and this is cumbersome at the moment because I have to mark out all possible positions in the SMARTS query. Additionally, I am often processing over 50,000 rows of structures, and this can be very time-consuming with a substructure search. I'd imagine a node that did exact structure searching would be much quicker.
I thought about matching just SMILES strings, but unfortunately the same structure is not always represented by the exact same SMILES string (the ordering of atoms is sometimes different), so I cannot use row filter matching.
The RDKit Canon Smiles node can be used to do what you want.
The node adds a column to the table with the canonical SMILES for each molecule. If you use the canonical SMILES of the molecule you want to find as the query, you can just do an exact string match in KNIME.
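Outside KNIME, the same idea can be sketched with the RDKit Python API. This is only an illustration of matching on canonical SMILES, not what the node does internally; the helper name canonical_smiles and the example rows are made up for the sketch.

```python
from rdkit import Chem

def canonical_smiles(smiles):
    """Return RDKit's canonical SMILES for the input, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # MolToSmiles writes canonical SMILES by default
    return Chem.MolToSmiles(mol)

# Hypothetical table rows: three different spellings of ethanol plus ethylamine
rows = ["CCO", "C(O)C", "CCN", "OCC"]

# Canonicalise the query once, then compare plain strings row by row
query = canonical_smiles("OCC")
matches = [s for s in rows if canonical_smiles(s) == query]
print(matches)
```

All three spellings of ethanol are kept because they canonicalise to the same string, which is the behaviour a row filter on the canonical SMILES column relies on.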
Another thing, I am finding the RDKit substructure searching magnificent, it is so so quick compared to some other substructure searching nodes I've used in KNIME. :-)
In terms of the canonical SMILES, is it not possible to make the RDKit columns use canonical SMILES, to save doing the conversion back to SMILES columns? I have a workflow for others to use where they can do either substructure searching or, now, exact structure searching using the method you mention, but it does introduce a delay while the RDKit column is first canonicalised back to SMILES prior to the exact structure search. If the RDKit column were canonicalised, that step could be removed and my exact searching would be almost instantaneous. I hope this is possible.
I think what you’re asking is for the ability to compare molecules directly to each other in KNIME to see if they are the same, right? This isn’t there at the moment, but it’s probably pretty easy to add.
Yes, that's right: being able to do an exact search which pulls back the molecule in question along with its associated data in its other columns (i.e. SAR etc.). It is sometimes useful to do substructure searching on the database (for which your substructure searching node works great), but there are other times when I want to take the same database and do an exact structure search to retrieve the one molecule. If there was just a tick box in the substructure search node to say whether to do a substructure search or an exact search, that would be really useful.
To do exact searching with the present substructure search node, I have to explicitly draw hydrogens everywhere in the SMARTS string to stop it substructure searching, which is cumbersome to do. Your alternative of using a row filter does work, but it adds the extra complication of first canonicalising the RDKit column back to SMILES. I'd like to keep all structures in RDKit format if possible, to make it as user-friendly as possible for others. As far as I can tell, no one has ever made an exact structure search facility in KNIME yet.
I'd be very happy if it was possible for you to implement this.
I just checked in some changes that add an “Exact match” toggle to the Substructure search node. It should be available soon in the nightly build version of the RDKit nodes.
A caveat: this isn’t a true exact-structure match, since you can still have query features in the SMARTS. I think it’s close to what you want though.
What it actually does is check that the molecule has the same number of atoms as the query before trying the substructure match. So the SMARTS “C[O,N]C” will generate an “Exact match” to either “COC” or “CNC”. If you don’t have query features in your SMARTS, this shouldn’t be much of a problem.
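The check described above can be sketched with the RDKit Python API. This is a rough approximation of the behaviour as described, not the node's actual code; the function name exact_match and the test molecules are assumptions for the example.

```python
from rdkit import Chem

def exact_match(mol, query):
    """Atom-count check followed by a substructure match, as described above."""
    # Reject early if the molecule and the query differ in atom count
    if mol.GetNumAtoms() != query.GetNumAtoms():
        return False
    return mol.HasSubstructMatch(query)

# SMARTS with a query feature: the middle atom may be O or N
query = Chem.MolFromSmarts("C[O,N]C")

results = {
    smi: exact_match(Chem.MolFromSmiles(smi), query)
    for smi in ["COC", "CNC", "CCOC", "CSC"]
}
print(results)
```

Here “COC” and “CNC” both pass, “CCOC” fails the atom-count check, and “CSC” has the right size but fails the substructure match, which illustrates why query features in the SMARTS keep this from being a true exact-structure match.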