RDKit Reaction SMARTS

Hi everyone, probably a bit of a newbie question here, but I am having great difficulties understanding reaction SMARTS within the RDKit Two Component Reaction Node.


There doesn't seem to be an online resource that details common org chem/med chem transformations in SMARTS format.  If there is such a thing, could someone share it with me?  I realise that sometimes customised SMARTS might be needed, but there must be a fairly standard list of uncomplicated reactions that could be very generally described.


I wish to do some "simple" in silico enumeration of a library of amides based on two input file sets: one collection of primary and secondary amines (filtered using the RDKit Substructure Filter with the string [NX3;H2,H1]) and a couple of carboxylic acids, which are entered manually via MarvinSketch --> Molecule to RDKit --> RDKit Substructure Filter ([CX3](=O)[OX2H1]).
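For anyone who wants to check those two filter strings outside KNIME, here is a minimal sketch using RDKit's Python API. The test molecules are my own examples, not from the original workflow. Note one caveat that the sketch makes visible: [NX3;H2,H1] also accepts the N-H of an existing amide, so a stricter pattern may be needed depending on the amine set.

```python
from rdkit import Chem

# The two filter patterns quoted in the post above
AMINE = Chem.MolFromSmarts("[NX3;H2,H1]")
ACID = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")


def matches(smiles, pattern):
    """Return True if the molecule parsed from SMILES contains the pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(pattern)


# Illustrative molecules (assumptions): ethylamine, dimethylamine,
# trimethylamine and acetamide for the amine filter
amine_hits = {smi: matches(smi, AMINE) for smi in ["CCN", "CNC", "CN(C)C", "CC(N)=O"]}

# Acetic acid and methyl acetate for the acid filter
acid_hits = {smi: matches(smi, ACID) for smi in ["CC(=O)O", "CC(=O)OC"]}

print(amine_hits)
print(acid_hits)
```

The tertiary amine and the ester are rejected, while acetamide passes the amine filter because its nitrogen is also three-connected with two hydrogens.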


I would be very grateful if someone could help me out with this. The online SMARTS tutorials are a bit baffling to me at the moment, and I would really like to start mastering this process.



Hello Alastair,


I find that translating a molecule into SMARTS is an art in itself. Anyway, here is the Daylight resource:


and their tutorial:





Try using these SMARTS tools: http://www.biosolveit.de/SMARTStools/index.html?ct=1. I am not affiliated with the company.

Is it not easier to use the Marvin sketcher to draw the reaction transformation and output it from the node as RXN SMARTS? That way you don't need to learn any SMARTS :-)


Simon, that might be a good way to do it, but what notation do you use for the substituents on the amine? Do I use the R-groups in Marvin, the AH substituent, or the Markush types? Or some other?

And which output format should it be? My Sketch node doesn't seem to have a SMARTS output (although it does output as RxnCell or SmartsCell). Can I configure it as a flow variable?

I have drawn the reaction and mapped the atoms where the new bond should be formed, but still no luck.

I tried saving the RXN output from MarvinSketch to a file and then reading that into the RDKit Two Component Reaction node, but still no success (although maybe that is due to atom types?).

Still a bit baffled here!


Hi again Simon,

I managed to reinstate the third input port on my RDKit Two Component Reaction node. Now I have trouble sketching out my reaction correctly in MarvinSketch; the output is set to RxnCell as it should be. Can someone correct my reaction scheme (see below) with regard to the R-group notation that should be used?



You should be able to just put a carbon there, and then if you map the carbons from the amine starting material to the carbons on the amide, it should work it all out for you.



I made it simple for myself in the first instance: 1 amine plus 1 acid to yield 1 product, and it still returns an empty table. I mapped the atoms as you suggested (see attached).

I assume that I do not have to map the atoms in the input files?

Workflow screenshot also attached.

Thanks for any help anyone can offer!


Apologies, try the approach below instead; the example I described earlier would only work for secondary amines. The one below will work for any amine.

For the reaction transformation, simply draw acetic acid + ammonia going to the corresponding acetamide. Now map the atoms in the starting materials and product, i.e. the Me from acetic acid to the Me in the amide, and the N on the ammonia to the N on the amide. You shouldn't need to draw the fully elaborated acid like you have in your scheme. This mapping tells the transformation to look for the substituents attached to the mapped atoms and transfer them across to the mapped atoms in the product.

This works fine for me. I have the Marvin sketcher set to output SDF for the amines and acids and then use the Mol to RDKit node, like you have, prior to the Rxn Transformation node. The Marvin sketcher for the reaction transformation is set to output RXN. There is no need to add mappings to your acid and amine input files.
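For readers who prefer to see the same idea as a reaction SMARTS string, here is a minimal sketch in RDKit's Python API. It is not the RXN drawn in Marvin: the SMARTS string, the choice to map the carbonyl carbon, carbonyl oxygen and nitrogen, the helper name `enumerate_amides`, and the example acids and amines are all my own assumptions. Atoms that are matched but unmapped (the acid OH) are dropped, and whatever hangs off the mapped atoms is carried into the product.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Assumed amide-coupling template: acid + primary/secondary amine >> amide
AMIDE_COUPLING = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OX2H1].[NX3;H2,H1:3]>>[C:1](=[O:2])[N:3]"
)


def enumerate_amides(acid_smiles, amine_smiles):
    """Combine every acid with every amine; return sorted unique product SMILES."""
    products = set()
    for acid_smi in acid_smiles:
        acid = Chem.MolFromSmiles(acid_smi)
        for amine_smi in amine_smiles:
            amine = Chem.MolFromSmiles(amine_smi)
            # Reactant order must follow the template: acid first, amine second
            for product_set in AMIDE_COUPLING.RunReactants((acid, amine)):
                for product in product_set:
                    Chem.SanitizeMol(product)
                    products.add(Chem.MolToSmiles(product))
    return sorted(products)


# Illustrative inputs: acetic and benzoic acid, methylamine and dimethylamine
amides = enumerate_amides(["CC(=O)O", "OC(=O)c1ccccc1"], ["CN", "CNC"])
for smi in amides:
    print(smi)
```

As in the KNIME workflow, the input molecules themselves carry no atom maps; only the template does.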

I hope this helps


Hi Simon,

First off, thanks for your help.  Very much appreciated!  I finally got the chemistry to work using the ideas and mappings you suggested...I started a totally fresh workflow, and whilst I am not sure that made any difference, it seems that KNIME sometimes has bad memories of failed nodes.  All very metaphysical....


Anyhow, the reaction mapping seems to work fine, unless there is another carboxylate group in the molecule, even as an ester. In that case I seemed to generate all manner of crossover products. Obviously there are workarounds, such as turning one of the acids/esters into an acid chloride and doing two stages, or doing some SMILES-based filtering on the output.



You can get round the ester crossover.

In Marvin, for the reaction transformation you drew, go to the advanced atom button, which brings up the periodic table. Now go to the advanced tab and select H+, then go to the O atom on the acid and click on it twice until it shows (h1). This tells it the O atom must carry exactly one hydrogen, which stops the ester problem.
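The effect of that hydrogen constraint can be sketched with reaction SMARTS in RDKit's Python API. This is an illustration under my own assumptions: the two template strings, the helper `run`, and the test molecule (monomethyl succinate, one acid and one ester) are not from the thread, and the KNIME node reading a Marvin RXN may not behave identically.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def run(rxn_smarts, acid_smiles, amine_smiles):
    """Apply a two-component reaction SMARTS; return the set of product SMILES."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    reactants = (Chem.MolFromSmiles(acid_smiles), Chem.MolFromSmiles(amine_smiles))
    products = set()
    for product_set in rxn.RunReactants(reactants):
        for product in product_set:
            Chem.SanitizeMol(product)
            products.add(Chem.MolToSmiles(product))
    return products


# Leaving group is any singly bonded oxygen: acids and esters both react
LOOSE = "[C:1](=[O:2])O.[N:3]>>[C:1](=[O:2])[N:3]"
# Leaving-group oxygen must carry exactly one hydrogen: acids only
STRICT = "[C:1](=[O:2])[OH1].[N:3]>>[C:1](=[O:2])[N:3]"

half_ester = "COC(=O)CCC(=O)O"  # monomethyl succinate (illustrative)
loose_products = run(LOOSE, half_ester, "CN")
strict_products = run(STRICT, half_ester, "CN")

print(sorted(loose_products))
print(sorted(strict_products))
```

With the loose template the ester is converted as well, giving the crossover product; with the H1 constraint only the free acid reacts.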

Getting around two acids is trickier. If part of the substructure will distinguish them, then draw more of the acid structure instead of just acetic acid, mapping all the new atoms in the starting material and product. This way it will only transform acids that match this substructure into an amide.
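As a rough illustration of drawing more of the acid, the sketch below extends the template by one mapped neighbour so that only an acid attached to an aromatic carbon reacts. The template strings, the helper `run`, and the diacid used here are assumptions of mine chosen to show the principle in RDKit's Python API.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def run(rxn_smarts, acid_smiles, amine_smiles):
    """Apply a two-component reaction SMARTS; return the set of product SMILES."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    reactants = (Chem.MolFromSmiles(acid_smiles), Chem.MolFromSmiles(amine_smiles))
    products = set()
    for product_set in rxn.RunReactants(reactants):
        for product in product_set:
            Chem.SanitizeMol(product)
            products.add(Chem.MolToSmiles(product))
    return products


# Any free carboxylic acid reacts
GENERIC = "[C:1](=[O:2])[OH1].[N:3]>>[C:1](=[O:2])[N:3]"
# Only acids whose carbonyl carbon sits on an aromatic carbon react;
# the extra atom is mapped on both sides so it is kept in the product
ARYL_ONLY = "[c:4]-[C:1](=[O:2])[OH1].[N:3]>>[c:4]-[C:1](=[O:2])[N:3]"

diacid = "OC(=O)Cc1ccc(cc1)C(=O)O"  # one alkyl acid, one aryl acid (illustrative)
generic_products = run(GENERIC, diacid, "CN")
aryl_products = run(ARYL_ONLY, diacid, "CN")

print(sorted(generic_products))
print(sorted(aryl_products))
```

The generic template couples at either acid, while the extended template leaves the alkyl acid untouched.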

hope all that makes sense.


For some reason, when I did as you described (setting total number of attached hydrogens to exactly one "H1") the node ignored this and made ester --> amide products anyway.


However, when I modified the atom with the substituent count Atom Query Property "s*", this solved the problem, and the substituents were taken as drawn. A bit strange, but then for some reason MarvinSketch will let you designate an ester oxygen as having up to three hydrogens attached.
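In SMARTS terms, a "substituents as drawn" constraint on the acid oxygen corresponds roughly to a degree query (D1, one heavy-atom neighbour), whereas the hydrogen-count approach corresponds to H1. That mapping between Marvin's query properties and SMARTS primitives is my assumption, not something confirmed in this thread. The sketch below simply compares how the three oxygen queries match in RDKit's Python API; the molecules are illustrative.

```python
from rdkit import Chem


def count_matches(smiles, smarts):
    """Count unique substructure matches of a SMARTS pattern in a molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


half_ester = "COC(=O)CCC(=O)O"  # monomethyl succinate: one ester, one acid

counts = {
    smarts: count_matches(half_ester, smarts)
    for smarts in ["C(=O)O", "C(=O)[OH1]", "C(=O)[OD1]"]
}
print(counts)

# The two constraints differ on a deprotonated acid:
# the carboxylate oxygen has degree 1 but no hydrogen
acetate_h1 = count_matches("CC(=O)[O-]", "C(=O)[OH1]")
acetate_d1 = count_matches("CC(=O)[O-]", "C(=O)[OD1]")
print(acetate_h1, acetate_d1)
```

Both constrained patterns reject the ester oxygen, which has two heavy-atom neighbours and no hydrogen; only the degree query still accepts a carboxylate.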