Bioisosteric replacement using SMARTS (KNIME and RDKit)

Hi fine people,

I am trying to create a KNIME workflow that would accept a list of compounds and carry out bioisosteric replacements (we will use the following example here: carboxylic acid to tetrazole) automatically.

NOTE: I am using the following workflow as inspiration: RDKit-bioisosteres (myexperiment.org). It uses a text file as SMARTS input, and I cannot seem to replicate the SMARTS format used there.

For this, I plan to use the RDKit One Component Reaction node, which takes as input a set of compounds to react and a SMARTS string that defines the reaction.

My issue is the generation of a working SMARTS string describing the reaction.

I would like to input two SDF files (or another format; I am not particularly attached to SDF): one with the group to replace (carboxylic acid) and one with the list of possible bioisosteric replacements (tetrazole). I would then combine these two in KNIME and generate a SMARTS string for the reaction, to be used in the RDKit One Component Reaction node.

NOTE: The input SDF files have the structures written with an attachment point (*COOH for the carboxylic acid, for example), which defines where the group to replace is attached. I suspect this is the cause of many of the issues I am experiencing.

So far, I can easily generate the reactions in RXN format using the Reaction Builder node from the Indigo node package. However, converting this reaction into a SMARTS string that the RDKit One Component Reaction node accepts has proven tricky.

What I have tried so far:

  1. Converting RXN to SMARTS (Molecule Type Cast node): gives the following error: scanner: BufferScanner::read() error

  2. Converting the Source and Target molecules into SMARTS (Molecule Type Cast node): gives the following error: SMILES loader: unrecognised lowercase symbol: y

    • Viewing the result as a string in KNIME shows that no conversion took place and the string is still in SDF format: *filename*.sdf 0 0 0 0 0 0 0 V3000M V30 BEGIN etc.

  3. Converting the Source and Target molecules into RDKit format first (RDKit From Molecule node), then from RDKit into SMARTS (RDKit To Molecule node, SMARTS option). This outputs the following SMARTS strings:

    • Carboxylic acid : [#6](-[#8])=[#8]
    • Tetrazole : [#6]1:[#7H]:[#7]:[#7]:[#7]:1

This is as close as I’ve managed to get. I can then join these two SMARTS strings with >> in between (output: [#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1) to create a reaction SMARTS string, but the RDKit One Component Reaction node does not accept it as input.

Error message in KNIME console :
ERROR RDKit One Component Reaction 0:40 Creation of Reaction from SMARTS value failed: null
WARN RDKit One Component Reaction 0:40 Invalid Reaction SMARTS: missing

Note that the SMARTS strings generated by this last option (3.) are very different from the ones used in the myexperiment.org example ([*:1][C:2]([OH])=O>>[*:1][C:2]1=NNN=N1). I also seem to have lost the attachment point information through these conversions, which is likely to cause issues in the rest of the workflow.

Therefore I am looking for a way to generate SMARTS strings like those in the myexperiment.org example from my own sets of substituents. Obviously, doing this by hand is not an option. I would also like this workflow to use only the open-source nodes available in KNIME and not proprietary nodes (Schrödinger etc.).

Hopefully, someone can help me out with this. If you need my current workflow, I am happy to upload it along with the source files.

Thanks in advance for your help,

Stay safe and healthy!

-Antoine

SMARTS strings are very tricky. I use ChemDraw and this SMARTS tool to generate my SMARTS strings (though it can’t do reactions). It also takes a lot of trial and error and referring back to the SMARTS documentation.

In your specific case, the string [CX3:1](=O)[OX2H1]>>[CX3:1]1=NN=NN1 gets results that look decent.

[CX3:1](=O)[OX2H1] is the carboxylic acid with the carbon mapped as atom 1.

[CX3:1]1=NN=NN1 is the tetrazole with the carbon mapped as atom 1.

The mappings tell the reaction node which positions to act on.
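
A quick way to see the difference between a mapped and an unmapped reaction string is to list the map numbers on each side. The sketch below uses only the Python standard library; `map_numbers` is a name made up for illustration, and it assumes a plain `reactants>>products` string with no agents. It only reports which map numbers are present; whether missing map numbers are what the KNIME node objected to in the original post is not established in this thread.

```python
import re

# A map number is the ":<digits>" that closes a bracket atom, e.g. [CX3:1].
MAP_NUMBER = re.compile(r":(\d+)\]")


def map_numbers(reaction_smarts):
    """Return (reactant map numbers, product map numbers) as two sets."""
    reactants, arrow, products = reaction_smarts.partition(">>")
    if not arrow:
        raise ValueError("expected a reaction of the form reactants>>products")
    left = {int(n) for n in MAP_NUMBER.findall(reactants)}
    right = {int(n) for n in MAP_NUMBER.findall(products)}
    return left, right


mapped = "[CX3:1](=O)[OX2H1]>>[CX3:1]1=NN=NN1"
unmapped = "[#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1"

print(map_numbers(mapped))    # ({1}, {1})
print(map_numbers(unmapped))  # (set(), set())
```

A map number that appears on only one side is worth a second look before the string goes into the reaction node.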

Finding a simple automated way to do this would be great, but I haven’t come across one. I’d be curious to know if anyone else has.

It’s been a while since I created that workflow, but yes, as @elsamuel says, draw the reaction in ChemDraw and copy as SMILES.

(the other) Simon

So on re-reading your post, what you want to do is next-level sophistication: picking functional groups from a list and generating the SMARTS string on the fly.
I elected for a fixed list and vaguely remember compiling it like this:
Draw the reaction in ChemDraw, with fixed atoms represented by ‘R’.
Use the A->A (atom mapping) tool to mark the fixed atom on both sides of the reaction (you can also use the “Map Reaction Atoms” command in the Structure menu to do this automatically).

Copy as SMILES gives:

[R:1][C:2](O)=O>>[R:1][C:2]1=NN=NN1

Then replace ‘R’ with ‘*’

[*:1][C:2](O)=O>>[*:1][C:2]1=NN=NN1

Exhausting, but that’s what worked.
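
To try a string like this outside KNIME, here is a minimal sketch using the RDKit Python API. It assumes RDKit is installed; `apply_replacement` is a name made up for illustration, and benzoic acid is just an arbitrary test input. Note that the bare `O` in the pattern does not require a hydrogen, so it may also match esters.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Mapped reaction string from the post above.
RXN_SMARTS = "[*:1][C:2](O)=O>>[*:1][C:2]1=NN=NN1"


def apply_replacement(smiles, rxn_smarts=RXN_SMARTS):
    """Return the unique product SMILES obtained from one input molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    products = set()
    # RunReactants returns one tuple of product molecules per match.
    for product_set in rxn.RunReactants((mol,)):
        for product in product_set:
            try:
                Chem.SanitizeMol(product)  # raises on chemically invalid products
            except Exception:
                continue
            products.add(Chem.MolToSmiles(product))
    return sorted(products)


print(apply_replacement("OC(=O)c1ccccc1"))  # benzoic acid
```

An input without a matching group simply yields an empty list.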

Thank you both for your answers, I am now a little closer to this goal.

I opened my .sdf file in ChemDraw, added R-groups where necessary, then selected all of the compounds and copied them as SMILES.
I then pasted this into an Excel file, where each compound is separated by a “.”, imported it into KNIME, transposed it, and added a column for the carboxylic acid SMARTS string (I used the following: [R][C](=O)[OH]). Finally, I replaced all R with *.

This now works in the One Component Reaction node and generates the required transformations.
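
The splitting and substitution steps above can be sketched in plain Python. This is only an illustration of the string handling: `reactions_from_chemdraw` is a made-up name, the clipboard content is a hypothetical two-fragment example, and it assumes ChemDraw writes the placeholder as `[R]` and that no fragment contains a “.” of its own (for example a salt).

```python
def reactions_from_chemdraw(smiles_blob, source="[R][C](=O)[OH]"):
    """Build one reaction string per dot-separated replacement fragment."""
    fragments = [part for part in smiles_blob.strip().split(".") if part]
    reactions = []
    for fragment in fragments:
        reaction = f"{source}>>{fragment}"
        # Swap the ChemDraw R placeholder for the SMARTS wildcard atom.
        reactions.append(reaction.replace("[R]", "[*]"))
    return reactions


# Hypothetical clipboard content: two replacement fragments.
for line in reactions_from_chemdraw("[R]C1=NN=NN1.[R]S(N)(=O)=O"):
    print(line)
```

Replacing the bracketed `[R]` rather than every letter R avoids touching element symbols such as Ru or Rh.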

This is still a bit more work than I was initially planning and is essentially semi-automated, although generating the lists of SMARTS reactions should hopefully be a one-time job.

Issues I still have:

  • In KNIME, going directly from an SDF file with R-groups or attachment points to a working SMILES string (that I could then “convert” into a working SMARTS string) is tricky. OpenBabel reads the R-group as “*”, which is a pain to deal with later. The Molecule Type Cast node gives the SDF string as opposed to the SMILES string (this still confuses me a lot…).
  • I expect this method will run into problems for replacements with more than 2 R groups, as I am not mapping the atoms as suggested above. I would like to avoid doing this manually again…
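
One possible way around the manual mapping, sketched in plain Python: number the attachment points in the order they appear in each fragment and use those numbers as atom maps. This rests on an assumption that is not checked by the code, namely that the attachment points are written in corresponding order in the source and replacement fragments. The function names are made up for illustration, and only bare `*`, `[*]` and `[R]` are handled (not isotope-labelled forms such as `[1*]`).

```python
import itertools
import re

# Matches a bare *, [*] or ChemDraw-style [R]; leaves already-mapped [*:n] alone.
ATTACHMENT = re.compile(r"\[\*\]|\[R\]|(?<!\[)\*")


def number_attachment_points(fragment):
    """Rewrite attachment points as [*:1], [*:2], ... in order of appearance."""
    counter = itertools.count(1)
    return ATTACHMENT.sub(lambda match: f"[*:{next(counter)}]", fragment)


def build_reaction(source, replacement):
    """Join two fragments into a mapped reaction string."""
    left = number_attachment_points(source)
    right = number_attachment_points(replacement)
    if left.count("[*:") != right.count("[*:"):
        raise ValueError("source and replacement need the same number of attachment points")
    return f"{left}>>{right}"


print(build_reaction("*C(=O)O", "*C1=NN=NN1"))
# [*:1]C(=O)O>>[*:1]C1=NN=NN1
print(build_reaction("*C(=O)N*", "*C1=NN(*)N=N1"))
# [*:1]C(=O)N[*:2]>>[*:1]C1=NN([*:2])N=N1
```

The resulting strings still need to be checked in the reaction node, since this only manipulates text and knows nothing about the chemistry.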

I am still looking for a way to do this automagically from an SDF so I’ll leave this post open for a little longer.

Cheers,

Tony

A while ago I attempted something similar and used the following SMARTS (search Google for MedChemWizard):

NX3H1H0,OH0[CX4,c]>>[#6:1]-S(=O)(=O)N[#6] --> sulfonamide

NX3H1H0,OH0[CX4,c]>>[#6:1]C(N[#6])C(F)(F)F --> 1-trifluoromethyl

NX3H1H0,OH0[CX4,c]>>[#6:1]C1(N[#6])CC1 --> cyclopropylamine

NX3H1H0,OH0[CX4,c]>>[#6:1]C(=C[#6])F --> fluorovinyl

NX3H1H0,OH0[CX4,c]>>[#6:1]C1(N[#6])COC1 --> oxetane

NX3H1H0,OH0[CX4,c]>>[#6:1]C1(O[#6])COC1 --> oxetane ether

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]=1[#7]#6[#6] --> 1,2,4 triazole

NX3H1H0,OH0[CX4,c]>>[#6:1]c=1oc(=nn=1)[#6] --> 1,2,4 oxadiazole

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]1=[#6]#7[#6] --> 1,2,3 triazole

NX3H1H0,OH0[CX4,c]>>[#6][#6]1=[#6]#7[#6:1] --> reverse triazole

NX3H1H0,OH0[CX4,c]>>[#6:1]c=1n=nn(n=1)[#6] --> tetrazole

NX3H1H0,OH0[CX4,c]>>[#6]c=1n=nn(n=1)[#6:1] --> reverse tetrazole

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]1=N#6[#6] --> 1,3,4 oxadiazole

NX3H1H0,OH0[CX4,c]>>[#6][#6]1=N#6[#6:1] --> reverse 1,3,4 oxadiazole

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]=1[#8]#6N[#6] --> amino oxazole

NX3H1H0,OH0[CX4,c]>>[#6][#6]=1[#8]#6N[#6:1] --> reverse amino oxazole

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]=1[#8][#7]=#6O[#6] --> isoxazole

NX3H1H0,OH0[CX4,c]>>[#6:1][#6]=1#7[#6] --> tetrazole

NX3H1H0,OH0[CX4,c]>>[#6][#6]=1#7[#6:1] --> reverse tetrazole

NX3H1H0,OH0[CX4,c]>>[#6:1]C(C#N)=NO[#6] --> cyano oxime

NX3H1H0,OH0[CX4,c]>>[#6]C(C#N)=NO[#6:1] --> cyano oxime

NX3H1H0,OH0[CX4,c]>>[#6:1]C(=NC)N[#6] --> amidine

NX3H1H0,OH0[CX4,c]>>[#6]C(=NC)N[#6:1] --> reverse amidine

NX3H1H0,OH0[CX4,c]>>[#6:1]c1cnc(cn1)[#6] --> pyrazine

NX3H1H0,OH0[CX4,c]>>[#6:1]c1cncc(n1)[#6]

NX3H1H0,OH0[CX4,c]>>[#6]Nc=1oc(=nn=1)[#6:1] --> amino 1,2,4-oxazole

NX3H1H0,OH0[CX4,c]>>[#6:1]Nc=1oc(=nn=1)[#6] --> reversed amino-1,2,4-oxazole

NX3H1H0,OH0[CX4,c]>>[#6]Nc1=nc(=no1)[#6:1] --> amino-1,3,4-oxazole

NX3H1H0,OH0[CX4,c]>>[#6:1]Nc1=nc(=no1)[#6] --> reversed amino-1,3,4-oxazole

NX3H1H0,OH0[CX4,c]>>[#6]Nc=1n=c(on=1)[#6:1] --> isomeric amino-1,3,4-oxazole

NX3H1H0,OH0[CX4,c]>>[#6:1]Nc=1n=c(on=1)[#6] --> isomeric reversed amino-1,3,4-oxazole