Templated Conformer Generation

Hi all,

I’m currently attempting to generate conformers of a set of small molecule “building blocks” (MW < 250 Da) that are tethered by a substructure they all share (for example, a carboxylic acid). Essentially, this is a surrogate for pairwise flexible alignment, in which both molecules are aligned by their common carboxylic acid (for example) and the remaining atoms of one molecule are then flexibly aligned onto those of the other to maximize overlap. Because the flexible alignment takes a very long time, I’m hoping that sampling many conformers of both molecules and then choosing the pair with the most 3D overlap via downstream nodes will achieve more or less the same goal (though maybe this method will not be quite as good…that’s actually what I’m trying to investigate). Based on its description, the “Templated Conformer Generation” node from Vernalis sounds like a great tool for this.

I am running into some issues using this node, though.

  1. As other posts have mentioned, it tends to crash KNIME unless I partition my input into relatively small subsets. It seems like a workaround was found in the other forum posts because the poster could just use the “RDKit Add Conformers” node, but I don’t believe that node would help in my case because it doesn’t have the option to tether all molecules to a common substructure.
  2. I’ve tried a couple of different methods of setting the “template” to just a carboxylic acid ([OH]C=O): supplying it as an RDKit Mol column in the input table, and defining it as a Mol flow variable. In either case, I receive the error below:

Execute failed: (“KeyErrorException”): null

Turning the template off altogether or using the molecule column as template and template as molecule seems to execute fine, although it’s obviously not what I’m actually trying to do.

  3. Apologies in advance if there is a simple answer to this–I’m not a programmer by any means. But in the cases from 2. where I successfully got output from the node, the “Conformers” output column seems to be just a list of SMILES strings if “RDKit” is selected as the “Conformer output format”, or text from a Mol file if “Mol” is selected. How can I convert that text output into 3D structures that I could save as a .sdf, or view in a 3D structure viewer etc? Molecule Type Cast doesn’t seem to recognize them as molecules.

Here is an example workflow:

Templated Conformer Generation Troubleshooting.knwf (59.6 KB)

Thanks,
Ryan

Hi Ryan,

Thanks for reporting this. I will take a look at the example, see if I can figure out what either you or I am getting wrong, and get back to you.

Steve

For point 3, this node outputs the results as a series of List cells.
To access the individual conformations (or energies), you’ll need to break open the lists. I’d use the Split Collection Column node for this.

Thanks @elsamuel - that’s definitely correct, although I would use the Ungroup node to put each conformer on a separate row.
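For anyone who wants to handle the “Mol” text output outside of KNIME, here is a minimal sketch of writing it to an .sdf file. It assumes each conformer is already available as one Mol block string (for example, one per row after ungrouping); the function name `write_sdf` and the variable `conformer_blocks` are made up for illustration. It relies only on the fact that an SD file is a series of Mol blocks, each followed by a `$$$$` line.

```python
from pathlib import Path


def write_sdf(mol_blocks, path):
    """Write Mol block strings to an SD file, one record per conformer.

    Returns the number of records written.
    """
    records = []
    for block in mol_blocks:
        # An SDF record is a Mol block followed by a "$$$$" delimiter line.
        # Only trailing newlines are stripped, so the header lines stay intact.
        records.append(block.rstrip("\n") + "\n$$$$\n")
    Path(path).write_text("".join(records))
    return len(records)


# Hypothetical usage: conformer_blocks is a list of Mol block strings.
# write_sdf(conformer_blocks, "conformers.sdf")
```

The resulting file should open in most 3D structure viewers, provided the Mol blocks themselves are valid and carry 3D coordinates.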
Steve

That was definitely a bug, and I can reproduce it and see exactly where it comes from. Thanks for reporting it - I’m astounded it has never shown up before, along with a couple of other bugs that also surfaced and that I had not seen previously!

The fix is on the nightly build (v1.30.1), and should make its way over to the other stable builds (4.3, 4.2) later today.

Thanks again,

Steve

Thanks @elsamuel and Steve for weighing in on point 3, and Steve for fixing the bug! I can now run the node on my full set without it crashing KNIME, and the Ungroup node worked like a charm for separating conformers, as long as I have the template setting turned off.

For some reason, I’m still unable to generate output using the carboxylic acid as a template, though. I just get missing values for everything, even if I tweak the “Template Matching” or output settings. I even tried a different template that I know to be present in at least some of the molecules (benzene), to no avail. Is there something I’m still missing, to get all of the conformers tethered by their common carboxylic acid? I think the workflow I originally sent should still be useful for troubleshooting this, hopefully, just with an ungroup node after the conformer generation node.

Thanks,
Ryan

Another suggestion, perhaps for a future update: it could be nice to have the node accept a SMARTS string as the template, in addition to actual molecules. With carboxylic acids it’s less of an issue, because that substructure is less likely to match undesired functional groups. But if I were to tether molecules by a primary amine, for example, I’d probably use methylamine (CH3NH2) as a “template”, and this might end up matching amides, carbamates, etc., which would be undesired.
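To illustrate the difference, here is a small RDKit sketch run outside of KNIME. It is not how the node matches templates internally; it only shows that methylamine used as a plain substructure query also hits amides and carbamates, while a SMARTS pattern can exclude them. The pattern `[CX4][NX3;H2]` is just one possible way to write “primary aliphatic amine” and is my own choice for the example.

```python
from rdkit import Chem

# Methylamine parsed as a molecule. Used as a substructure query it only
# asks for an aliphatic C single-bonded to an aliphatic N.
methylamine_query = Chem.MolFromSmiles("CN")

# Illustrative SMARTS (an assumption, not a node setting): a carbon with four
# connections bonded to a nitrogen with three connections, two of them hydrogens.
primary_amine_smarts = Chem.MolFromSmarts("[CX4][NX3;H2]")

examples = {
    "ethylamine": "CCN",
    "acetamide": "CC(N)=O",
    "methyl carbamate": "COC(N)=O",
}

# For each molecule: (matches methylamine query, matches SMARTS pattern)
results = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)
    results[name] = (
        mol.HasSubstructMatch(methylamine_query),
        mol.HasSubstructMatch(primary_amine_smarts),
    )
    print(name, results[name])
```

All three molecules match the methylamine query, but only ethylamine matches the SMARTS pattern, which is the kind of selectivity a SMARTS template option would give.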

Thanks!
Ryan

I found that if you increase the threshold for the Max template RMSD to around 0.2, then you should see some output. I’m not entirely clear as to why there isn’t anything closer, but I assume it is down to the balance of forces between the bond lengths etc. in the RDKit force fields and the actual constraints used to lock to the template (which are set to what I believe is their maximum strength).
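As a rough picture of why the threshold matters, here is a minimal sketch of an RMSD cutoff in plain Python. It is an assumption about the general idea, not the node’s actual code: it takes the coordinates of the matched atoms as already being in the template’s frame (no alignment step), and all coordinates below are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def filter_by_template_rmsd(template_coords, conformers, max_rmsd):
    """Keep conformers whose matched atoms lie within max_rmsd of the template."""
    return [c for c in conformers if rmsd(template_coords, c) <= max_rmsd]


# Made-up template coordinates for three matched atoms.
template = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.9, 1.1, 0.0)]

# Two made-up conformers: matched atoms shifted by 0.1 and by 0.3 along x.
conformers = [
    [(x + 0.1, y, z) for x, y, z in template],
    [(x + 0.3, y, z) for x, y, z in template],
]

strict = filter_by_template_rmsd(template, conformers, max_rmsd=0.05)
relaxed = filter_by_template_rmsd(template, conformers, max_rmsd=0.2)
print(len(strict), len(relaxed))
```

With the strict cutoff nothing survives, which would show up as missing values; relaxing the cutoff lets the slightly displaced conformer through.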

Steve

Thanks, Steve! I was able to get the output I needed by changing the Max template RMSD value.

Ryan
