I’m trying to build a good workflow in KNIME for geometry optimization (my current set contains mono- and di-ethers). My goal is to correctly calculate 3D descriptors for QSPR models. I don’t have a cheminformatics background. I searched through the KNIME forums and tried the following workflows:
In the configuration, I selected only 1 conformer per molecule as output.
This is the best-working workflow so far.
Templated Conformer Generation (RDKit)
This looked like a great option that can do everything in one node, but it kept crashing KNIME with my molecules.
I also tried other combinations that did not work for generating 3D descriptors (I’m using the alvaDesc node). While the top two workflows worked, the 3D descriptors they produced had different values and did not look well correlated with each other. Has anyone done something similar and can give me some advice on how to do this?
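One way to put a number on how well two sets of 3D descriptors agree is to compute a correlation per descriptor across the same molecules. The following is only a minimal Python sketch under stated assumptions: the function names `pearson_r` and `compare_descriptor_tables` are hypothetical, the descriptor names and values are made up for illustration (they are not alvaDesc output), and it assumes both tables list the molecules in the same order.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def compare_descriptor_tables(table_a, table_b):
    """Map each descriptor name present in both tables to its correlation."""
    return {
        name: pearson_r(table_a[name], table_b[name])
        for name in table_a
        if name in table_b
    }


# Made-up values purely for illustration; one list entry per molecule.
workflow_a = {"desc_1": [1.0, 2.0, 3.0, 4.0], "desc_2": [0.5, 0.1, 0.9, 0.3]}
workflow_b = {"desc_1": [1.1, 2.1, 2.9, 4.2], "desc_2": [0.2, 0.8, 0.1, 0.7]}

for name, r in compare_descriptor_tables(workflow_a, workflow_b).items():
    print(name, r)
```

Descriptors with a low correlation between the two workflows are the ones most sensitive to which conformer was generated.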
What you want to do isn’t scientifically trivial, therefore it’s also not super easy to construct a KNIME workflow that will work every time.
In order to be able to help, we need to see the actual workflow that you are using, together with the molecules that are causing problems. Given that, I can take a look at it and see whether I can make useful suggestions.
There was a lot of unnecessary complication in the workflow, so I simplified it to what I would do if I were trying to get an SDF cell with 3D coordinates for each molecule.
One harder-to-find change that I made is to add a random seed to the “RDKit Add Conformers” configuration and disable the UFF cleanup there (this really isn’t necessary any more).
If I were planning to generate 3D descriptors, I probably wouldn’t do the final optimize geometry step since the RDKit conformers are now pretty good on their own. That’s a matter of taste though.
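For readers who want to see the same idea outside KNIME, here is a rough Python sketch using the RDKit directly: one conformer per molecule, a fixed random seed, and no force-field cleanup afterwards. It is not what the KNIME nodes run internally. The helper name `embed_single_conformer`, the seed value 42, and the example SMILES are assumptions chosen for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_single_conformer(smiles, seed=42):
    """Return a molecule with one 3D conformer, or None if parsing or embedding fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # a fixed seed makes the coordinates reproducible
    conf_id = AllChem.EmbedMolecule(mol, params)
    if conf_id < 0:  # the RDKit returns -1 when embedding fails
        return None
    return mol  # note: no UFF/MMFF optimization step is applied here


mol = embed_single_conformer("COCCOC")  # 1,2-dimethoxyethane, a simple di-ether
if mol is not None:
    print(Chem.MolToMolBlock(mol))  # mol block with 3D coordinates
```

Without a fixed seed, each run can produce a different conformer, which is one reason 3D descriptor values may change between executions.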
Note that conformation generation can be a somewhat tricky process that doesn’t always succeed with standard settings. The current configuration here should work most of the time, but you should still assume that it could fail for some molecules. For those you will need to try playing around with the parameters until you get something that works.
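As an illustration of what “playing around with the parameters” can look like, the Python sketch below retries RDKit embedding with different random seeds, then falls back to random starting coordinates, and collects the molecules that still fail so they can be handled separately. The function names `embed_with_fallback` and `embed_all`, the seed values, and the order of the fallbacks are assumptions, not a recommended recipe.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_with_fallback(mol, seeds=(42, 7, 2024)):
    """Try standard ETKDG first, then random starting coordinates.

    Returns the conformer id on success or -1 if every attempt failed.
    """
    for use_random_coords in (False, True):
        for seed in seeds:
            params = AllChem.ETKDGv3()
            params.randomSeed = seed
            params.useRandomCoords = use_random_coords
            conf_id = AllChem.EmbedMolecule(mol, params)
            if conf_id >= 0:
                return conf_id
    return -1


def embed_all(smiles_list):
    """Split the input into embedded molecules and SMILES that failed."""
    embedded, failed = [], []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # unparsable input counts as a failure
            failed.append(smiles)
            continue
        mol = Chem.AddHs(mol)
        if embed_with_fallback(mol) >= 0:
            embedded.append(mol)
        else:
            failed.append(smiles)
    return embedded, failed


embedded, failed = embed_all(["CCOCC", "COCCOC"])
print(len(embedded), "embedded;", len(failed), "failed")
```

Keeping the failures in a separate list mirrors the advice above: assume some molecules will not embed with the default settings and deal with them individually.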
Thanks a lot @greglandrum. This looks a lot better.
I still had to convert to CDK and then from CDK to molecule (SDF) to get the SDF column working with the alvaDesc descriptors. If I use the “RDKit to molecule” node you added, I get an “Execute failed: Molecule column not found” error after executing. Something’s up with the SDF column generated by the RDKit node. I believe I got the idea to use the CDK nodes from somewhere here on the forums.
That is very strange. I don’t think anyone’s ever reported a real problem there.
I don’t have the alvaDesc nodes, so I can’t test the hypothesis, but I suspect the problem is with that node.