James,
Thanks for the feedback.
Let's see if I can make effective use of this fine tool to try to answer some of your questions.
I find that I get some horrible geometries using MMFF if my input structures already have 2D coordinates, but they are ok if I pre-convert to SMILES - if this is a general issue then perhaps a "remove coordinates" or "randomise starting coordinates" option would be useful?
I haven't tried that particular experiment, but I guess it's trying to use the 2D coordinates as a starting point. That is probably asking too much of the optimizer. Out of curiosity: are the geometries from the 2D coordinates converged? (There's a column with that information.)
Having an option to "remove starting coordinates" and start with a fresh set of 3D coordinates is a great idea. We will add that.
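In the meantime, if you want to try the same idea outside of KNIME, here is a minimal sketch using the RDKit Python API. It is not what the node does internally; the function name and the default values are just assumptions for illustration. It throws away whatever coordinates are on the molecule, embeds a fresh 3D geometry, and reports whether MMFF converged.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def optimize_from_fresh_coords(mol, max_iters=500, seed=42):
    """Hypothetical helper: discard existing coordinates, embed 3D, run MMFF.

    Returns the optimized molecule (with explicit Hs) and a converged flag.
    """
    mol_h = Chem.AddHs(mol)        # work on a copy with explicit hydrogens
    mol_h.RemoveAllConformers()    # drop the 2D (or stale 3D) coordinates
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        raise ValueError("3D embedding failed")
    # Return code: 0 = converged, 1 = more iterations needed, -1 = setup failed
    status = AllChem.MMFFOptimizeMolecule(mol_h, maxIters=max_iters)
    return mol_h, status == 0


# Example: a molecule that arrives with 2D coordinates only
mol2d = Chem.MolFromSmiles("CCO")
AllChem.Compute2DCoords(mol2d)
mol3d, converged = optimize_from_fresh_coords(mol2d)
print("converged:", converged)
```

The converged flag corresponds to the kind of information in the convergence column mentioned above.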
It would be good to be more explicit about what is happening with hydrogens - at the moment it appears that the output geometry does not have explicit hydrogens; but I presume they were present for the optimisation(?). Maybe there should be an option to remove hydrogens from output structure (defaulting to False)?
Whichever atoms are present in the input structure are used for the optimization. So if Hs are present, they will be used. This points to the need for "AddHs" and "RemoveHs" nodes, I think. That, and/or an option to add Hs to molecules before doing the 3D optimization and then remove them again before creating the output.
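To make the add-then-remove idea concrete, here is a rough sketch in RDKit Python. Again, this is an illustration under assumptions and not the node's code: the helper name is made up, and it assumes the input molecule already carries a 3D conformer.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def mmff_optimize_with_temp_hs(mol, max_iters=500):
    """Hypothetical helper: add Hs, optimize with MMFF, strip Hs again.

    Assumes `mol` already has a 3D conformer.
    """
    # addCoords=True places the new hydrogens using the heavy-atom geometry
    mol_h = Chem.AddHs(mol, addCoords=True)
    # Return code: 0 = converged, 1 = more iterations needed, -1 = setup failed
    status = AllChem.MMFFOptimizeMolecule(mol_h, maxIters=max_iters)
    # RemoveHs returns a copy that keeps the optimized heavy-atom coordinates
    return Chem.RemoveHs(mol_h), status


# Example: build a heavy-atom-only 3D input, then optimize it
start = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(start, randomSeed=42)
heavy_only = Chem.RemoveHs(start)
result, status = mmff_optimize_with_temp_hs(heavy_only)
print(result.GetNumAtoms(), status)
```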
I would quite like to use MMFF to generate sets of conformers - do you think this should be accomplished with the current node in a loop (if the 'randomise' option above were present), or would it be best to add a 'Conformer generation' node - where eg max number of conformers, max iterations, energy threshold vs minimum energy, RMS difference for inclusion options could be set? I prefer the idea of a dedicated node! : )
I agree that we should have a dedicated node for this.
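For reference, here is one way the options you list could map onto the RDKit Python API. This is only a sketch: the function name, parameter names, and defaults are assumptions, and a real node might behave differently. Note that in this version the RMS pruning happens at the embedding stage, before minimization, and the energy window is applied afterwards.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def generate_conformers(mol, max_confs=50, max_iters=500,
                        energy_window=10.0, rms_threshold=0.5, seed=42):
    """Hypothetical helper: embed, MMFF-minimize, and filter conformers.

    Returns the molecule (with Hs) and a dict of conformer id -> energy
    (kcal/mol) for conformers within `energy_window` of the minimum.
    """
    mol_h = Chem.AddHs(mol)
    # pruneRmsThresh drops embedded conformers that are too similar
    AllChem.EmbedMultipleConfs(mol_h, numConfs=max_confs,
                               pruneRmsThresh=rms_threshold, randomSeed=seed)
    conf_ids = [conf.GetId() for conf in mol_h.GetConformers()]
    if not conf_ids:
        raise ValueError("no conformers could be embedded")
    # One (not_converged, energy) tuple per conformer, in conformer order
    results = AllChem.MMFFOptimizeMoleculeConfs(mol_h, maxIters=max_iters)
    energies = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}
    e_min = min(energies.values())
    kept = {}
    for cid, energy in energies.items():
        if energy - e_min <= energy_window:
            kept[cid] = energy
        else:
            mol_h.RemoveConformer(cid)  # outside the energy window
    return mol_h, kept


mol_confs, kept = generate_conformers(Chem.MolFromSmiles("CCCCO"), max_confs=10)
print(len(kept), "conformers kept")
```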
Where does the Generate Coords node (3D) now stand? Is this redundant, or is it a good precursor node to ensure reasonable start-points prior to MMFF minimisation? My observation (cf point 1) is that if I run 2D structures through this node first, then I get good minimised structures.
As you've observed: it's a good way to generate an initial 3D geometry.
Final (general) point - I think the RDKit nodes may now be a victim of their own success, in that there are so many of them it is getting a bit difficult to keep track! I think some sub-categories may be a useful addition now - particularly for new users(?)
Good idea! I'm open to suggestions as to what that organization should look like.
-greg