A couple of new nodes

greglandrum · October 30, 2013, 5:59am

Dear all,

Last night we added a couple of new nodes to the RDKit collection, both focused on 3D molecular structures:

Optimize Geometry: This allows you to clean up the 3D structure of a molecule. The node provides access to the three force fields the RDKit supports: UFF, MMFF94, and MMFF94s.
Open3D Alignment: This allows rigid alignment of molecules to each other using automatically determined atom mappings. The algorithm is described in the publication: P. Tosco et al., JCAMD 25:777-83 [http://dx.doi.org/10.1007/s10822-011-9462-9]
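
For anyone who wants to see roughly what the force-field choice looks like outside KNIME, here is a minimal sketch using the RDKit Python API. It is not the node's implementation: the helper name `optimize`, the iteration limit, and the random seed are assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def optimize(smiles, force_field="MMFF94", max_iters=500, seed=42):
    """Embed a molecule in 3D and minimize it with the chosen force field.

    Hypothetical helper. Returns (mol, status), where status is the value
    returned by the RDKit optimizer: 0 = converged, 1 = more iterations
    needed, -1 = the force field could not be set up.
    """
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)  # initial 3D coordinates
    if force_field == "UFF":
        status = AllChem.UFFOptimizeMolecule(mol, maxIters=max_iters)
    else:
        # mmffVariant accepts "MMFF94" or "MMFF94s"
        status = AllChem.MMFFOptimizeMolecule(
            mol, mmffVariant=force_field, maxIters=max_iters
        )
    return mol, status


for ff in ("UFF", "MMFF94", "MMFF94s"):
    mol, status = optimize("CCO", ff)
    print(ff, "converged" if status == 0 else f"status={status}")
```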

Both nodes take advantage of the really nice work that Paolo Tosco did for the most recent version of the RDKit.
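
The alignment described above can also be tried directly from Python. The following is a hedged sketch, assuming the `rdMolAlign.GetO3A` function from the RDKit; the two example molecules, the `embed` helper, and the seeds are made up for illustration and say nothing about how the KNIME node is wired internally.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolAlign


def embed(smiles, seed):
    """Hypothetical helper: build a 3D, MMFF-minimized molecule from SMILES."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)
    AllChem.MMFFOptimizeMolecule(mol)
    return mol


ref = embed("c1ccccc1CCO", 7)     # reference molecule (stays fixed)
probe = embed("c1ccccc1CCN", 11)  # probe molecule (gets moved)

# Probe first, reference second; atom mappings are determined automatically.
o3a = rdMolAlign.GetO3A(probe, ref)
rmsd = o3a.Align()       # rigidly transforms the probe in place, returns the RMSD
score = o3a.Score()      # alignment score
matches = o3a.Matches()  # list of [probe atom index, reference atom index]

print(f"RMSD {rmsd:.2f}, score {score:.1f}, {len(matches)} matched atoms")
```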

Please try out the new nodes and let us know what you think!

-greg

James_Davidson · November 1, 2013, 12:59pm

Hi Greg, Manuel, Paolo, et al!

Thanks for the new nodes - I think these are a great addition to the cheminformatics / modelling tools available in KNIME.

I have a couple of comments / questions about the Optimise Geometry node:

1. I find that I get some horrible geometries using MMFF if my input structures already have 2D coordinates, but they are ok if I pre-convert to SMILES - if this is a general issue then perhaps a "remove coordinates" or "randomise starting coordinates" option would be useful?
2. It would be good to be more explicit about what is happening with hydrogens - at the moment it appears that the output geometry does not have explicit hydrogens; but I presume they were present for the optimisation(?). Maybe there should be an option to remove hydrogens from output structure (defaulting to False)?
3. I would quite like to use MMFF to generate sets of conformers - do you think this should be accomplished with the current node in a loop (if the 'randomise' option above were present), or would it be best to add a 'Conformer generation' node - where eg max number of conformers, max iterations, energy threshold vs minimum energy, RMS difference for inclusion options could be set? I prefer the idea of a dedicated node! : )
4. Where does the Generate Coords node (3D) now stand? Is this redundant, or is it a good precursor node to ensure reasonable start-points prior to MMFF minimisation? My observation (cf point 1) is that if I run 2D structures through this node first, then I get good minimised structures.
5. Final (general) point - I think the RDKit nodes may now be a victim of their own success, in that there are so many of them it is getting a bit difficult to keep track! I think some sub-categories may be a useful addition now - particularly for new users(?)

Kind regards

James

greglandrum · November 2, 2013, 7:14am

James,

Thanks for the feedback.

Let's see if I can make effective use of this fine tool to try to answer some of your questions.

> I find that I get some horrible geometries using MMFF if my input structures already have 2D coordinates, but they are ok if I pre-convert to SMILES - if this is a general issue then perhaps a "remove coordinates" or "randomise starting coordinates" option would be useful?

I haven't tried that particular experiment, but I guess it's trying to use the 2D coordinates as a starting point. That is probably asking too much from the optimizer. Out of curiosity: are the geometries from the 2D coordinates converged? (There's a column with that information.)

Having an option to "remove starting coordinates" and start with a fresh set of 3D coordinates is a great idea. We will add that.
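
Until such an option exists, the same idea can be sketched in Python: throw away whatever coordinates came in, embed fresh 3D coordinates, minimize, and report whether the optimizer converged. This is only an illustration under stated assumptions; the function name `optimize_from_fresh_coords`, the iteration limit, and the example molecule are not taken from the node.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def optimize_from_fresh_coords(mol, max_iters=1000, seed=42):
    """Hypothetical helper: ignore input coordinates and start from a new embedding.

    Returns (mol_with_hs, converged).
    """
    mol = Chem.AddHs(mol)
    mol.RemoveAllConformers()  # drop the incoming (e.g. 2D) coordinates
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise ValueError("3D embedding failed")
    status = AllChem.MMFFOptimizeMolecule(mol, maxIters=max_iters)
    return mol, status == 0  # a status of 0 means the minimization converged


# A molecule carrying 2D depiction coordinates, as in the situation described.
mol2d = Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1")
AllChem.Compute2DCoords(mol2d)

mol3d, converged = optimize_from_fresh_coords(mol2d)
print("3D:", mol3d.GetConformer().Is3D(), "converged:", converged)
```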

> It would be good to be more explicit about what is happening with hydrogens - at the moment it appears that the output geometry does not have explicit hydrogens; but I presume they were present for the optimisation(?). Maybe there should be an option to remove hydrogens from output structure (defaulting to False)?

Whichever atoms are present in the input structure are used for the optimization. So if Hs are present, they will be used. This points to the need for "AddHs" and "RemoveHs" nodes, I think. That, and/or an option to add Hs to molecules before doing the 3D optimization and then remove them again before creating the output.
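
In RDKit Python terms, the add-then-remove workflow could look something like the sketch below. `Chem.AddHs` and `Chem.RemoveHs` are real RDKit functions; the wrapper `optimize_with_temporary_hs` and its defaults are assumptions for illustration, not a description of any existing node option.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def optimize_with_temporary_hs(mol, seed=42):
    """Hypothetical helper: add Hs, optimize in 3D, then strip the Hs again.

    Returns (heavy_atom_mol, status) with the optimized coordinates kept
    for the remaining atoms.
    """
    with_hs = Chem.AddHs(mol)                        # explicit Hs for the force field
    AllChem.EmbedMolecule(with_hs, randomSeed=seed)  # initial 3D geometry
    status = AllChem.MMFFOptimizeMolecule(with_hs)
    return Chem.RemoveHs(with_hs), status            # heavy atoms keep their 3D positions


out, status = optimize_with_temporary_hs(Chem.MolFromSmiles("CCO"))
print(out.GetNumAtoms(), "atoms in output, status", status)
```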

> I would quite like to use MMFF to generate sets of conformers - do you think this should be accomplished with the current node in a loop (if the 'randomise' option above were present), or would it be best to add a 'Conformer generation' node - where eg max number of conformers, max iterations, energy threshold vs minimum energy, RMS difference for inclusion options could be set? I prefer the idea of a dedicated node! : )

I agree that we should have a dedicated node for this.
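
As a rough idea of what such a node might do under the hood, here is a sketch in Python covering the options James lists: maximum number of conformers, maximum iterations, an energy window above the minimum, and an RMS threshold. The function name, parameter names, and default values are all assumptions. Note also that in this sketch the RMS pruning is applied by the embedder before minimization, which is a simplification.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def generate_conformers(smiles, max_confs=20, max_iters=500,
                        energy_window=10.0, rms_thresh=0.5, seed=42):
    """Hypothetical helper: embed, MMFF-minimize, and filter conformers.

    Returns (mol, kept), where kept maps conformer id -> MMFF energy
    for conformers within energy_window of the lowest-energy one.
    """
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    # pruneRmsThresh discards embedded conformers that are too similar.
    AllChem.EmbedMultipleConfs(mol, numConfs=max_confs,
                               pruneRmsThresh=rms_thresh, randomSeed=seed)
    conf_ids = [conf.GetId() for conf in mol.GetConformers()]
    if not conf_ids:
        raise ValueError("no conformers could be embedded")

    # One (not_converged, energy) tuple per conformer, in conformer order.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=max_iters)
    energies = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}

    lowest = min(energies.values())
    kept = {}
    for cid, energy in energies.items():
        if energy - lowest <= energy_window:
            kept[cid] = energy
        else:
            mol.RemoveConformer(cid)  # outside the energy window
    return mol, kept


mol, kept = generate_conformers("CCCCO", max_confs=10)
print(f"kept {len(kept)} conformers")
```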

> Where does the Generate Coords node (3D) now stand? Is this redundant, or is it a good precursor node to ensure reasonable start-points prior to MMFF minimisation? My observation (cf point 1) is that if I run 2D structures through this node first, then I get good minimised structures.

As you've observed: it's a good way to generate an initial 3D geometry.

> Final (general) point - I think the RDKit nodes may now be a victim of their own success, in that there are so many of them it is getting a bit difficult to keep track! I think some sub-categories may be a useful addition now - particularly for new users(?)

Good idea! I'm open to suggestions as to what that organization should look like.

-greg

James_Davidson · November 4, 2013, 7:39am

Hi Greg (and Manuel),

Here's a first go, then! To pay homage to the original RDKit method names, I have forced myself to use "z" rather than "s" where necessary - currently fictitious nodes are marked with (*). I should also say that I have conformed to node names starting with "RDKit", and if this is a convention I have also suggested a couple of existing node renames (that would also make them fit better with the underlying toolkit):

Converters
- Molecule to RDKit (rename "RDKit From Molecule"?)
- RDKit To Molecule
- InChI to RDKit (rename "RDKit From InChI"?)
- RDKit To InChI
- IUPAC to RDKit (rename "RDKit From IUPAC"?)
- RDKit From PDB(*)
- RDKit To PDB(*)
- RDKit Canon SMILES
Modifiers
- RDKit Add Hs(*)
- RDKit Remove Hs(*)
- RDKit Aromatizer(*)
- RDKit Kekulizer(*)
- RDKit Salt Stripper
Calculators
- RDKit Descriptor Calculation
Geometry
- RDKit Generate Coords
- RDKit Optimize Geometry
- RDKit Generate Conformers(*)
- RDKit Open 3D Alignment
Fingerprints
- RDKit Fingerprint
- RDKit Fingerprint Reader
- RDKit Fingerprint Writer
Fragments
- RDKit Molecule Fragmenter
- RDKit Find Murcko Scaffolds
Searching
- RDKit Substructure Filter
- RDKit Molecule Substructure Filter
- RDKit Dictionary Substructure Filter
- RDKit Functional Group Filter
- RDKit Substructure Counter
- RDKit Diversity Picker
Reactions
- RDKit One Component Reaction
- RDKit Two Component Reaction
Viewing
- RDKit Interactive Table
- RDKit SMILES Headers
- RDKit Highlighting Atoms
Experimental
- RDKit R Group Decomposition

I'm sure there are lots of other ways of naming / categorising - but this made some sense to me at this point in time!

Kind regards

James
