New RDKit nodes and features.

Dear all,

The nightly builds of the RDKit nodes now contain some new open-source nodes contributed by NIBR:

  1. Descriptor Calculation: calculates a set of 2D descriptors that are useful for building QSAR models.
  2. Salt Stripper: uses a set of standard definitions to remove salts from input molecules.
  3. Functional Group Filter: filters a set of input compounds based on counts of standard functional groups.
  4. Substructure Counter: counts the number of times each of a set of substructures (provided as an input table) appears in each molecule.
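
For anyone who wants to see roughly what these operations look like outside KNIME, here is a minimal sketch using the RDKit Python API. It is not the code behind the nodes; the helper name count_substructures, the example molecule, and the two SMARTS patterns are assumptions chosen for illustration only.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.SaltRemover import SaltRemover


def count_substructures(mol, smarts_list):
    """Hypothetical helper: number of matches of each SMARTS pattern in mol."""
    counts = {}
    for smarts in smarts_list:
        pattern = Chem.MolFromSmarts(smarts)
        counts[smarts] = len(mol.GetSubstructMatches(pattern))
    return counts


# Ethylene glycol written as a hydrochloride "salt" (illustrative input).
mol = Chem.MolFromSmiles("OCCO.Cl")

# Strip salts using the default salt definitions shipped with the RDKit.
stripped = SaltRemover().StripMol(mol)

# Two of the many 2D descriptors the RDKit can calculate.
descriptors = {
    "MolWt": Descriptors.MolWt(stripped),
    "NumHDonors": Descriptors.NumHDonors(stripped),
}

# Count hydroxyl groups and sp3 carbons (example queries only).
counts = count_substructures(stripped, ["[OX2H]", "[CX4]"])

print(Chem.MolToSmiles(stripped), descriptors, counts)
```

In the KNIME node the queries come from an input table; here they are simply a Python list.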

There are also nodes for reading and writing the FPS format used in Andrew Dalke's chem-fingerprints tools (http://code.google.com/p/chem-fingerprints/).
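
As a rough illustration of what an FPS file contains, the sketch below writes the text layout as I understand it from the chem-fingerprints documentation: a "#FPS1" first line, "#key=value" header lines, then one record per line with the hex-encoded fingerprint, a tab, and the identifier. The function name to_fps_lines and the example bytes are hypothetical, and only the num_bits header is written here; check the format documentation before relying on this.

```python
def to_fps_lines(records, num_bits):
    """Hypothetical helper: format (identifier, fingerprint bytes) pairs as FPS text lines."""
    lines = ["#FPS1", f"#num_bits={num_bits}"]
    for identifier, fp_bytes in records:
        # Each record is the hex-encoded fingerprint, a tab, then the identifier.
        lines.append(f"{fp_bytes.hex()}\t{identifier}")
    return lines


# A made-up 16-bit fingerprint for a made-up record.
example = to_fps_lines([("mol1", bytes([0x0F, 0xA0]))], num_bits=16)
print("\n".join(example))
```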

Finally, support for calculating the Avalon fingerprint has been added to the RDKit Fingerprint node.

The Avalon fingerprint was described in the following publication:

Gedeck, P., Rohde, B. & Bartels, C. QSAR − How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets. Journal of Chemical Information and Modeling 46, 1924-1936 (2006).

The implementation used here is an RDKit wrapper around another NIBR open-source project, the Avalon toolkit: https://sourceforge.net/p/avalontoolkit/wiki/Home/
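
The same wrapper is reachable from Python in RDKit builds that include Avalon support. The following is a minimal sketch, assuming such a build; the example molecules and the 512-bit length are illustrative choices, not the settings used by the KNIME node.

```python
from rdkit import Chem, DataStructs
from rdkit.Avalon import pyAvalonTools

# Two example molecules (phenol and aniline), chosen only for illustration.
phenol = Chem.MolFromSmiles("Oc1ccccc1")
aniline = Chem.MolFromSmiles("Nc1ccccc1")

# Calculate Avalon fingerprints as bit vectors of a chosen length.
fp_phenol = pyAvalonTools.GetAvalonFP(phenol, nBits=512)
fp_aniline = pyAvalonTools.GetAvalonFP(aniline, nBits=512)

# Compare the two fingerprints with Tanimoto similarity.
similarity = DataStructs.TanimotoSimilarity(fp_phenol, fp_aniline)
print(fp_phenol.GetNumBits(), similarity)
```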

Thanks to Manuel Schwarze, Dillip Kumar Mohanty, Swarnaprava Singh, and Sudip Ghosh for their work on the KNIME nodes and to Bernd Rohde for the Avalon toolkit.

I hope these are useful to the KNIME community,

-greg

Dear Greg (et al.)

Fantastic work - you've made my day!  : )

 

Kind regards

James

Hey Greg,

Brilliant work. I see someone's been busy!

I like the Functional Group Filter! Could it be extended to include "any metal" as a filter in the list selection? That would be useful for removing non-organic molecules.

Simon.

Hi Simon,

Thanks for the kind words.

The functional group definitions used by the Functional Group Filter node are read from a text file, and you can provide your own version of that file (there's an option in the Configure dialog for this). You can download a "clean" copy of the default file here: http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Data/Functional_Group_Hierarchy.txt

However, if you would like to just remove molecules that match any of a particular set of queries, I would recommend using the "RDKit Dictionary Substructure Filter" and providing the list of SMARTS that you don't want to see as input to that. That's probably somewhat simpler.
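
To make the idea concrete outside of KNIME, here is a minimal sketch of the same kind of filtering with the RDKit Python API: drop any molecule that matches at least one SMARTS query. This is not the node's implementation. The function name split_by_queries and the metal query are assumptions; the query lists only a handful of metals by atomic number and is nowhere near a complete "any metal" definition.

```python
from rdkit import Chem

# Illustrative, incomplete metal query: Li, Na, Mg, K, Ca, Fe, Cu, Zn, Pd, Pt.
METAL_SMARTS = ["[#3,#11,#12,#19,#20,#26,#29,#30,#46,#78]"]


def split_by_queries(smiles_list, smarts_list):
    """Hypothetical helper: split SMILES into (kept, removed) by SMARTS matches."""
    queries = [Chem.MolFromSmarts(s) for s in smarts_list]
    kept, removed = [], []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip SMILES that cannot be parsed
        if any(mol.HasSubstructMatch(q) for q in queries):
            removed.append(smiles)
        else:
            kept.append(smiles)
    return kept, removed


kept, removed = split_by_queries(
    ["CCO", "CC(=O)[O-].[Na+]", "c1ccccc1"], METAL_SMARTS
)
print(kept, removed)
```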

I've attached a workflow that uses this approach for a set of SMARTS queries taken from this file: http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Data/SmartsLib/RLewis_smarts.txt

Note that I had to reformat the original file somewhat to make it work with the KNIME File Reader. I also use only the first set of filters in the file (that's why I included the Row Filter). I have attached the modified file.

-greg