PostgreSQL + RDKit Cartridge + Knime

Has anyone used the PostgreSQL+RDKit Cartridge plugin via Knime?


I'd expect that submitting queries and getting back SMILES strings would be easy enough, but what about retrieving PostgreSQL/RDKit mol data types as Knime RDKit cells?



The 'Molecule Type Cast' and 'RDKit From Molecule' nodes should help.


I use the PostgreSQL cartridge from knime pretty regularly.

I normally use SMILES as the input/output format and that seems to work fairly well. I suppose it's theoretically possible to support the RDKit binary format, but it would be more than a small amount of work.
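To illustrate the SMILES round trip described above, here is a minimal Python sketch that builds a substructure query returning SMILES strings. mol_to_smiles() and the @> substructure operator come from the cartridge; the table name mols, the columns id and m, and the helper name are hypothetical placeholders. The placeholder style (%s) assumes a psycopg2-like driver.

```python
def substructure_smiles_query(table="mols", mol_col="m", id_col="id"):
    """Build a parameterized query returning SMILES for substructure hits.

    The table and column names are hypothetical; adjust them to your schema.
    """
    # mol_to_smiles() converts the cartridge's mol type back to a SMILES string.
    # The @> operator tests "left molecule contains right pattern";
    # casting the parameter to qmol makes the cartridge parse it as SMARTS.
    return (
        f"SELECT {id_col}, mol_to_smiles({mol_col}) AS smiles "
        f"FROM {table} WHERE {mol_col} @> %s::qmol"
    )


sql = substructure_smiles_query()
params = ("c1ccccc1",)  # pattern for a benzene ring
print(sql)
```

The same SQL text (with the pattern written in directly) could presumably be used in a Knime database reader node, with the resulting string column converted to molecules afterwards.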

What were you hoping to do with the binary format?



I'm relatively new to this, but we've started doing more QSAR-type work, so I need to get up to speed :-) (A big change from the spectroscopic data analysis and chemometrics that I'm familiar with.)

I was hoping that we could store RDKit data that includes 3D coordinates in PostgreSQL, so that we'd only have to calculate them once using the Corina software, which generates SDF files (or Knime data via their plugin). We may well have a large number of molecules in the database (hundreds of thousands to tens of millions), so calculating 3D coordinates on the fly might not be practical.

BTW, as far as I know, the 3D coordinate calculations in RDKit and CDK do not seem that reliable (yet)?

It would then be nice to be able to query the Postgres database using 3D and 2D descriptors before doing more elaborate things in Knime.




There are two points here:

3D coordinate generation

Corina is a fine piece of software that is quite good at rapidly generating a single high-quality conformation for molecules. The RDKit's conformer generator is certainly not as fast and is really designed to be used to generate sets of conformations. It still does a pretty good job at finding conformations that are close to crystal-structure conformations. There was a paper on this published back in 2012:

Working with the database

Here it doesn't really matter which software you use to generate the coordinates. If you load molecules from mol blocks or SDF records using the cartridge's mol_from_ctab() function, the database stores whatever structures you provide. (NOTE: there is a bug in the current release of the cartridge that causes this not to work; you need to use the github version or wait for the upcoming release.)
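A rough Python sketch of that loading step: split an SDF file into its mol blocks and pass each one to mol_from_ctab(). The table mols3d, its column m, and the helper names are hypothetical. The second argument to mol_from_ctab() (true, meaning "keep the coordinates") follows the cartridge documentation for recent versions and should be treated as an assumption to check against your installed release. The connection is assumed to be a psycopg2-style DB-API connection.

```python
def split_sdf(sdf_text):
    """Return the mol block (CTAB) of each record in an SDF string."""
    blocks, current = [], []
    in_molblock = True
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # Record separator: start collecting the next mol block.
            current, in_molblock = [], True
            continue
        if not in_molblock:
            continue  # skip SD data fields that follow the mol block
        current.append(line)
        if line.rstrip() == "M  END":
            blocks.append("\n".join(current) + "\n")
            in_molblock = False
    return blocks


# Hypothetical table/column; 'true' asks the cartridge to keep coordinates
# (assumption: your cartridge version supports the second argument).
INSERT_SQL = "INSERT INTO mols3d (m) VALUES (mol_from_ctab(%s::cstring, true))"


def load_sdf(conn, sdf_text):
    """Insert every mol block through a DB-API connection (e.g. psycopg2)."""
    blocks = split_sdf(sdf_text)
    with conn.cursor() as cur:
        for block in blocks:
            cur.execute(INSERT_SQL, (block,))
    conn.commit()
    return len(blocks)
```

Note that mol_from_ctab() returns NULL for blocks it cannot parse, so it may be worth checking for NULL rows after loading. For millions of molecules, a bulk load into a text column followed by a single UPDATE would likely be faster than row-by-row inserts.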

At the moment there is no functionality in the cartridge for doing 3D descriptors, but that's definitely something that would be interesting to talk about. The rdkit-discuss mailing list would be a better forum for that.
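For the 2D side, the cartridge does provide descriptor functions such as mol_amw(), mol_logp(), mol_tpsa(), mol_hbd() and mol_hba(), so a pre-filter in the database before moving to Knime could look roughly like the sketch below. The table mols, the columns id and m, and the helper itself are hypothetical; only the cartridge function names are taken from its documentation.

```python
# 2D descriptor functions provided by the RDKit cartridge.
DESCRIPTORS = {
    "amw": "mol_amw",    # average molecular weight
    "logp": "mol_logp",  # estimated logP
    "tpsa": "mol_tpsa",  # topological polar surface area
    "hbd": "mol_hbd",    # hydrogen-bond donors
    "hba": "mol_hba",    # hydrogen-bond acceptors
}


def descriptor_filter(ranges, table="mols", mol_col="m", id_col="id"):
    """Build (sql, params) selecting molecules inside descriptor ranges.

    ranges maps a key of DESCRIPTORS to a (low, high) tuple.
    """
    if not ranges:
        raise ValueError("at least one descriptor range is required")
    clauses, params = [], []
    for name, (low, high) in ranges.items():
        func = DESCRIPTORS[name]  # KeyError for unsupported descriptors
        clauses.append(f"{func}({mol_col}) BETWEEN %s AND %s")
        params.extend([low, high])
    sql = (
        f"SELECT {id_col}, mol_to_smiles({mol_col}) AS smiles "
        f"FROM {table} WHERE " + " AND ".join(clauses)
    )
    return sql, params


sql, params = descriptor_filter({"amw": (200, 500), "logp": (-1, 5)})
print(sql, params)
```

With tens of millions of rows, computing descriptors in the WHERE clause on every query may be slow; storing them in ordinary indexed columns once is probably the more practical design.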