My KNIME workflow processes a set of chemical structures and generates 3D coordinates.
When I output the results to an Oracle table (using the Database Writer), the chemical structure ends up in SMILES format, so the coordinates are lost completely!
Within KNIME, the 3D coordinate column appears to hold structures with coordinates, but every output method I've tried (Database Writer, XLS Writer, and simply copying and pasting from the data table in the KNIME GUI) yields SMILES strings.
How can I get the coordinates of the 3D structures out of KNIME, hopefully as molfiles?
After the RDKit Coordinate Generator, the column type is RDKit Molecule. When the CDK 3D Coordinates node is used, the output type is CDK Molecule. Both end up as SMILES by default.
The RDKit Molecule can be converted back to a molfile... except the RDKit Coordinate Generator keeps crashing on our data.
Ah, this might be an adapter cell problem. Perhaps the writer nodes can't handle the RDKit or CDK cell types and are writing the original input, which was probably SMILES?
If you do an operation using a CDK or RDKit node you will need to use the converter to create a Molfile / SDF cell to then write out with coordinates.
Scrap that, both nodes work for me and save an SDF with coordinates. Which version of KNIME are you using?
You can use the CDK to Molecule or the RDKit To Molecule to create an SDF column. Both of these still have the coordinates.
I'm not sure what you mean by "the default is SMILES". Try an explicit conversion to SDF before your write operations and see if that helps. It might also be worth reporting your issue with the RDKit node in the RDKit section of the forum.
Sam has given the answer I would suggest: add an RDKit To Molecule node and tell it to convert the RDKit column into an SDF column. That should then be written to the database properly.
Connecting an SDF writer to the 3D Coordinates node does capture the 3D structures in the output.
However, it does require some extra steps when the real aim here is to get the molfiles into a database table.
I'll follow up with the RDKit team.
By "the default is SMILES" I mean that copying data from a molecule cell in a KNIME table and pasting it into another program, such as Notepad, yields a SMILES string, even in cases like this where the coordinates are part of the information.
Ah, well, that isn't the same as writing out the contents of the cell. What ends up on the clipboard is, I believe, the String returned by the cell's toString() implementation, which can be whatever the developer chose; it is just a String representation of the object. It would appear that RDKit and CDK cells return SMILES.
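To illustrate the toString() point, here is a minimal Python sketch. MoleculeCell and Atom are hypothetical stand-ins, not the actual KNIME, RDKit, or CDK classes: str() returns only the SMILES, the way the clipboard behaves, while an explicit to_molblock() conversion writes a V2000 molfile that keeps the 3D coordinates (heavy atoms only, illustrative coordinates).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class MoleculeCell:
    """Hypothetical molecule cell: not the real KNIME/RDKit/CDK class."""

    smiles: str
    atoms: list[Atom]
    bonds: list[tuple[int, int, int]]  # (1-based atom index, 1-based atom index, bond order)
    name: str = ""

    def __str__(self) -> str:
        # What a copy/paste (or a writer that just calls toString()) would see.
        return self.smiles

    def to_molblock(self) -> str:
        """Explicit conversion to a V2000 molfile, keeping the 3D coordinates."""
        # Header block: molecule name, program line flagged "3D", empty comment line.
        lines = [self.name, f"  {'sketch':8}{'':10}3D", ""]
        # Counts line: number of atoms and bonds, then the V2000 version tag.
        lines.append(
            f"{len(self.atoms):3d}{len(self.bonds):3d}  0  0  0  0  0  0  0  0999 V2000"
        )
        # Atom block: x, y, z coordinates followed by the element symbol.
        for a in self.atoms:
            lines.append(
                f"{a.x:10.4f}{a.y:10.4f}{a.z:10.4f} {a.symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"
            )
        # Bond block: the two atom indices and the bond order.
        for begin, end, order in self.bonds:
            lines.append(f"{begin:3d}{end:3d}{order:3d}  0")
        lines.append("M  END")
        return "\n".join(lines)


cell = MoleculeCell(
    smiles="CO",
    atoms=[Atom("C", 0.0, 0.0, 0.0), Atom("O", 1.4, 0.0, 0.0)],
    bonds=[(1, 2, 1)],
    name="methanol",
)
print(str(cell))           # CO: the coordinates are not part of the string
print(cell.to_molblock())  # full CTAB with x, y, z for each atom
```

The design point is simply that a string representation and an interchange format are two different things; only the explicit conversion carries the coordinates.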
If you convert from RDKit/CDK to a Mol format, you should be able to use the Database Writer node to write with coordinates. The SDF writer in my example is only there to show that the coordinates remain.
Once you have converted to a Mol/SDF cell, if you copy that cell and paste it into Notepad you should see the full CTAB representation.
I would guess the database writer you used is taking the toString() value from the cell, as it has no specific handling for these chemical types. So using a standard interchange-format cell type (MolCell / SdfCell) should work.
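A minimal sketch of that guess, using Python's built-in sqlite3 in place of Oracle (a TEXT column standing in for the CLOB). ChemCell and to_db_value() are hypothetical and do not reflect how the KNIME Database Writer is actually implemented: a generic writer with no chemistry-specific handling falls back to str(), so it stores the SMILES, whereas passing the molfile text (what a Mol/SDF cell holds) stores the full CTAB with coordinates.

```python
import sqlite3

# Hypothetical molfile text for methanol (heavy atoms only, illustrative coordinates).
MOLBLOCK = "\n".join([
    "methanol",
    f"  {'sketch':8}{'':10}3D",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])


class ChemCell:
    """Hypothetical chemistry cell: str() gives SMILES, .molblock keeps coordinates."""

    def __init__(self, smiles: str, molblock: str) -> None:
        self.smiles = smiles
        self.molblock = molblock

    def __str__(self) -> str:
        return self.smiles


def to_db_value(value):
    """Generic writer logic with no special handling for chemistry types."""
    if value is None or isinstance(value, (str, int, float, bytes)):
        return value
    return str(value)  # unknown types fall back to their string form


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structures (id INTEGER PRIMARY KEY, mol TEXT)")
cell = ChemCell("CO", MOLBLOCK)
# Writing the chemistry cell directly: only the SMILES reaches the table.
conn.execute("INSERT INTO structures (mol) VALUES (?)", (to_db_value(cell),))
# Converting to molfile text first (like a Mol/SDF cell): the CTAB is stored.
conn.execute("INSERT INTO structures (mol) VALUES (?)", (to_db_value(cell.molblock),))
for row_id, mol in conn.execute("SELECT id, mol FROM structures ORDER BY id"):
    print(row_id, "lines:", len(mol.splitlines()))
```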
Doesn't adding an RDKit To Molecule node and selecting "SDF" as the output type work?
Thanks, Greg, Sam,
Yup; the suggestion worked. The RDKit to molecule conversion yielded molfiles in the CLOB field of my database table.
Greg: since you're 'listening,' I found that when I ran large molecules (more than about 900 atoms) through RDKit Generate Coords, KNIME crashed. Would you like me to post some of these molecules?
It's fine not to be able to generate 3D coordinates for large molecules but a more graceful exit would be nice.
If you could send me a couple of example molecules, I'd appreciate it. I'll take a look and see whether they are an RDKit problem or not.
Here is a single-record SD file that will crash KNIME when you try to calculate 3D coordinates.
My request is for an error message, with KNIME continuing to run, rather than a crash.
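One way to get the graceful behaviour requested, sketched in Python under loud assumptions: generate_coords_safely() and the 900-atom limit are hypothetical (the limit comes only from the observation above, not from any documented RDKit setting), and generator stands in for whatever actually computes the coordinates. Oversized inputs are skipped with a message, and Python-level exceptions become an error message instead of stopping the run. A hard crash inside native code cannot be caught this way; that would need the work to run in a separate process.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical cutoff based on the "more than about 900 atoms" report; not an RDKit setting.
MAX_ATOMS = 900


def generate_coords_safely(
    atom_count: int,
    generator: Callable[[], T],
    max_atoms: int = MAX_ATOMS,
) -> Tuple[Optional[T], Optional[str]]:
    """Return (result, None) on success or (None, error_message) on failure."""
    if atom_count > max_atoms:
        # Refuse up front rather than risk a crash on very large molecules.
        return None, f"skipped: {atom_count} atoms exceeds the limit of {max_atoms}"
    try:
        return generator(), None
    except Exception as exc:
        # Only Python-level errors are caught; a native crash would still kill the process.
        return None, f"coordinate generation failed: {exc}"


# Usage: process several records and keep going when one fails.
def fail() -> list:
    raise ValueError("embedding did not converge")


records = [
    ("small", 12, lambda: [(0.0, 0.0, 0.0)] * 12),
    ("huge", 950, lambda: [(0.0, 0.0, 0.0)] * 950),
    ("bad", 30, fail),
]
for name, n_atoms, gen in records:
    coords, error = generate_coords_safely(n_atoms, gen)
    print(name, "ok" if error is None else error)
```

Returning a message per record, rather than raising, is what lets the rest of the batch keep running.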