RDKit Molecule handling and InChI

Hi,
I am trying to generate InChIs (codes as well as keys) from molecules, and the RDKit node comes in very handy for this. However, the node description says that it requires an RDKit column, yet it works perfectly fine with an SDF column. Is it internally converting the SDF to an RDKit molecule?
If so, how are molecules represented in RDKit? Is the 2D/3D information present in the SDF kept somehow, or is it a graph representation with the coordinates calculated by RDKit itself?
I am asking because, as far as I know, InChI can infer stereochemistry from 3D coordinates even if it is not labeled correctly. So it is important for me to know whether I have to delete the 3D information before generating InChIs, to be certain that all molecules are treated the same way regardless of their 2D/3D depiction in the original SDF.
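To make the 2D/3D concern concrete, here is a minimal plain-Python sketch (no toolkit involved) that reports what a single V2000 mol block says about its own coordinates. The function name `sdf_record_dimension` and the toy record are made up for illustration. It assumes the fixed-column V2000 layout (dimension code in columns 21-22 of the second header line, z coordinate in the third 10-character field of each atom line) and does not handle V3000 records. The flag may be blank or may disagree with the actual coordinates, which is why both are reported.

```python
def sdf_record_dimension(molblock: str) -> dict:
    """Report how one V2000 mol block (one SDF record) describes its coordinates."""
    lines = molblock.splitlines()
    # Header line 2: columns 21-22 hold the dimension code ("2D"/"3D"), if set.
    flag = lines[1][20:22].strip() if len(lines) > 1 else ""
    # Counts line (line 4): the first three characters are the atom count.
    n_atoms = int(lines[3][0:3])
    # Atom lines: x, y and z occupy three consecutive 10-character fields.
    z_values = [float(line[20:30]) for line in lines[4:4 + n_atoms]]
    return {
        "flag": flag,
        "has_nonzero_z": any(abs(z) > 1e-4 for z in z_values),
    }


# Toy two-atom record, invented for this example (not real data).
example = "\n".join([
    "toy record",
    "  TOYPROG".ljust(20) + "3D",  # hypothetical program name, padded to column 20
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.3000   -0.8000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

print(sdf_record_dimension(example))
```

Running something like this over the records of an SDF would at least show whether the input mixes 2D and 3D depictions before anything is converted.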
I know this is probably not a two-sentence answer, so:
Is there any documentation on the mol class and on how molecule representations are treated when they are read into RDKit as molecules and converted back to the original format? Then I’ll gladly read that and come back with more specific questions.

Looking forward to your answers,

Jennifer

It used to be necessary to use an explicit converter node to convert to RDKit molecules from SDF, MOL, SMILES, etc. (and similarly to convert to Indigo or CDK formats), but a while back changes were made that allow toolkits to perform the conversion on the fly (search the forum for Adaptor Cells if you want more detail!).

I’m not sure what settings RDKit uses for the default conversion, but with RDKit at least you can still do the conversion explicitly using the RDKit From Molecule node (and at the end, if you wish, convert back with the RDKit To Molecule node). I suspect that the node documentation simply hasn’t been updated to reflect that change. (@greglandrum - it would be nice to have the options used exposed, either as a preference page or as a node settings tab.)

That doesn’t answer the second part of your question, but hopefully covers the first part?

Steve

Thank you for the reply. Yes, that definitely covers the first part. Thanks for pointing me to the Adaptor Cells; once in a while KNIME complained about not being able to convert certain cell types, and now I know where that came from :wink:
It would be great to know what is happening in the node, agreed. Especially as cheminformatics still suffers from a lot of issues when it comes to general structure handling and having unseen internal conversions just makes life more complicated.
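One way to probe this outside KNIME is the RDKit Python API. The sketch below assumes an RDKit build with InChI support and says nothing about which settings the KNIME node uses internally. It builds a stand-in for one SDF record (alanine with one stereocentre and 2D coordinates), parses it back, checks that the coordinates are kept as a conformer on the molecule, and then generates the InChI once with and once without that conformer.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

# Stand-in for one SDF record: alanine with a defined stereocentre, 2D coords.
mol = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
rdDepictor.Compute2DCoords(mol)
molblock = Chem.MolToMolBlock(mol)

# Parsing the mol block keeps its coordinates as a conformer on the molecule.
parsed = Chem.MolFromMolBlock(molblock)
n_conformers = parsed.GetNumConformers()
is_3d = parsed.GetConformer().Is3D()

# InChI and InChIKey from the molecule as parsed (coordinates still attached).
inchi = Chem.MolToInchi(parsed)
inchi_key = Chem.InchiToInchiKey(inchi)

# Work on a copy and drop the coordinates, then generate the InChI again.
stripped = Chem.Mol(parsed)
stripped.RemoveAllConformers()
inchi_stripped = Chem.MolToInchi(stripped)

print("conformers:", n_conformers, "is 3D:", is_3d)
print(inchi)
print(inchi_key)
print("same InChI without coordinates:", inchi == inchi_stripped)
```

Repeating the comparison on records from the actual SDF, including 3D ones, would show whether stripping the coordinates changes the InChI for a given RDKit version.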

Jennifer
