Hi, are there plans to support the newest MDL mol format, V3000, with its enhanced stereo rules (the AND and OR chiral centre flags)?
This would help remove ambiguity around chirality in MOL and SDF files.
This new format has been gaining a lot of traction lately for defining molecules more fully.
The basic work of being able to read in v3000 mol files is already done.
There is not currently support for the enhanced stereo rules but I can definitely think about it. Can you point me to an explanation of the flags and some examples of these files appearing in the literature that I can look at?
The MOE nodes can read V3000, and the renderers also work with this format, but there is currently no support for the chiral flags. The reason is that the chiral flags let you describe a racemic mixture; in other words, a single entry can represent different enantiomers. This breaks many subsequent calculations. For example, if you try to calculate the dipole moment, the result is no longer well defined by a single V3000 entry. This could be circumvented by enumerating all enantiomers that match the chiral restrictions, but that is best done at import time and not later in the workflow.
The MOE nodes don't currently support writing V3000, as very few other vendor nodes can read it. But this (the writing) might change soon.
In general I think it would be best to add a Sdf3000Cell type to KNIME first. Currently you can create a lot of "fun" when reading a V3000 file into the SdfCells of a workflow.
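To make the enumeration idea concrete, here is a minimal sketch in plain Python. It assumes the stereo groups have already been extracted as lists of atom indices, and that each AND/OR group is either kept as drawn or inverted as a whole, so n groups give up to 2^n candidates. The function name `enumerate_inversions` and the input layout are hypothetical and not part of MOE or KNIME. It only reports which centres to invert; building the actual structures is left to a chemistry toolkit.

```python
from itertools import product


def enumerate_inversions(groups):
    """Return every combination of whole-group inversions.

    groups: list of atom-index lists, one list per AND/OR stereo group
            (hypothetical input layout; absolute centres are simply left out).
    Each result is the frozenset of atom indices to invert relative to the
    drawn structure. Symmetry (e.g. meso forms) may make some results
    chemically identical; this sketch does not detect that.
    """
    variants = []
    for choice in product((False, True), repeat=len(groups)):
        inverted = set()
        for flip, atoms in zip(choice, groups):
            if flip:
                inverted.update(atoms)
        variants.append(frozenset(inverted))
    return variants


# Two groups: atoms 2 and 5 invert together, atom 7 inverts on its own.
for variant in enumerate_inversions([[2, 5], [7]]):
    print(sorted(variant))
```

Running something like this at import time would give one fully defined row per candidate, which is what downstream property calculations need.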
I hope this overview from Accelrys helps explain the meaning of these AND and OR flags.
This is a format we are actively using to help remove ambiguity over which centres are absolute, which are mixtures, and which are chiral but with the exact R/S configuration unknown.
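For anyone who wants to see how the flags look in the file itself: in a V3000 block the enhanced stereo groups are stored as collection lines, with `MDLV30/STEABS` for absolute centres, `MDLV30/STERACn` for AND (racemic) groups and `MDLV30/STERELn` for OR (relative) groups. The sketch below pulls them out with a regular expression. The helper name `parse_enhanced_stereo` and the sample fragment are made up for illustration, and continuation lines (a trailing `-`) are not handled.

```python
import re

# Matches e.g. "M  V30 MDLV30/STERAC1 ATOMS=(2 4 5)"
STEREO_LINE = re.compile(
    r"^M\s+V30\s+MDLV30/(STEABS|STERAC|STEREL)(\d*)\s+ATOMS=\(([\d ]+)\)"
)
KIND = {"STEABS": "ABS", "STERAC": "AND", "STEREL": "OR"}

# Hypothetical fragment, not taken from a real file.
SAMPLE = """\
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STEABS ATOMS=(1 2)
M  V30 MDLV30/STERAC1 ATOMS=(2 4 5)
M  V30 MDLV30/STEREL2 ATOMS=(1 7)
M  V30 END COLLECTION
"""


def parse_enhanced_stereo(molblock):
    """Return a list of (kind, group_number, atom_indices) tuples."""
    groups = []
    for line in molblock.splitlines():
        match = STEREO_LINE.match(line)
        if match is None:
            continue
        tag, number, atoms = match.groups()
        # The first value inside ATOMS=(...) is the number of atoms listed.
        count, *indices = (int(token) for token in atoms.split())
        if count != len(indices):
            raise ValueError(f"atom count mismatch in line: {line!r}")
        groups.append((KIND[tag], int(number) if number else None, indices))
    return groups


for group in parse_enhanced_stereo(SAMPLE):
    print(group)
```

Centres in the same AND or OR group vary together, which is what lets the file distinguish "mixture" from "single but unknown" configurations.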
Has there been any progress on KNIME support for the V3000 mol format? I’ve noticed that none of the converters like “Molecule type cast” are able to handle this conversion.
On the molecule input side:
I believe the V3K mol and SDF files should work using normal cell types (so I don’t think we need to change the typecast node). The KNIME-supplied readers should work fine with v3k mol files or SDF files. Then it would be up to the individual chemistry toolkits to be able to parse V3K.
This works at least with the RDKit nodes; I’ve attached a workflow which demonstrates that:
V3K_Mol.knwf (7.2 KB)
Note that there’s not really any obvious way to see that the RDKit has parsed the V3K features (since the current RDKit renderer in KNIME doesn’t show the extended stereo information here), but hopefully you’ll take my word for it that the information is being read in.
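Independently of any toolkit, one thing you can check yourself is whether a cell really holds V3000 text: the version tag sits at the end of the counts line, which is the fourth line of a mol block. A minimal sketch follows; the function name `molfile_version` and the sample block are hypothetical, and this says nothing about whether a given node went on to parse the stereo groups.

```python
def molfile_version(molblock):
    """Return 'V2000', 'V3000', or None if no version tag is found.

    A mol block starts with three header lines, so the counts line
    carrying the version tag is the fourth line.
    """
    lines = molblock.splitlines()
    if len(lines) < 4:
        return None
    tokens = lines[3].split()
    if tokens and tokens[-1] in ("V2000", "V3000"):
        return tokens[-1]
    return None


# Hypothetical, truncated V3000 block used only to exercise the check.
V3K_SAMPLE = (
    "\n"
    "  hypothetical\n"
    "\n"
    "  0  0  0     0  0            999 V3000\n"
    "M  V30 BEGIN CTAB\n"
)

print(molfile_version(V3K_SAMPLE))
```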
Output is a different story. I’m not aware of a way to force output in V3K format. This is an option that we should add to the RDKit nodes, but it’s not currently there.
If you can share a bit more about exactly what you want to do I can either make suggestions or use the information as feedback for the next time we’re working on the RDKit KNIME nodes.