Support for XDF Chemistry Format

Hi,

There appears to be increasing demand for support for the XDF file format, which contains chemical structures (represented as Chime strings) along with various data fields/columns.

The XDF format is essentially similar to the SDF file format but more advanced: it can contain multiple structures per entry and has very few restrictions.

It seems XDF support exists in other pipelining tools, so it would be advantageous to have it implemented in KNIME, as this newer format will no doubt begin to take hold.

Thanks,

Simon

Hi Simon,

I haven't heard of XDF before (which doesn't mean much because I have only a little knowledge about chemistry type standards ... although enough to know that SDF and SDF are often two different things :-) ). Can you give some pointers?

I have also briefly talked to Greg about whether XDF is supported in RDKit: no. Can the other (commercial and community) vendors comment on whether they support it (Indigo, CDK, CCG, ...)?

Thanks,
  Bernd

PS: Adding the type centrally to the KNIME chemistry type repository isn't a big deal as we only represent it as a typed string. It only makes sense to do so if there is more than one extension using it...

Hi Bernd,

The XDF format originally came from MDL, and more recently Accelrys. They provide documentation for this fully published format to encourage free exchange of chemical structures and related information. It can be downloaded from their website at:

http://accelrys.com/products/informatics/cheminformatics/ctfile-formats/no-fee.php

The XDF format is based on XML, and the XML schema for the XDF format is there. (I have no idea what I have just written, it means nothing to me, I am just copying what they say is in the document!) I hope it makes sense to you guys in terms of knowing how to implement it. It would be greatly appreciated if it were possible to add it to the central KNIME chemistry nodes.

What is wanted for this XDF format is the ability to import XDF files and to export XDF files too (i.e. two new nodes), like we can do now with the SDF reader and writer.

 

Thanks,

Simon.

Hi Simon, hi Bernd,

MOE doesn't read or write XDF files. I think XDF is unlikely to become a standard so it didn't make it on my to-do list so far.

But if a new molecule format arises that can be used as a standard for piping molecules so that there is no need for the user to be familiar with format limitations, I'm eager to support it.

Best regards

Guido 

How about CML? It's also an XML-based format and would be nice to have supported.

Thanks,

Natasja

Will the XML reader, writer and parser (e.g. XPath) nodes not at least help with these formats if they are XML-based? (I realise that this is not the simplest solution, but it might be a pragmatic approach until dedicated readers are available.)

Hi Natasja,

CML is included in the KNIME Base Chemistry Types extension. But it seems that not many chemistry packages use it. In my opinion every format has pros and cons. There is definitely a need for a common molecule format but I don't see it happening in the near future.

Cheers

Guido

I like the idea Steve brought up. If someone can share an example XDF file with me I can try to parse it with the standard XML processing nodes. A new XDF type representation does not seem to make much sense as none of the vendors supports XDF anyway. I guess someone has to write a (meta-)node to extract the relevant information from it (connection table, name, properties, ...whatever is in it)?
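To make the generic XML approach concrete, here is a minimal sketch in Python (not KNIME nodes) of pulling per-record fields out of an XML file with XPath-style queries. It is only an illustration under stated assumptions: the element names (dataset, record, field), the name attribute and the sample content are all hypothetical and are not taken from the XDF schema, and the structure values are placeholders, not real Chime strings. The sketch only extracts the encoded strings as text; it does not decode them into a connection table.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names, attribute names and values below
# are made up for illustration and do NOT come from the XDF schema.
SAMPLE = """
<dataset>
  <record id="1">
    <field name="name">example-1</field>
    <field name="structure"><![CDATA[PLACEHOLDER_STRUCTURE_1]]></field>
  </record>
  <record id="2">
    <field name="name">example-2</field>
    <field name="structure"><![CDATA[PLACEHOLDER_STRUCTURE_2]]></field>
  </record>
</dataset>
"""


def extract_records(xml_text):
    """Return one dict per <record>, mapping field names to their text."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    # ".//record" is ElementTree's limited XPath: any <record> descendant.
    for record in root.findall(".//record"):
        row = {"id": record.get("id")}
        for field in record.findall("field"):
            # CDATA sections are exposed as ordinary element text.
            row[field.get("name")] = (field.text or "").strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_records(SAMPLE):
        print(row)
```

With a real example file, the two query strings would need to be adapted to whatever element names the file actually uses.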

As for the other, more general type discussion that Natasja and Guido brought up (again): this is discussed here: http://tech.knime.org/forum/knime-developers/introducing-a-knime-molecule-type (thanks to Greg we have a plan, we just need to set the API in stone and implement it). It's on the list for this year.

Hi,

Sir, I have downloaded some .XDF files from the MDL database. The structure information in the .XDF files is in a string format like this: { CDATA[CYAAFQwA5cwQtdaVWaKTLUBL'''''''}

How can I convert this format to an easily readable format, like .mol, .sdf and so on? Or is there some software that can do that?

 

Thanks,

yulan

Hi, Simon 

I have some questions about XDF format conversion.

I have downloaded some .XDF files from the MDL database. The structure information in the .XDF files is in a string format like this: { CDATA[CYAAFQwA5cwQtdaVWaKTLUBL'''''''}

How can I convert this format to an easily readable format, like .mol, .sdf and so on? Or is there some software that can do that?

 

Thanks a lot

yulan

Sorry, despite my best efforts, I don't believe the XDF format was ever implemented in KNIME.

simon.

Hi Simon

Thanks for your reply.

yulan