KNIME & chemistry capabilities

Hi there,

I have been testing the chemistry capabilities of KNIME, using the CDK components, and have come across a number of issues:

1) The CDK component failed to convert 24% of the parsed SMILES strings to CDK structures (2389 out of 9902 SMILES strings failed).

2) The molecular weight reported by the Molecular Properties node appears to be the exact (monoisotopic) molecular mass rather than the average molecular weight.

3) The XLogP values are too high to be realistic: e.g. compound X = 24.731, whereas Accelrys Accord for Excel gives an MlogP of 4.8 for the same compound.

4) The Lipinski node cannot be configured, so the Lipinski definition cannot be adjusted.

5) Speed. Converting the SMILES to CDK (CDK extensions node) was very slow, taking around 15 minutes (KNIME running on a 2 GHz dual-core II processor with 1 GB RAM).
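For point 2, the distinction is easy to check by hand: molecular weight uses standard (average) atomic weights, while exact mass uses the mass of the most abundant isotope of each element. The minimal Python sketch below computes both for ethanol (C2H6O). The small element table and the dict-based formula are assumptions for illustration only; this is not how the CDK or the KNIME node represent molecules.

```python
# Standard (average) atomic weights, reflecting natural isotope abundance.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
# Monoisotopic masses: the most abundant isotope of each element (12C, 1H, 14N, 16O).
MONOISOTOPIC_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def formula_mass(formula, masses):
    """Sum element masses for a formula given as {element: count}."""
    return sum(masses[element] * count for element, count in formula.items())


ethanol = {"C": 2, "H": 6, "O": 1}  # C2H6O
print(f"molecular weight: {formula_mass(ethanol, AVERAGE_MASS):.3f}")      # 46.069
print(f"exact mass:       {formula_mass(ethanol, MONOISOTOPIC_MASS):.3f}")  # 46.042
```

For ethanol the gap is only about 0.03, but it grows with molecule size and is largest for elements with heavy minor isotopes such as Cl or Br, so a molecular-weight filter can behave differently depending on which value the node reports.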

Best regards,

Stanage.

We are aware of quite a lot of problems with the CDK integration - a bug fix release coming out in the next few days will integrate CDK 1.0 and fix some of these. However, things won't be much faster than before...
Since we have no control over CDK, we can only try to integrate the tools that seem reasonably stable but can't do much about reliability, correctness, or performance :-(

We are not aware of a good, solid open source alternative, unfortunately.

The following contains commercial bias - read on at your own risk: ;-)
The tools that Tripos and Schrödinger will offer soon should plug a number of these holes.

Hi Stanage,
I believe there are efforts to integrate ChemAxon tools into KNIME.
I know they're reviewing this; from what I've seen of KNIME and ChemAxon, their tools are ideally suited to the KNIME architecture and technology.
We already integrated them into another Java framework using Eclipse and it was very straightforward. This included SDF reading with fields, structure display, and processing using the JChem libraries.

I've found the ChemAxon components to be robust and feature-rich.

The fact that it's a Java API should make it easier to integrate.
I think it's just a question of who integrates it and when it's available, which of course comes down to time and, I suppose, money.

It would be really nice to see a few cheminformatics vendors sponsoring KNIME components for their customers.
I'm sure MDL will have some Java APIs from Isentris which could be integrated if you have access to them.
I've not had any experience of the Tripos tools; I asked some questions early on in the forum but it got lonely... I'm sure they're robust and should meet your needs.

It would be nice to see the CDK integration working, as it's under an open source license.

Cheers Andrew

Hi there,

Stanage again. I just want to add my opinion to the ChemAxon functionality request.

Even though ChemAxon isn't open source, it is free to all schools and academics. It has a number of features that would be great within a KNIME node (chemical property prediction being the first that comes to mind). The ChemAxon solution is already Java-based, its visualisation of structures is excellent, and it can handle all the common chemical formats. I agree with Andrew that although there are other chemistry options (CDK, Tripos and Schrödinger), ChemAxon would be a good addition to the KNIME chemistry node pantheon.

Is anyone out there planning to create a KNIME node? Sadly my Java skills are not up to the job (and I haven't got enough time at the moment either).

Best regards,

Stanage.

Hi All,

I actually compared some of the property calculators of ChemAxon and the CDK and found that the CDK ones are often better. For instance, the XLogP of ChemAxon is far worse than the one from the CDK when compared to experimental values; I even found a thread on the ChemAxon forum where their customers complain about it. I must add that it is very important to use the right settings for the CDK calculator (add explicit hydrogens for XLogP). I did not test the KNIME implementation of it.
Another example:
H-bond donors: ChemAxon also counts halogen atoms as H-bond donors. Of course there is published work suggesting this is correct, but if you want to apply the standard Lipinski rules it is not useful and causes a lot of compounds to be excluded.
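Both complaints (point 4 above and the donor definition here) come down to the rule definition being fixed. A minimal Python sketch of a configurable rule-of-five check, using the commonly quoted thresholds (MW ≤ 500, logP ≤ 5, ≤ 5 donors, ≤ 10 acceptors, at most one violation tolerated). The atom list of (element, attached hydrogens) pairs and all function and class names are hypothetical; this is not how the KNIME Lipinski node or ChemAxon compute anything, and donor-counting conventions vary (some count each N-H/O-H bond instead of each atom).

```python
from dataclasses import dataclass
from typing import Optional

HALOGENS = {"F", "Cl", "Br", "I"}


@dataclass
class LipinskiRules:
    """Rule-of-five thresholds; every field can be changed."""
    max_mol_weight: float = 500.0
    max_logp: float = 5.0
    max_hbd: int = 5
    max_hba: int = 10
    max_violations: int = 1  # how many rule breaches are tolerated


def count_hbd(atoms, count_halogens=False):
    """Count H-bond donor atoms in a toy list of (element, attached_H) pairs.

    An N or O bearing at least one hydrogen counts once. Halogens are
    counted only when count_halogens is True.
    """
    donors = 0
    for element, hydrogens in atoms:
        if element in ("N", "O") and hydrogens > 0:
            donors += 1
        elif count_halogens and element in HALOGENS:
            donors += 1
    return donors


def lipinski_violations(mol_weight, logp, hbd, hba, rules: Optional[LipinskiRules] = None):
    """Number of thresholds exceeded (booleans sum as 0/1)."""
    rules = rules or LipinskiRules()
    return sum([
        mol_weight > rules.max_mol_weight,
        logp > rules.max_logp,
        hbd > rules.max_hbd,
        hba > rules.max_hba,
    ])


def passes_lipinski(mol_weight, logp, hbd, hba, rules: Optional[LipinskiRules] = None):
    rules = rules or LipinskiRules()
    return lipinski_violations(mol_weight, logp, hbd, hba, rules) <= rules.max_violations


# Toy molecule: two OH groups, one NH2, one Cl, one methyl carbon.
atoms = [("O", 1), ("O", 1), ("N", 2), ("Cl", 0), ("C", 3)]
print(count_hbd(atoms), count_hbd(atoms, count_halogens=True))  # 3 4
strict = LipinskiRules(max_violations=0)
print(passes_lipinski(520.0, 4.0, 3, 6), passes_lipinski(520.0, 4.0, 3, 6, strict))  # True False
```

Keeping the thresholds in one small object means a stricter or looser definition is a one-line change rather than a new node.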

Being a commercial party with limited resources, I'm not allowed to work with ChemAxon, Tripos or Schrödinger software, so I would prefer a good implementation of the CDK. For me the clustering part is very useful. Therefore descriptors like Burden's BCUT descriptors and the Kier-Hall chi descriptors available in the CDK would be very helpful. Will those be available through KNIME as well? Is there an example of how to create them myself?
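On creating such descriptors yourself: the simplest member of the chi family, the first-order connectivity index, is a sum over the bonds of the hydrogen-suppressed graph of (δi·δj)^(-1/2), where δ is an atom's number of heavy-atom neighbours. The Python sketch below is a toy re-implementation of that formula only. It is not the CDK's code, it does not cover the Kier-Hall valence variant (which uses valence deltas) or higher orders, and BCUT descriptors (eigenvalues of a weighted Burden matrix) are not shown. The bond-list input format is an assumption.

```python
import math
from collections import defaultdict


def first_order_chi(bonds):
    """Simple first-order connectivity index (1chi, also known as the Randic index).

    bonds: list of (i, j) pairs between heavy atoms (hydrogen-suppressed graph).
    Each atom's delta is its number of heavy-atom neighbours.
    """
    degree = defaultdict(int)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# n-butane: C1-C2-C3-C4, deltas 1, 2, 2, 1
butane = [(1, 2), (2, 3), (3, 4)]
print(round(first_order_chi(butane), 4))  # 1.9142
```

Branching lowers the value: isobutane (one carbon bonded to three methyls) gives 3/√3 ≈ 1.732, which is why these indices are useful as shape descriptors for clustering.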

Kind Regards,

Peter.

Peter,

peterem wrote:
Therefore descriptors like Burden's BCUT descriptors and the Kier-Hall chi descriptors available in the CDK would be very helpful. Will those be available through KNIME as well? Is there an example of how to create them myself?

We had a long discussion with the CDK developers about which descriptors to include. We excluded some of them because they are not stable or may not produce correct results in all cases. Others were excluded because they produce variable-length results or do not tell beforehand how long the result array is; such descriptors would need a node of their own to handle these issues. Therefore we decided to include only the "stable" descriptors and those that can be handled automatically by a single node.
However, there is a way to enable at least the descriptors we excluded for reasons of stability, though it requires some manual work:
  1. Find the file plugins/org.knime.ext.chem.cdk_1.2.1.BETA/knime-cdk.jar in your KNIME installation
  2. Open the file org/knime/ext/chem/cdk/molprops/molprops.set from the JAR, either by unpacking the archive or by using e.g. WinZip, which (I believe) lets you edit individual files directly without unpacking the whole archive.
  3. Find the descriptor you want to include (only the Java class names are given) and remove the '#' in front of the corresponding line.
  4. Save the file and make sure it is included in the JAR-file in KNIME's plugin directory.
After that, the included descriptors should show up in the Node's dialog.
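The steps above can also be scripted. A minimal Python sketch, assuming molprops.set is plain text with one fully qualified class name per line and disabled entries prefixed with '#' (as described in step 3); the function name is hypothetical. Back up the JAR first and close KNIME before running it.

```python
import os
import tempfile
import zipfile
from pathlib import Path

# Entry path inside knime-cdk.jar, as given in step 2.
MOLPROPS_ENTRY = "org/knime/ext/chem/cdk/molprops/molprops.set"


def enable_descriptor(jar_path, class_name, entry=MOLPROPS_ENTRY):
    """Uncomment the line naming class_name in `entry` inside the JAR.

    zipfile cannot modify an archive in place, so every entry is copied into
    a temporary archive (with the edited entry swapped in), which then
    replaces the original JAR. Returns True if a '#'-prefixed line for
    class_name was found and enabled.
    """
    jar_path = Path(jar_path)
    changed = False
    fd, tmp_name = tempfile.mkstemp(suffix=".jar", dir=jar_path.parent)
    os.close(fd)
    try:
        with zipfile.ZipFile(jar_path) as src, zipfile.ZipFile(tmp_name, "w") as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == entry:
                    # Assumption: one class name per line, disabled by a leading '#'.
                    lines = data.decode("latin-1").splitlines(keepends=True)
                    for k, line in enumerate(lines):
                        stripped = line.strip()
                        if stripped.startswith("#") and stripped.lstrip("#").strip() == class_name:
                            lines[k] = line.replace("#", "", 1)
                            changed = True
                    data = "".join(lines).encode("latin-1")
                # Reusing the original ZipInfo keeps name, timestamp and compression.
                dst.writestr(item, data)
        os.replace(tmp_name, jar_path)
    except BaseException:
        os.remove(tmp_name)
        raise
    return changed
```

For example, `enable_descriptor("plugins/org.knime.ext.chem.cdk_1.2.1.BETA/knime-cdk.jar", "org.openscience.cdk.qsar.descriptors.molecular.BCUTDescriptor")`, assuming that class is listed in the file; a return value of False means no commented-out line with exactly that name was found.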
You should also have a look at KNIME's log file (in the runtime workspace under .metadata/knime) or set the logging level to debug and look for lines like

MolPropsLibrary : Descriptor result ("") unkown, skipping descriptor Wiener Numbers

which means that these descriptors return results (mostly arrays) that the MolProps node cannot handle, or

MolPropsNodeModel : Descriptor "org.openscience.cdk.qsar.descriptors.molecular.TaeAminoAcidDescriptor" not available

which means, that CDK does not "export" these descriptors (for whatever reason).
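With many descriptors enabled, scanning the log by eye gets tedious. A minimal Python sketch that sorts log lines into the two cases above; the regular expressions are modelled only on the two example messages quoted here, so the exact wording in other KNIME versions may differ (assumption), and the function name is hypothetical.

```python
import re

# Matches both "unkown" (as in the example message) and "unknown".
SKIPPED = re.compile(r"Descriptor result \(.*?\) unkn?own, skipping descriptor (?P<name>.+?)\s*$")
UNAVAILABLE = re.compile(r'Descriptor "(?P<name>[^"]+)" not available')


def scan_log(lines):
    """Split log lines into descriptors skipped by MolProps vs. not exported by CDK."""
    skipped, unavailable = [], []
    for line in lines:
        match = SKIPPED.search(line)
        if match:
            skipped.append(match.group("name"))
            continue
        match = UNAVAILABLE.search(line)
        if match:
            unavailable.append(match.group("name"))
    return skipped, unavailable


example = [
    'MolPropsLibrary : Descriptor result ("") unkown, skipping descriptor Wiener Numbers',
    'MolPropsNodeModel : Descriptor "org.openscience.cdk.qsar.descriptors.molecular.TaeAminoAcidDescriptor" not available',
]
skipped, unavailable = scan_log(example)
print(skipped)      # ['Wiener Numbers']
print(unavailable)  # ['org.openscience.cdk.qsar.descriptors.molecular.TaeAminoAcidDescriptor']
```

To use it on a real log, pass an open file object (iterating a file yields lines) for whatever log file you find under .metadata/knime.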
I will talk to Bernd as soon as he is back from vacation, because he did most of the descriptor work.

Regards,

Thorsten


Hi Thorsten,

Thanks for your thorough reply. I will try your suggestion and let you know if I find anything useful. I presume I can also import an ASCII file with these values, since I have some Java applications that generate some of these CDK descriptors.
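Importing precomputed values should work with a plain delimited text file. A minimal Python sketch that writes one row per molecule (an ID column plus one column per descriptor), tab-separated with a header row. The column names and values are hypothetical, and it is an assumption that KNIME's File Reader node will pick up the header once its column-header option is enabled.

```python
import csv
import io


def write_descriptor_table(handle, rows, descriptor_names):
    """Write a tab-separated table: a header row, then one row per molecule.

    handle: a text stream; for a real file use open(path, "w", newline="").
    rows: iterable of (molecule_id, [descriptor values]).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", *descriptor_names])
    for mol_id, values in rows:
        # Compact general formatting (6 significant digits); raise if needed.
        writer.writerow([mol_id, *(f"{value:.6g}" for value in values)])


# Hypothetical descriptor columns and values, for illustration only.
buffer = io.StringIO()
write_descriptor_table(buffer, [("mol1", [1.5, 2.0]), ("mol2", [3.25, 0.125])], ["BCUT_1", "Chi1"])
print(buffer.getvalue())
```

Keeping a stable molecule ID in the first column makes it possible to join the imported values back onto the structures inside the workflow.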

Cheers,

Peter

peterem wrote:

Being a commercial party with limited resources, I'm not allowed to work with ChemAxon, Tripos or Schrödinger software, so I would prefer a good implementation of the CDK. For me the clustering part is very useful. Therefore descriptors like Burden's BCUT descriptors and the Kier-Hall chi descriptors available in the CDK would be very helpful. Will those be available through KNIME as well? Is there an example of how to create them myself?


Hello!

Sorry to jump into this conversation a month late, but I only found out about KNIME yesterday. I've been using ChemAxon for about 4 or 5 years now and their Java libraries for chemistry calculations are quite extensive. I wasn't aware of the ClogP issues, but the tools I find useful are their IUPAC naming, maximum common substructure determination, SMARTS query capabilities and fingerprinting according to various schemes, amongst other nifty tools.

So one can't have a ChemAxon plugin because KNIME is commercial? I'm not sure I understand.

How difficult is it to write a plugin? Would ChemAxon have to do it?

Thank you for your help in advance

No, KNIME is free, but the open source license would require other plugins (as well as the code they use/call) to be free as well. We have a dual licensing model that allows commercial companies to keep their IP proprietary, but talks with ChemAxon have faded away, so I guess they lost interest in this model (and are obviously not interested in making all of their stuff freely accessible via KNIME either :-)

We did invest quite some time to integrate CDK but have limited resources to try out all of their tools, see how stable they are then integrate them into KNIME in a reliable way. External help here would be very much appreciated...

- Michael