Buggy behavior of “Molecule to CDK” node

gcincilla · October 13, 2015, 1:02pm

Hi guys,

I'm seeing very odd behavior with the Molecule to CDK node. As you can see from the attached example workflow, the same SMILES string is sometimes parsed correctly into CDK Cell format and sometimes not (“No 2D Coordinates”).

In the attached example workflow it seems to depend on the name of the molecule structure column! However, I have also seen cases flip from parsed correctly to failing, and from failing to parsed correctly, just by resetting the upstream nodes. Something strange seems to be going on.

Can anybody help with this?

I'm using CDK (nightly build) version: org.openscience.cdk.knime_1.5.3.201509071430.

Thanks in advance for any help,

Gio

bug_mol2cdk.zip

Stephan · October 18, 2015, 10:45pm

Hi Gio,

the behavior is unexpected but has nothing to do with the column name or table configuration. If you re-execute any of the three example branches in the workflow often enough, you will get the exception you described.
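One way to check that a failure like this is intermittent, rather than tied to a column name, is to run the same conversion many times and tally the outcomes. The following is a minimal sketch in plain Python, not KNIME or CDK code: `parse` is a placeholder for whatever conversion call is being tested, and `flaky_parse` is a made-up stand-in that fails on every third call so the example runs on its own.

```python
from collections import Counter
from itertools import count


def tally_outcomes(parse, smiles, runs=100):
    """Call parse(smiles) `runs` times; count successes and exception types."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            parse(smiles)
            outcomes["ok"] += 1
        except Exception as exc:
            outcomes[type(exc).__name__] += 1
    return outcomes


# Hypothetical stand-in for a real converter: fails on every third call.
_calls = count(1)


def flaky_parse(smiles):
    if next(_calls) % 3 == 0:
        raise ValueError("No 2D Coordinates")
    return smiles


result = tally_outcomes(flaky_parse, r"C(/N)=C\C=C\1/N=C1", runs=9)
print(dict(result))
```

A tally that shows both successes and failures for identical input points at non-deterministic behavior in the converter rather than at the table configuration.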

The issue has to do with the double bond stereochemistry. Here is your SMILES: "C(/N)=C\C=C\1/N=C1"

http://www.opensmiles.org/opensmiles.html (see cis/trans)

If you rewrite the SMILES so that the stereochemistry information is no longer overloaded, it always passes the structure conversion.

Having said that, the library should either always fail or always manage to deal with it. When a new version of the relevant dependency that handles the conversion is released, I will update the library.

I hope that helps.

Stephan

gcincilla · October 19, 2015, 7:30pm

Hi Stephan,

Thank you for your always quick and helpful answers; the problem is clearer now. It would be great if you could update the library to either always fail or always handle those cases. That would surely make it more consistent.

Thank you also for pointing me to the OpenSMILES reference; I wasn't aware of this kind of issue with the cis/trans configuration. If I understood correctly, the problem is only due to the overloaded stereochemistry information. Does this mean that the structure should be represented with something like "C(/N)=C\C=C(N=C1)\1"?

What still seems strange to me is that the SMILES under discussion was generated by CDK itself. I would expect CDK to be able to correctly parse a structure it generated itself, but perhaps this is not always as easy as it seems. Can I ask your opinion on this?

Thanks,

Gio

Stephan · October 19, 2015, 11:23pm

Hi Gio,

I'll have to wait for one of the underlying libraries to be updated first, but when that happens the problem will go away.

If you rewrite the SMILES like that, the structure will be read correctly. There are of course a number of different ways to encode an identical structure as SMILES.

Don't get me wrong, the initial SMILES you provided is perfectly valid! That's why CDK generated it in the first place :-). As one of CDK's maintainers put it to me, it's a deficiency in the algorithm: it cannot deal with cis/trans 'overload' because of the way it assigns the double bonds in its internal representation.
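To make the 'overload' concrete, here is a rough sketch in plain Python that compares the two SMILES from this thread. It rests on an assumption: it reads "overloaded" as a double-bond atom that carries directional marks (`/` or `\`) on two of its single bonds, which is more than is needed to fix the configuration. The function `overloaded_stereo_atoms` is a made-up toy scanner, not CDK's parser, and it only handles unbracketed atoms, branches, and single-digit ring closures.

```python
def overloaded_stereo_atoms(smiles):
    """Return indices of double-bond atoms touched by 2+ directional bonds."""
    dir_count = []     # directional bonds touching each atom
    in_double = []     # True if the atom is an end of a double bond
    branch_stack = []  # atoms to return to when a branch closes
    open_rings = {}    # ring digit -> (atom index, bond symbol at opening)
    prev = None        # index of the atom the next bond starts from
    pending = None     # bond symbol waiting for its second atom

    def record(a, b, symbol):
        if symbol in ("/", "\\"):
            dir_count[a] += 1
            dir_count[b] += 1
        elif symbol == "=":
            in_double[a] = in_double[b] = True

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch in "-=#/\\":
            pending = ch
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                other, opening_symbol = open_rings.pop(ch)
                record(other, prev, pending or opening_symbol)
            else:
                open_rings[ch] = (prev, pending)
            pending = None
        elif ch.isalpha():
            if smiles[i:i + 2] in ("Cl", "Br"):
                i += 1  # two-letter element symbol
            dir_count.append(0)
            in_double.append(False)
            idx = len(dir_count) - 1
            if prev is not None:
                record(prev, idx, pending)
            pending = None
            prev = idx
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1

    return [k for k in range(len(dir_count))
            if in_double[k] and dir_count[k] >= 2]


original = r"C(/N)=C\C=C\1/N=C1"    # SMILES from the first post
rewritten = r"C(/N)=C\C=C(N=C1)\1"  # rewrite suggested in this thread

print(overloaded_stereo_atoms(original))   # [4]
print(overloaded_stereo_atoms(rewritten))  # []
```

In the original string the fifth atom (index 4) has a directional ring-closure bond and a directional bond to the following N; in the rewrite the N sits in an unmarked branch, so only the ring closure carries a direction.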

I hope that clarifies things.

Stephan

gcincilla · October 20, 2015, 9:40am

Stephan,

That clarifies things perfectly.

Thanks again for your kind answer and for taking care of this.

Best,

Gio

Stephan · November 28, 2015, 7:53pm

Hi Gio,

the latest update to the nightly should have also fixed the cis/trans 'overload' issue.

Big thanks to John May for this.

All best,

Stephan

gcincilla · December 2, 2015, 8:51am

Thank you very much Stephan, to you and to John May! You're great.

I will give it a try as soon as I update the nightly build.

Cheers,

Gio

gcincilla · December 4, 2015, 4:17pm

Hi Stephan,

I finally tested this, and the bug now seems to be solved in KNIME-CDK version 1.5.400.201511282037 running on KNIME 3.0.1.

Congrats, and thanks again!

Gio
