Optimise Geometry Node has 2 issues: memory leak and missing values

First off, I really like the RDKit nodes, so thank you.

Now to the issues:
The more rows are processed, the higher the memory consumption gets, until KNIME crashes with an out-of-memory error. This limits the node to a couple of thousand molecules, and in my case I can’t give KNIME more than 1 GB of memory.

The second issue is that the node creates empty (missing) values in cases where the optimization failed. This then leads to errors later in the workflow. I would prefer the node to have 2 output ports: one with the molecules that were optimized successfully and one with the molecules for which the optimization failed (keeping the original molecule).

Instead of a second port, one could keep the original structure and add a boolean control column indicating whether the optimization was successful.
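
As a rough illustration of the control-column idea, here is a minimal Python sketch. It is not how the node is implemented: the `optimize` callable, the column names `molecule` and `optimized`, and the toy optimizer that pretends to fail on tin are all made up for the example.

```python
from typing import Callable, Dict, List, Optional


def optimize_keep_original(
    rows: List[Dict[str, object]],
    optimize: Callable[[object], Optional[object]],
    mol_column: str = "molecule",   # hypothetical column name
    flag_column: str = "optimized",  # hypothetical column name
) -> List[Dict[str, object]]:
    """Return new rows; rows whose optimization failed keep the input structure."""
    out = []
    for row in rows:
        result = optimize(row[mol_column])  # None signals a failed optimization
        new_row = dict(row)                 # copy, so the input table is untouched
        new_row[flag_column] = result is not None
        if result is not None:
            new_row[mol_column] = result
        out.append(new_row)
    return out


# Stand-in optimizer for the demo only: it "fails" on structures containing tin.
def toy_optimize(mol: str) -> Optional[str]:
    return None if "Sn" in mol else mol + " (optimized)"


table = [{"molecule": "CCO"}, {"molecule": "C[Sn](C)(C)C"}]
result = optimize_keep_original(table, toy_optimize)
```

With this layout no row is lost and no cell is missing: a downstream step can filter on the flag column instead of guarding against empty values.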

Another solution would be an option to fall back to UFF if the other force field, MMFF94, fails, plus a column recording which FF was used. UFF always works in my case, so falling back to it seems reasonable.
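
The fallback could look roughly like the following Python sketch. The function `optimize_with_fallback`, the helper `rdkit_optimizers`, and the labels "MMFF94" and "UFF" are illustrative names, not part of the KNIME node. The helper shows one way to wire this to RDKit's Python API, assuming each molecule already has explicit hydrogens and a 3D conformer; the demo at the end uses toy optimizers so it runs without RDKit.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# An optimizer modifies the molecule in place and returns True on success,
# or False when the force field cannot be set up for that molecule.
Optimizer = Callable[[object], bool]


def optimize_with_fallback(
    mol: object, optimizers: Sequence[Tuple[str, Optimizer]]
) -> Optional[str]:
    """Try each force field in order; return the name of the first that works."""
    for name, optimize in optimizers:
        if optimize(mol):
            return name
    return None  # every force field failed; the caller keeps the original


def rdkit_optimizers() -> List[Tuple[str, Optimizer]]:
    """MMFF94 first, UFF second, built on RDKit (imported only when called)."""
    from rdkit.Chem import AllChem

    def mmff(mol: object) -> bool:
        # A return value of -1 means the force field could not be set up.
        # A value of 1 (not yet converged) is counted as success in this sketch.
        return (AllChem.MMFFHasAllMoleculeParams(mol)
                and AllChem.MMFFOptimizeMolecule(mol) != -1)

    def uff(mol: object) -> bool:
        return (AllChem.UFFHasAllMoleculeParams(mol)
                and AllChem.UFFOptimizeMolecule(mol) != -1)

    return [("MMFF94", mmff), ("UFF", uff)]


# Demo with toy optimizers: the fake "MMFF94" rejects anything containing tin.
toy = [("MMFF94", lambda m: "Sn" not in m), ("UFF", lambda m: True)]
used = [optimize_with_fallback(m, toy) for m in ["CCO", "C[Sn](C)(C)C"]]
```

The returned name is what would go into the "which FF was used" column; `None` marks the rows where every force field failed.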

All these suggestions could also be combined in some way. IMHO the current idiom with missing values is very annoying.

AFAIK the Aromatize node has exactly the same problem.

Hi,

Thanks for reporting the memory leak. I was able to reproduce it and we'll get it fixed (probably early next week).

I'd be curious to hear a few more opinions about the way failed optimizations due to missing parameters are handled. I'm not at all happy about the idea of passing the failed structures into the first port, but if the community prefers to have a "failed" port that includes structures where the force field could not be constructed, we can certainly make that change.

-greg

Hi Greg,

Can there be 3 output ports: 1 for the ones that worked, 1 for the ones that worked with an alternate FF, 1 for the ones that failed both attempts of generating a conformation?

Thanks,

Natasja

Surely the existing format of one outport would allow all possibilities.

I haven't encountered failed optimisations yet, but I would assume the empty values from failed optimisations can easily be handled with a Missing Value node afterwards to do whatever operation the user desires.
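
A minimal sketch of that kind of downstream handling, written in Python rather than with KNIME nodes. Here `None` stands in for a missing cell, and the function name `split_on_missing` and the column names are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def split_on_missing(rows: List[Row], column: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into (value present, value missing) for the given column."""
    present = [r for r in rows if r.get(column) is not None]
    missing = [r for r in rows if r.get(column) is None]
    return present, missing


# Hypothetical single-output table: a failed optimization leaves a missing cell.
output = [
    {"input": "CCO", "optimized": "CCO (3D)"},
    {"input": "C[Sn](C)(C)C", "optimized": None},
]
worked, failed = split_on_missing(output, "optimized")
```

This recovers both sets from one table, provided the table still carries the input structure in a separate column, as assumed in the demo.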

simon.

Hi,

There is an update available now in the nightly build (RDKit KNIME integration 2.3.0_201312090924) that should fix the out-of-memory issue for the RDKit Optimize Geometry node. Please try it out and let me know if it works for you.

Thanks again for reporting it!

Kind regards,

Manuel

I would say Natasja's suggestion is good too.

If a molecule fails in the node with the selected force field, there should be an easy way to try a different force field instead of just removing the molecule from the data set. Which force field was used can be stored in a separate column. Usually you only run 3D optimization on molecules that have already been screened, so any one of them might be a “good one”. So removing them due to an incompatibility with a certain algorithm isn’t ideal, IMHO.

The issue is that failed optimizations might work with a different force field. Instead of just removing the row, you want to try it with a different force field.

Can’t there be an option to fall back on a different FF, namely UFF, since that seems to work with all molecules?

EDIT:

Try a molecule with Sn in it.

I have been using it often and it works for my current data set size (up to about 200K rows).

EDIT: On 32-bit Windows with less than 1 GB of RAM for KNIME.