Error when using RDKit From Molecule on PDB input

Hello,
I have a set of protein fragments (peptides in PDB format). When I use the RDKit From Molecule node, I get this error:
WARN RDKit From Molecule 0:264 Failed to process data due to SDF Parsing Error (GenericRDKitException) - Generating empty result cells. [All rows]

Is there a way to preprocess the input molecules so that the RDKit node works properly?

NB: I already tried converting the PDB input with the RDKit Canon SMILES and MOE SMILES from Molecule nodes, but both failed with this error:
WARN RDKit Canon SMILES 0:266 Auto conversion in column ‘PDB File’ failed: SDF Parsing Error (GenericRDKitException) - Using empty cell. [All rows]
Encountered empty input cell. [All rows]
My goal is to generate the RDKit descriptors for my peptides.

Thanks,

Hi,

the RDKit From Molecule node currently works only with SDF, SMILES and SMARTS, so applying it to a PDB cell will not work.
An alternative is to convert the PDB column to SDF first.

Here is an example workflow, attached:
rdkit_from_pdb.knwf (16.7 KB)

Hope it helps!

Jose Manuel

Hi @josemanuel
Your workflow is great. Thank you very much!
At first it did not work, and I later realised that this was caused by small fragments in my structure entries. MolConverter has an option to clean the structures by deleting the small fragments. And Eureka!
This may help other people who face the same problem.

Hi zizoo,

from what I understand, small fragments should not be an issue. But if I am not mistaken, the SDF V2000 format (the usual one) accepts only up to 999 atoms and 999 bonds. Maybe that is why the conversion fails, for instance if you have many water molecules in the PDB.
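
As a rough pre-check for this limit, one could count the ATOM/HETATM records in each PDB block before converting it. The following is only a minimal sketch in plain Python, not part of any workflow in this thread. The helper names (`atom_records`, `strip_waters`, `fits_v2000`) and the set of water residue names are assumptions made for illustration. The 999 limit comes from the three-character count fields in the V2000 counts line. Only atoms are counted here, so a block that passes can still exceed the bond limit, or the atom limit once hydrogens are added.

```python
V2000_MAX = 999  # V2000 stores atom and bond counts in 3-character fields

# Assumption: waters are named HOH or WAT in the residue-name columns.
WATER_RESNAMES = {"HOH", "WAT"}


def atom_records(pdb_block):
    """Return the ATOM/HETATM lines of a PDB block."""
    return [
        line
        for line in pdb_block.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


def strip_waters(pdb_block):
    """Drop ATOM/HETATM lines whose residue name (columns 18-20) is a water."""
    kept = []
    for line in pdb_block.splitlines():
        is_atom = line.startswith(("ATOM  ", "HETATM"))
        if is_atom and line[17:20].strip() in WATER_RESNAMES:
            continue
        kept.append(line)
    return "\n".join(kept)


def fits_v2000(pdb_block):
    """True if the atom count alone stays within the V2000 limit."""
    return len(atom_records(pdb_block)) <= V2000_MAX
```

Calling `fits_v2000(strip_waters(block))` on each entry would show which structures are too large for V2000 even after the waters are removed.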

I would not recommend the conversion to SDF V3000 format using molconvert though, as in my case it made KNIME crash…

Hi @josemanuel,
It worked for many structures. Now I am facing a new problem: I cannot use SDF V2000 because my PDBs contain more than 999 atoms, and V3000 causes KNIME to crash. I used the SDFile format, but the output shows empty cells (molecule:), and KNIME crashed and closed when the descriptors were calculated from the empty cells.

Which format do you recommend so that I can compute the RDKit descriptors?

Hi zizoo,

RDKit can actually parse PDB file contents from Python (and I assume from C++ as well).

So the “easiest” solution (if you can code a little bit) might be to have a local Python installation with the RDKit library on your machine. You could then convert the molecules with a Python Script node that parses the PDB directly (this is currently not possible with the RDKit From Molecule node).

If this is not possible for you, and if you don’t need the initial 3D coordinates of the peptides (but I assume you do!), you could just convert the molecules to SMILES and then use the RDKit From Molecule node.

This might still fail because of structural errors in the PDB though (missing atoms, etc.). This behavior can be changed in the configuration of the RDKit From Molecule node, but I would avoid resorting to this if possible.

It should also be possible to achieve something with the Java Snippet node, but I would have to check first whether the corresponding functions have been exposed to Java…

Cheers,
Jose Manuel

Hi zizoo,

Here is an example of a workflow using a Java Snippet node to convert Molecules directly from PDB:

rdkit_from_pdb_2.knwf (9.5 KB)

N.B. If the molecule is found invalid, then an empty cell is returned instead, which results in empty cells for computed descriptors (it should not make KNIME crash…)

Cheers,
Jose Manuel

Thanks @josemanuel.
It is now more stable with many structures. But KNIME crashed and was killed when I used this entry (a fragment of a protein).
PF00012_1ATR_6_384_A_cropped (2).zip (59.7 KB)

Hi zizoo,
that’s strange… I could not reproduce the problem. In fact, I could process your PDB file (2938 heavy atoms) in my latest example workflow without any crash or issue.

I am running KNIME 4.0.1.

For what it’s worth, maybe you could try to update your KNIME version and RDKit nodes to latest release, if it is not already the case?

Cheers,
Jose Manuel

Hi @josemanuel
I updated to the latest version, 4.0.2.
When I restarted it, it opened a white window, froze, and did not show the menus.
Is there a way to repair it without uninstalling everything and starting from scratch?
Thanks,

Hi again,
As I couldn't open the updated version of KNIME, I installed a fresh copy of 4.0.2 and let KNIME install the nodes that were missing from my workflow.
I have the version:
RDKit Binaries for Java 3.8.0.v201906261723 org.rdkit.knime.binaries.feature.feature.group NIBR
KNIME still crashes when I use your workflow:
https://forum.knime.com/uploads/default/original/2X/6/6a53f8a063f6b4dc3640dd4190e93e2a577a62f5.knwf

I am not sure whether I need to install RDKit in conda or another external environment to make it stable with Java or KNIME.

Thanks,

@josemanuel

I also tried your workflow with a small molecule in SDF format, attached here.
It did not work either.
KNIME did not close this time, but the output cell was empty, with a question mark.

Thanks,
jag_zh5-17-reagent-iPr-M0041_opt_B3LYP-D3_6-31Gss_01.zip (32.5 KB)

Hi zizoo,
sorry for the late reply.

You can check whether the problem comes from the molecule or from the Java Snippet by disabling the structural checks in the Java Snippet (I left the function signature as a comment).

Try to change:
out_RDKit = RDKFuncs.PDBBlockToMol(c_PDBFiles, true, true, 0, true);
to
out_RDKit = RDKFuncs.PDBBlockToMol(c_PDBFiles, false, false, 0, true);

If it still does not work, then there is a problem with the installation. Otherwise, your PDB contains something that RDKit does not like (as mentioned in my earlier post).
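
For readers who want to run the same check outside KNIME, here is a rough Python equivalent. It is a sketch under assumptions, not the code from the attached workflow. In the RDKit Python API, `Chem.MolFromPDBBlock` takes `sanitize`, `removeHs`, `flavor` and `proximityBonding`; the Java call above presumably takes them in the same order, but that is an assumption. The function name `parse_pdb_block` and the mode labels are made up for illustration.

```python
def parse_pdb_block(pdb_block):
    """Parse a PDB block strictly first, then leniently.

    Returns (mol, mode), where mode is one of
    'empty', 'strict', 'lenient' or 'failed'.
    """
    if not pdb_block or not pdb_block.strip():
        return None, "empty"

    # Imported lazily so the empty-input path works without RDKit installed.
    from rdkit import Chem

    # Strict: sanitize the molecule and remove hydrogens (the defaults).
    mol = Chem.MolFromPDBBlock(pdb_block, sanitize=True, removeHs=True)
    if mol is not None:
        return mol, "strict"

    # Lenient: skip sanitization and keep hydrogens, like the
    # (false, false, 0, true) variant of the Java call.
    mol = Chem.MolFromPDBBlock(pdb_block, sanitize=False, removeHs=False)
    if mol is not None:
        return mol, "lenient"

    return None, "failed"
```

A result of `'lenient'` would suggest that the file parses but fails a structural check, while `'failed'` points at the file content itself. Molecules parsed without sanitization may not be usable for every descriptor.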

Regarding your previous replies, I am very unsure what could be going wrong. I am just a fellow KNIME user and it worked right away for me when I downloaded the latest release. Maybe you could open a new thread so that the tech support could help you…

I hope it helps!

Cheers,
Jose Manuel

Hi @josemanuel
I changed the flags to false in the snippet, but KNIME crashed again.
It is interesting to know that it works for you. This rules out a problem with the PDB file, so the cause is probably my KNIME or RDKit installation, and I should follow the same steps as you.
In my case, I downloaded the zip file and installed RDKit from within KNIME itself.
Did you use the same procedure to set up KNIME?
Did you change anything in the Java tab of the KNIME Preferences?
Did you install Java or the RDKit library for Java separately? (I had similar problems with a Python library, which were solved when I set up a local Python environment for KNIME with that specific library; maybe I have to do the same with Java and RDKit.)
It would be interesting to learn more about using the snippet with RDKit, because I believe many more options are available through scripting than through the RDKit nodes.
Thanks,

Hum,
I think we misunderstood each other, sorry about that.

The snippet I wrote is for parsing PDB file content only. It will not work with SDF files or Mol Block cells.

So basically, the first node of the workflow loads the PDB files content into a table, then the Java Snippet node parses the PDB Blocks into RDKit Molecules.

If you wish to parse SDF files instead, you should use the RDKit From Molecule node, which contains an SDF parser.

Just to be clear, the RDKit From Molecule node currently contains 3 parsers: SDF, Smiles and Smarts.
RDKit itself can parse more formats, such as PDB, Mol2, etc.
So I wrote a Java Snippet to make use of the PDB Parser inside of KNIME.

I did not have the time to try with your SDF file, I was actually referring to your earlier PDB file:
PF00012_1ATR_6_384_A_cropped (2)

This file can be processed by the workflow just fine. In case there is a missing cell, this means the PDB Parser from RDKit found an error in your file, so no RDKit molecule could be created.

You can check whether this is the case by switching off the structure checks of the parser inside the Java Snippet:
out_RDKit = RDKFuncs.PDBBlockToMol(c_PDBFiles, false, false, 0, true);

Sorry again for the confusion.

Regarding the crashes, I find them strange. In my case, I only get errors in the console (running on Linux).

Cheers,
Jose Manuel

@josemanuel
Thanks Jose. Actually, I am interested in the PDB format, not SDF. The SDF was just a test to check that the snippet works in my version.
I am using the unzipped folder of KNIME 4.0.2 on Windows 10.
Did you install RDKit separately for Java and then call it from KNIME? (I see that you import an RDKit library, but I am not sure from where.)
The snippet also runs successfully with other (much bigger) PDB files, but the one I sent you is apparently a tricky one and makes KNIME collapse.
Thanks,

Hi zizoo,
actually I just installed the RDKit nodes from KNIME. I have never used KNIME on Windows, so I fear I cannot help you with this!

About how I got the Java Snippet node working, please have a look at this blog post:
https://www.knime.com/blog/using-custom-data-types-with-the-java-snippet-node

The example workflow is really useful!

Cheers,
Jose Manuel

Hello @josemanuel,
Unfortunately, I couldn’t get any help with this problem.
I see that RDKit is more commonly used from Python than from Java.
I am wondering whether a Python snippet or script could replace the crashing Java snippet and do the same job (converting the PDB to an RDKit Mol)?

Thanks,

Hi @zizoo,
Sorry for the late reply.

Yes, totally!

You can install Python with RDKit and then use it inside KNIME. If the PDB contains errors, you would get an empty cell and hopefully not a KNIME crash…
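
A minimal sketch of what such a script could look like, assuming a local Python with RDKit installed. It is written as plain functions rather than against the KNIME Python Script node API, so the wiring to the input and output tables is left out. The function names and the choice of descriptors are only illustrative. Rows that are empty or cannot be parsed give `None`, which would correspond to an empty cell.

```python
def describe_pdb_block(pdb_block):
    """Return a few RDKit descriptors for one PDB block, or None on failure."""
    if not pdb_block or not pdb_block.strip():
        return None

    # Imported lazily so empty rows can be handled without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    # Default settings: sanitize the molecule and remove hydrogens.
    mol = Chem.MolFromPDBBlock(pdb_block)
    if mol is None:
        return None

    return {
        "HeavyAtoms": mol.GetNumHeavyAtoms(),
        "MolWt": Descriptors.MolWt(mol),
        "TPSA": Descriptors.TPSA(mol),
        "NumHDonors": Descriptors.NumHDonors(mol),
        "NumHAcceptors": Descriptors.NumHAcceptors(mol),
    }


def describe_all(pdb_blocks):
    """Apply describe_pdb_block to every row, keeping None for bad rows."""
    return [describe_pdb_block(block) for block in pdb_blocks]
```

Note that returning `None` only covers parse errors that RDKit reports. A hard crash inside the native RDKit code cannot be caught from Python this way.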

Jose Manuel

Hi @josemanuel
Actually, I tried the code in Python, but it crashed at the same point. An RDKit developer replied that it is a bug in the RDKit code and is not related to KNIME.
So now I have to use another tool to extract the descriptors for my molecules.
I found that I can use PyDescriptor in PyMOL, but I am still working on that.
Thanks,