Windows version crashes while Linux version works normally with javasnippet

Hello,
I used a javasnippet node with an RDKit script.
One of the users here, @josemanuel, managed to make it work on the Linux version (4.02), but on my Windows 10 version it does not work. As soon as the snippet executes, KNIME crashes and closes before I can read any error.
The snippet worked normally for me with other PDB files but crashes with the one attached here.
My analysis of the problem: since it worked with other PDB files, the script itself and RDKit should be all right.
This PDB file did not work for me but worked for a Linux user, so I guess the problem is in the Windows version.
I attach the workflow and the pdb causing the problem.
Thanks,
rdkit_from_pdb_2.zip (68.8 KB)

I am able to reproduce this on my Windows machine. I will see if I can figure out what’s going on.

Hi @greglandrum,
I also rewrote the code in Python, and this time I get a different error.
Below is the script that I used in the Python script node, with the error just underneath.

from rdkit import Chem
from rdkit import *
import pandas as pd
c1_PDBFiles = input_table['PDB Files']
c_PDBFiles = c1_PDBFiles.to_string()
#fileName: name of the file to read
#sanitize: (optional) toggles sanitization of the molecule. Defaults to true.
#removeHs: (optional) toggles removing hydrogens from the molecule. This only make sense when sanitization is done. Defaults to true.
#flavor: (optional)
#proximityBonding: (optional) toggles automatic proximity bonding
out_RDKit = Chem.rdmolfiles.MolFromPDBFile(c_PDBFiles, True, True, 0, True);

Bad input file PF00012_1BUP_6_381_A_cropped_-preprocessed.pdb REMARK 4 COMPLIES WITH FORMAT V. 3.0, 1…
Traceback (most recent call last):
File "", line 17, in
OSError: Bad input file PF00012_1BUP_6_381_A_cropped
-_preprocessed.pdb REMARK 4 COMPLIES WITH FORMAT V. 3.0, 1…

PF00012_1ATR_6_384_A_cropped.zip (59.7 KB)

I’ve managed to at least narrow the problem down a bit more. The problem is actually happening when the RDKit tries to generate SMILES for the molecule (which happens with every RDKit molecule cell) and it seems to be related to the size of the protein.
I am going to continue to investigate, but this may not be something that’s immediately fixable.

@greglandrum,
The Python script itself now executes successfully, but KNIME still crashes when I execute it in the workflow (the problem may also be related to saving the output in SMILES format, as you mentioned).
The corrected script is below:

output_table = input_table.copy()
from rdkit import Chem
from rdkit import *
import pandas as pd

#fileName: name of the file to read
#sanitize: (optional) toggles sanitization of the molecule. Defaults to true.
#removeHs: (optional) toggles removing hydrogens from the molecule. This only make sense when sanitization is done. Defaults to true.
#flavor: (optional)
#proximityBonding: (optional) toggles automatic proximity bonding

filename = list(dict(input_table['Location']).values())[0].replace("file:/", "")

print(filename)

out_RDKit = Chem.rdmolfiles.MolFromPDBFile(filename, True, True, 0, True);
output_table['out_RDKit'] = out_RDKit
print(output_table['out_RDKit'])
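
A note on the path handling in this script: stripping the `file:/` prefix by hand gives a usable path for Windows URLs such as `file:/C:/...`, but on Linux it would drop the leading slash, and escapes like `%20` stay in the path. A minimal sketch of a more portable conversion, using only the Python standard library, could look like this. The helper name `location_to_path` and the example URL are made up for illustration, and it assumes the `Location` column holds `file:` URLs.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def location_to_path(location):
    """Convert a file: URL to a local path (hypothetical helper)."""
    parsed = urlparse(location)
    if parsed.scheme != "file":
        # Not a file: URL, so assume it is already a plain path.
        return location
    # url2pathname decodes %-escapes and applies the platform's path rules.
    return url2pathname(parsed.path)


# Example URL invented for illustration.
print(location_to_path("file:/tmp/my%20protein.pdb"))
```

The returned string can then be passed to `MolFromPDBFile` in place of the manually trimmed one.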

Yes, it looks like you will have the same problem no matter how you try to construct a molecule cell from this molecule.
There’s some kind of tricky problem in the core of the code (it’s not connected to KNIME in any way) that only happens on Windows.
I’m not sure that I’m going to be able to fix this, but I will put a bit more time into it.

@greglandrum,
I need the RDKit Mol format to calculate the molecular descriptors.
Is there another format that I can use instead of the RDKit Mol?
Otherwise, could I calculate the descriptors directly inside the Python script, without saving the format that causes the problem?
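
To make the second option concrete: the idea would be to compute the descriptors inside the Python script and write only plain numbers to the output table, so that no RDKit molecule cell has to be created on the KNIME side. This is only a sketch, and whether it avoids the crash for this particular protein is untested. `descriptor_row` and `rdkit_descriptors` are hypothetical helper names, and the three descriptors are an arbitrary selection from `rdkit.Chem.Descriptors`.

```python
def descriptor_row(mol, descriptors):
    """Apply each named descriptor function to mol; None marks a failure."""
    row = {}
    for name, fn in descriptors.items():
        try:
            row[name] = fn(mol)
        except Exception:
            # A failed descriptor should not abort the whole row.
            row[name] = None
    return row


def rdkit_descriptors():
    """Return a small, arbitrary selection of RDKit descriptor functions."""
    # Imported lazily so descriptor_row stays usable without RDKit installed.
    from rdkit.Chem import Descriptors
    return {
        "MolWt": Descriptors.MolWt,
        "HeavyAtomCount": Descriptors.HeavyAtomCount,
        "NumHDonors": Descriptors.NumHDonors,
    }


# Inside the KNIME Python script node this might be used as:
#   mol = Chem.MolFromPDBFile(filename)
#   for name, value in descriptor_row(mol, rdkit_descriptors()).items():
#       output_table[name] = value
```

Only numeric columns reach the output table this way; the molecule object itself stays inside the script.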

@greglandrum,

When you say it is a tricky problem, do you mean it is a bug, or a limit on the number of residues in a peptide chain? I noticed that KNIME crashed more often with quite large molecules and worked with the others.
If I understood the nature of this RDKit problem, I could filter my proteins to discard the entries that cause it.
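
For illustration, such a filter could count the residues in each PDB file before it is handed to RDKit, using only the fixed-column layout of `ATOM` records. This is a rough sketch: the function names are hypothetical, and the cutoff `max_residues` is something I would have to find by trial, since no actual size limit has been established in this thread.

```python
def count_residues(lines):
    """Count distinct residues among the ATOM records of a PDB file."""
    residues = set()
    for line in lines:
        if line.startswith("ATOM") and len(line) >= 26:
            chain_id = line[21]             # column 22
            res_seq = line[22:26]           # columns 23-26
            insertion_code = line[26:27]    # column 27, may be missing
            residues.add((chain_id, res_seq, insertion_code))
    return len(residues)


def is_small_enough(path, max_residues):
    """True if the file has at most max_residues residues.

    max_residues is an assumed, user-chosen cutoff, not a known RDKit limit.
    """
    with open(path) as handle:
        return count_residues(handle) <= max_residues
```

Files for which `is_small_enough` returns `False` could then be routed around the RDKit step instead of crashing the workflow.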
