pdb file in the output of a python script

zizoo · July 11, 2019, 6:29pm

Hello,
I am editing a PDB file, like the one zipped and attached here, using the Python script below.
# Copy input to output

X = input_table['PDB_ID']
import Bio.PDB as bpdb
import pandas as pd
from pandas import DataFrame
import os

data_folder = "C:/Users/gu19888/Downloads/"
structure_id = data_folder + X[0]
filename = structure_id + ".pdb"
s = bpdb.PDBParser().get_structure(structure_id, filename)

start_res = 20
end_res = 30
chain_id = 'A'

class ResSelect(bpdb.Select):
    def accept_residue(self, res):
        if res.id[1] >= start_res and res.id[1] <= end_res and res.parent.id == chain_id:
            return False
        else:
            return True

io = bpdb.PDBIO()
io.set_structure(s)
io.save('1y26_cropped.pdb', ResSelect())
output_table = input_table['PDB_ID'].copy()

output_table = pd.DataFrame(output_table)

output_table.insert(1, '1y26_cropped.pdb2', io)

The script loads successfully, but the output gives the following error when I run the Python node.
Could you please suggest a solution?
Thanks,
ZIed5OF4.zip (61.9 KB)

quaeler · July 11, 2019, 7:18pm

(There is no ‘following error’ in your posting.)

zizoo · July 11, 2019, 7:39pm

Oops, sorry,
The error is:
ERROR Python Script (1⇒1) 2:49 Execute failed: No serializer extension having the id or processing python type "Bio.PDB.PDBIO.PDBIO" could be found.
Unsupported column type in column: "1y26_cropped.pdb2", column type: "<class 'Bio.PDB.PDBIO.PDBIO'>".

quaeler · July 11, 2019, 7:51pm

It sounds like the Python executable being run from the node doesn’t have the same library path that is available to the Python executable with which you can successfully run that script outside of KNIME.

zizoo · July 12, 2019, 10:47am

I checked that the Biopython library is correctly installed in my py36_knime environment, and I do not see any error in the Python tab under Preferences.
When I execute the script inside the node, it succeeds, but the moment I try to pull data out of the node, it gives the error.
Is it possible that the pandas DataFrame doesn’t accept the PDB type and that the column type should be kept as string?
If that is the case, how can I convert the PDB type to a string in the Python node and then convert it back to the PDB type once it is outside the node?
Thanks,

quaeler · July 12, 2019, 5:44pm

That there is no data type which can take the contents of that pandas DataFrame and bring them into KNIME sounds like a totally plausible theory. It actually looks like you’re putting the Biopython PDBIO object into the data frame, as opposed to the PDB structure, so two thoughts:

use the PDBParser to get the cropped structure out of the file you wrote (1y26_cropped.pdb) and try putting that structure into the data frame to see if that makes it back into KNIME
if that fails, read the 1y26_cropped.pdb file as String content and put that string value into the data frame. If in subsequent nodes using Bio Python, you want to convert the string back to a PDB - it seems like PDBParser doesn’t handle streams, so you’d need to write the string content to a temp file and then point the PDBParser at the temp file.
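
The second suggestion could be sketched roughly as follows. The helper names (`read_pdb_text`, `pdb_text_to_temp_file`, `parse_pdb_text`) are made up for illustration and are not part of Biopython or KNIME; `parse_pdb_text` additionally assumes Biopython is installed. As far as I know, `PDBParser.get_structure` also accepts an open file handle, so wrapping the string in `io.StringIO` may make the temp file unnecessary, but that is worth checking against the installed Biopython version.

```python
import os
import tempfile


def read_pdb_text(path):
    """Return the whole PDB file as one string, suitable for a string column."""
    with open(path, "r") as handle:
        return handle.read()


def pdb_text_to_temp_file(pdb_text):
    """Write PDB text to a new temp file and return its path (caller deletes it)."""
    fd, path = tempfile.mkstemp(suffix=".pdb")
    with os.fdopen(fd, "w") as handle:
        handle.write(pdb_text)
    return path


def parse_pdb_text(pdb_text, structure_id="cropped"):
    """Rebuild a Biopython structure from PDB text via a temp file."""
    from Bio.PDB import PDBParser  # imported lazily; requires Biopython

    path = pdb_text_to_temp_file(pdb_text)
    try:
        return PDBParser(QUIET=True).get_structure(structure_id, path)
    finally:
        os.remove(path)  # clean up the temp file once parsing is done
```

In the first Python node, the string returned by `read_pdb_text('1y26_cropped.pdb')` would go into the data frame column; a later Python node could pass that cell value to `parse_pdb_text`.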

zizoo · July 12, 2019, 5:54pm

Today, I tried to add these two lines:
str_io = str(io)
output_table.insert(1, '1y26_cropped.pdb2', str_io)

And the node can be executed successfully.
But now I am wondering how I can convert it back to the PDB format so that I can use it in the following nodes, without having to save the structures to temp files and read them back with the PDBParser.
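
One caveat about `str(io)`: as far as I can tell, `PDBIO` does not define its own string conversion, so `str(io)` most likely yields only a default representation such as `<Bio.PDB.PDBIO.PDBIO object at 0x...>` and not the PDB records themselves. A minimal sketch of writing the structure into an in-memory string instead is shown below. It assumes that `PDBIO.save` accepts an open file handle as well as a filename; `structure_to_pdb_text` and `Opaque` are hypothetical names used only for illustration.

```python
from io import StringIO


def structure_to_pdb_text(structure, select=None):
    """Serialize a Biopython structure to PDB-formatted text in memory."""
    from Bio.PDB import PDBIO  # imported lazily; requires Biopython

    writer = PDBIO()
    writer.set_structure(structure)
    buffer = StringIO()
    if select is None:
        writer.save(buffer)
    else:
        writer.save(buffer, select)  # e.g. an instance of ResSelect
    return buffer.getvalue()


class Opaque:
    """Stand-in for any class without its own __str__ (illustration only)."""


# str() on such an object gives a default representation, not file content.
opaque_text = str(Opaque())
```

With the script from the first post, the column could then be filled with `structure_to_pdb_text(s, ResSelect())` rather than `str(io)`, so that the cell holds the actual cropped PDB text.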

quaeler · July 12, 2019, 5:59pm

Do you mean with KNIME nodes which can handle structure types - (e.g Infocom’s ChemAxon wrappers)?

zizoo · July 12, 2019, 6:12pm

I just tried to use the ChemAxon nodes as you suggested, but it seems they require a specific format and I cannot use the string type that I create for my PDB from the Python node.
Are you aware of any other free wrapper/converter to get the PDB format again from the string type?

quaeler · July 12, 2019, 6:23pm

Let’s rotate that question slightly - once you have a PDB in a cell being output from a KNIME node, what KNIME nodes will you be using to perform operations on the structures?

(To read in a PDB file, you could use:

so perhaps your python node could just write out the cropped PDB to some location, output from the python node that location, and use this node to read the PDB file - if you’re going to use Vernalis nodes elsewhere to operate on the structure.)

zizoo · July 12, 2019, 6:34pm

Actually, I need the PDB format in a column because it is required in the following node “Protein alignment” from MOE.
I think that I don’t have another option apart from the PDB Loader that you suggested.
At least this will make me less worried about loss of data due to the conversion to string when I use Python node.
Thanks Quaeler.

quaeler · July 12, 2019, 6:39pm

MOE has its own Read PDB node:

Probably better to use that since you’ll be using other MOE nodes. Since it takes no input, you’ll need to trigger it to run after the python node by using the ‘drag a red flow variable port connection between the two nodes’ trick; and you’ll need to agree on a common file name, i.e. the python node writes to “/tmp/foo.pdb” and the MOE node is configured to read from “/tmp/foo.pdb”.
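
On the Python side, the "agree on a common file name" part could look something like the sketch below. `AGREED_PATH` and `write_to_agreed_path` are hypothetical names, and the location in the system temp directory is only an assumption; the MOE node would have to be configured with exactly the same path.

```python
import os
import tempfile

# Hypothetical agreed location; the reading node must be configured with the same path.
AGREED_PATH = os.path.join(tempfile.gettempdir(), "foo.pdb")


def write_to_agreed_path(pdb_text, path=AGREED_PATH):
    """Write PDB text to the agreed path and return the absolute path."""
    with open(path, "w") as handle:
        handle.write(pdb_text)
    return os.path.abspath(path)
```

With Biopython, the cropped structure can also be written straight to that location with `io.save(AGREED_PATH, ResSelect())`, which skips the intermediate string altogether.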

zizoo · July 12, 2019, 6:47pm

This is great. It required a bit of tweaking, but the problem is solved.
This shows some limitations of the Python node’s pandas DataFrame handling when dealing with uncommon formats like PDB.
I hope the developers will take that into consideration in the next version.
Thanks quaeler for your help.

quaeler · July 12, 2019, 6:50pm

de rien!
