Beginner's question: cannot pipe OpenBabel's output to '


In my simple workflow I read a single PDB file with 3D data for a ligand, then convert it with the OpenBabel node to a canonical SMILES (btw: I have to use OpenBabel, since RDKit itself does NOT properly read the CONECT records from PDB and therefore generates WRONG SMILES, at least for my example).

When I try to pipe the output of the OpenBabel node into "RDKit From Molecule", I receive the message 'No column in spec compatible to "SmilesValue" "SmartsValue" or "SdfValue'.

With the help of a CSV Writer, I can see that the OpenBabel node by default produces 3 columns: the 1st and 2nd just name the file and URL, and the 3rd column, "PDB Files", contains the correct SMILES. But there is NO "SmilesValue" column, which the RDKit node needs!

How exactly should I configure the OpenBabel output so that it is proper input for "RDKit From Molecule"?


I'm guessing that you need to add a Molecule Type Cast node and convert the column output by OpenBabel into a SMILES column.


p.s. As you've noticed, the RDKit PDB reader does not attempt to guess bond orders. OpenBabel is a good choice for this.

Thank you for your quick reply.

I couldn't make it work yet. I inserted a 'Molecule Type Cast' node after the OpenBabel node and configured it with "Structure Column" = PDB Files and Structure Column : Smiles. The following 'RDKit From Molecule' node can NOW be configured, with Molecule Column = PDB Files (preset) and "New Column Name" (?) = PDB Files (preset).

Anyway, execution yields the console error/warning "Failed to process data due to SMILES Parsing Error".

My "diagnostic" CSV Writer (see screenshot attached) yields the same output, with or without the Molecule Type Cast node inserted:

"Location","URL","PDB Files"
"file:/home/frank/Software/RDKit/Scripts/natProduct/heb_conf2_optimize.pdb","file:/home/frank/file:/home/frank/Software/RDKit/Scripts/natProduct/heb_conf2_optimize.pdb","Oc1ccc(cc1)[C@H]1[C@H]2c3cc(O)cc4c3[C@@H](C3=CC(=O)[C@H]5C(=O)[C@@]13[C@H](c1c2c(O)cc(c1)O)[C@H]5c1ccc(cc1)O)[C@@H](O4)c1ccc(cc1)O ../in/mol1.pdb"

BTW: in the "PDB Files" column, there is an additional "../in/mol1.pdb" after the SMILES string. Maybe this causes the problem?

Additionally, I don't know how to make the 'OpenBabel' node output the bare SMILES, i.e. without the disturbing "../in/mol1.pdb" at the end! Additional write options ("optional parameters", e.g. -n) do NOT work as documented.

You are correct. That extra text, which is not SMILES, would cause the RDKit parser (which is relatively strict) to fail on the molecule. There may be a way to change the output of the OpenBabel node, but I'm just not that familiar with it. The easiest way I know of to remove the extra text is to use the "Cell Splitter" node to split the column into two pieces, the first with the SMILES and the second with the filename. You would put this before the Molecule Type Cast node and then use the Molecule Type Cast node to convert the SMILES column to a SMILES type.
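
To illustrate what that split does outside of KNIME, here is a minimal Python sketch. It assumes the cell always has the form "SMILES, whitespace, title", as in the CSV output posted above. The function name `split_smiles_and_title` is made up for this example and is not part of KNIME, OpenBabel, or RDKit.

```python
from typing import Tuple


def split_smiles_and_title(cell: str) -> Tuple[str, str]:
    """Split an OpenBabel-style SMILES cell into (smiles, title).

    Assumption: the SMILES itself contains no whitespace, so everything
    after the first run of whitespace is treated as the title/filename.
    """
    parts = cell.strip().split(maxsplit=1)
    if not parts:
        return "", ""
    smiles = parts[0]
    title = parts[1] if len(parts) > 1 else ""
    return smiles, title


# Cell content copied from the "PDB Files" column in the CSV output above.
cell = (
    "Oc1ccc(cc1)[C@H]1[C@H]2c3cc(O)cc4c3[C@@H](C3=CC(=O)[C@H]5C(=O)"
    "[C@@]13[C@H](c1c2c(O)cc(c1)O)[C@H]5c1ccc(cc1)O)[C@@H](O4)c1ccc(cc1)O"
    " ../in/mol1.pdb"
)

smiles, title = split_smiles_and_title(cell)
print(smiles)  # the bare SMILES, without the trailing filename
print(title)   # the part that made the SMILES parser fail
```

In the workflow itself, the Cell Splitter node with a space as delimiter plays the role of `split(maxsplit=1)` here; only the first resulting column should be passed on to the Molecule Type Cast node.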