RDKit From Molecule failed to generate output for most of my entries

Hello, I have a list of SDF files for peptide molecules, and I ran them through the RDKit From Molecule node.

I got this error:
WARN RDKit From Molecule 3:196 Failed to process data due to SDF Parsing Error (GenericRDKitException) - Generating empty result cells. [307 of 310 rows]

Among the 310 entries, only 3 were successful. I am not sure whether I need to clean up my structures somehow or adjust the settings of that node.
Thanks,

@greglandrum,

In not-that-long-ago history, i believe that RDKit could not handle pseudo atoms in SDF; perhaps this is still the case?

If you’re not wedded to RDKit, you could also try using the ChemAxon nodes… or move over to using FASTA? (I’m sure Greg will have a more complete answer than this, though.)
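One way to test the pseudo-atom guess before switching tools is to scan the SDF records for atom symbols that are not real elements. The following is only a rough sketch using plain Python (no RDKit). It assumes V2000 records, where the counts line is the fourth line of each record and the atom symbol sits in columns 32-34 of each atom line. The `PSEUDO_SYMBOLS` set and the function names are hypothetical and certainly not exhaustive.

```python
# Illustrative, non-exhaustive set of symbols that are not real elements
# (an assumption for this sketch; extend it to match your own files).
PSEUDO_SYMBOLS = {"A", "Q", "*", "R", "R#", "L", "LP", "X"}


def split_sdf_records(text):
    """Split SDF text into records (lists of lines) on the '$$$$' delimiter."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)  # trailing record without a '$$$$' line
    return records


def pseudo_atoms_in_record(lines):
    """Return pseudo-atom symbols in a V2000 atom block, or None if not V2000."""
    if len(lines) < 4 or "V2000" not in lines[3]:
        return None  # this sketch cannot inspect non-V2000 records
    try:
        n_atoms = int(lines[3][0:3])  # atom count: first 3 columns of counts line
    except ValueError:
        return None
    found = []
    for atom_line in lines[4:4 + n_atoms]:
        symbol = atom_line[31:34].strip()  # columns 32-34 hold the atom symbol
        if symbol in PSEUDO_SYMBOLS:
            found.append(symbol)
    return found


def report(text):
    """Map record index -> pseudo-atom symbols, for flagged records only."""
    flagged = {}
    for index, record in enumerate(split_sdf_records(text)):
        hits = pseudo_atoms_in_record(record)
        if hits:
            flagged[index] = hits
    return flagged
```

Calling `report(open("peptides.sdf").read())` (the file name is made up) would list the records that contain such symbols. If most of the 307 failing rows show up there, pseudo atoms are a likely suspect; if none do, the parsing error has some other cause and this check does not explain it.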

Hi @quaeler,
Would you suggest another format that works better with RDKit?
Is the ChemAxon descriptors node free for academics? I cannot find it among the nodes. I only see MarvinSketch, which requires a license to calculate logP, for example. I am also not sure whether I can get all the descriptors in one go.
I see a node that gets some descriptors from the sequence, but I am not sure how to use the FASTA format to get more information about the sequence.
Thanks,

I was going to champion FASTA, as i can see in the RDKit code base there is FASTA support - but now that i look at the nodes, it seems like perhaps there is no FASTA support in the KNIME RDKit nodes. MOE provides a FASTA reader, but then your structures are in MOE-land.
(… and sorry about the ChemAxon recommendation - i had seen that they were all shipping in the KNIME release application, so assumed they were free for use without license… you’d need to haggle with INFOCOM to see what their deal is concerning academia.)
Someone should make a swiss-army-knife structure convertor node (from/to :: RDKit, MOE, INFOCOM/ChemAxon, …) - barring that, i don’t see a MOE node that does *logP calculations, so i guess that puts you back in RDKit-land.

… which is a very long-winded way of saying, hopefully @greglandrum has a much better answer than i. Sorry!

The Open Babel node can convert to FASTA; maybe try that. In general, though, RDKit is geared towards small molecules, so keep in mind that there may be additional hurdles.

And I suspect that if you want a reply from Greg, it is best to ask in the correct area: there is an RDKit subforum under Community Extensions.

Hi zizoo,

I would expect the RDKit to be able to read peptides from SDF without problems. Can you share an input file with some of the problematic structures?

-greg
