How to: determine the sequence (amino acids and their positions) from a drawn peptide structure?

Does anyone have any tips or solutions for determining a peptide sequence from a drawn structure? "Drawn" as in fully expanded C, H, N, O, S, Se atoms and their corresponding bonds, not single- or three-letter code combinations.

I have been using SMARTS to detect the presence of certain amino acids, but at best that only gives me the number of occurrences. What I would like as a result is the individual amino acids ("fragments", or their single-letter codes if they are natural amino acids), and ideally also a way to identify the termini and sort the residues from N-terminus to C-terminus, even if the input wasn't drawn in that order.

Is that possible?

Hello Docminus,

I have no clue about cheminformatics, but your problem sounds a bit like network mining. Maybe that is worth a shot?


Could you use the RDKit reaction nodes to effectively chop off one residue at a time from the C-terminus (or N-terminus), with a table of reaction SMARTS built from your amino acid SMARTS?

See attached for an example (with a limited set of AA definitions!)
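
For anyone reading without the attachment, here is a rough sketch of a related idea in plain RDKit Python. Note that it is not the attached workflow and does not run any reactions: it reuses per-residue SMARTS to locate each residue's backbone atoms, then walks the peptide bonds from the N-terminus. The residue table (only G, A and S), the function name read_sequence, and the assumption of a linear, unprotected peptide with stereochemistry ignored are all illustrative choices of mine, not something taken from the thread.

```python
from rdkit import Chem

# Hypothetical, deliberately tiny residue table (assumption, not from the thread).
# Every SMARTS lists the backbone atoms first (N, C-alpha, carbonyl C), so
# match[0] is always the backbone N and match[2] the carbonyl C.
RESIDUE_SMARTS = {
    "G": "[NX3][CH2][CX3]=O",
    "A": "[NX3][CX4H1]([CX3]=O)[CH3]",
    "S": "[NX3][CX4H1]([CX3]=O)[CH2][OX2H1]",
}


def read_sequence(smiles):
    """Return the one-letter sequence (N- to C-terminus) of a linear peptide."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES")

    # One record per matched residue: backbone N index -> (letter, carbonyl C index).
    residues = {}
    for letter, smarts in RESIDUE_SMARTS.items():
        pattern = Chem.MolFromSmarts(smarts)
        for match in mol.GetSubstructMatches(pattern):
            residues[match[0]] = (letter, match[2])

    # A peptide bond links one residue's carbonyl C to the next residue's N.
    next_n = {}
    for n_idx in residues:
        for _, c_idx in residues.values():
            if mol.GetBondBetweenAtoms(c_idx, n_idx) is not None:
                next_n[c_idx] = n_idx

    # The N-terminus is the only backbone N not acylated by another residue.
    acylated = set(next_n.values())
    starts = [n_idx for n_idx in residues if n_idx not in acylated]
    if len(starts) != 1:
        raise ValueError("expected exactly one N-terminus (linear peptide)")

    # Walk along the backbone, one residue at a time.
    sequence = []
    n_idx = starts[0]
    while n_idx is not None:
        letter, c_idx = residues[n_idx]
        sequence.append(letter)
        n_idx = next_n.get(c_idx)
    return "".join(sequence)


# Gly-Ala-Ser written "backwards" (C-terminus first) still reads N- to C-terminus.
print(read_sequence("OC(=O)C(CO)NC(=O)C(C)NC(=O)CN"))
```

A residue that is missing from the table is silently skipped, which splits the chain and triggers the "exactly one N-terminus" error, so a real table would need all residues of interest (and special handling for proline, protecting groups and cyclic peptides).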


Sorry, I haven't checked this thread for a while.

Thank you! This is awesome!

