Creating a target-decoy database for peptide identification

mapopova · March 5, 2019, 1:00pm

Hi, I’m trying to create a target-decoy database for peptide identification. I have downloaded a FASTA file with the chicken proteome from UniProt.
As I understand it, I need to cut all proteins in the proteome into peptides, generate a database with all those peptides, and add a decoy for each of them.
I applied the DecoyDatabase tool from OpenMS, but it only generates a database of proteins, not peptides… Anyway, I tried it and it worked well for XTandemAdapter but didn’t work for OMSSA. OMSSA finished successfully, but its output was empty.

jpfeuffer · March 5, 2019, 3:25pm

Hi!

You are correct about the steps that need to be done; however, the search engine should do the
theoretical digest for you, based on the enzyme settings you give it. So a protein-only database is
totally correct (I think by default DecoyDatabase takes every peptide, reverses it and joins them back
together into a protein, because the search engines expect that).
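
To make the idea concrete, here is a minimal Python sketch of an in-silico digest followed by per-peptide reversal. It only illustrates the principle described above and is not the actual DecoyDatabase implementation: the simplified trypsin rule (cleave after K or R, but not before P, no missed cleavages), the function names and the `DECOY_` prefix are all assumptions made for this example.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    # Zero-width split points; requires Python 3.7+ for splitting on empty matches.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def make_decoy(protein: str) -> str:
    """Reverse every peptide and join the pieces back into one protein."""
    return "".join(peptide[::-1] for peptide in tryptic_digest(protein))


def target_decoy_entries(entries: dict[str, str], decoy_prefix: str = "DECOY_") -> dict[str, str]:
    """Return the targets plus one decoy protein per target (prefix is hypothetical)."""
    combined = {}
    for accession, sequence in entries.items():
        combined[accession] = sequence
        combined[decoy_prefix + accession] = make_decoy(sequence)
    return combined


print(tryptic_digest("MAGKLVRPEEKTTR"))  # ['MAGK', 'LVRPEEK', 'TTR']
print(make_decoy("MAGKLVRPEEKTTR"))      # KGAMKEEPRVLRTT
```

The result is still a protein-level database of the same size per entry, which is why the search engine can digest targets and decoys with the same enzyme settings.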

That said, there must be something else wrong with OMSSA. Maybe we can find out more if you post the standard output and error log from OMSSAAdapter (right-click the node and select it from the context menu).

Cheers
Julianus

mapopova · March 5, 2019, 4:05pm

Thank you for your response!
I will post the output as soon as I get access to the computer.
Right now I can only say that there were lots of decoys in the output and all of the resulting proteins had the same score, -15.
And I forgot to say that Comet doesn’t identify proteins correctly either.

I would also like to clarify: is it normal that the engines report protein hits? I thought there should be peptide hits. Did I misunderstand something?

Thanks again!
Maria

jpfeuffer · March 5, 2019, 4:33pm

True, some search engines report only peptide identifications, but most of them also print the proteins
for which at least one peptide was found. However, since different search engines perform the mapping
differently and some don’t do it at all, you usually re-index the peptides with PeptideIndexer afterwards.
So protein IDs right after a peptide search engine are relatively meaningless (including their scores,
which are often just the maximum of the peptide scores or even the same value for every protein). Most search engines do not
even have a notion of targets and decoys, and they should not care about this information anyway, so that they stay unbiased.
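
As a rough illustration of what re-indexing means, the following Python sketch maps each identified peptide back to every database protein that contains it and derives a target/decoy label from the accessions. It is a simplified stand-in, not PeptideIndexer itself: the plain substring matching, the `DECOY_` prefix, the label strings and the example sequences are assumptions chosen for this example.

```python
def index_peptides(peptides, database, decoy_prefix="DECOY_"):
    """Map peptides to all containing proteins and label them target/decoy."""
    indexed = {}
    for peptide in peptides:
        # Exact substring search; real tools also handle ambiguous residues etc.
        accessions = [acc for acc, seq in database.items() if peptide in seq]
        decoy_flags = [acc.startswith(decoy_prefix) for acc in accessions]
        if not accessions:
            label = "unmatched"
        elif all(decoy_flags):
            label = "decoy"
        elif any(decoy_flags):
            label = "target+decoy"
        else:
            label = "target"
        indexed[peptide] = (accessions, label)
    return indexed


# Hypothetical two-entry database: one target and its decoy.
database = {"P1": "MAGKLVRPEEKTTR", "DECOY_P1": "KGAMKEEPRVLRTT"}
for peptide, hit in index_peptides(["LVRPEEK", "KEEPRVL", "WWWW"], database).items():
    print(peptide, hit)
```

Doing this mapping once, after the search and with the same database for every engine, is what makes the protein references and target/decoy labels comparable across engines.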

Cheers
Julianus

mapopova · March 6, 2019, 2:47am

But if an engine already gives me a list of proteins, do I still have to apply other tools like FidoAdapter?

jpfeuffer · March 6, 2019, 10:30am

Yes, you should almost always overwrite the mappings with PeptideIndexer first. Then, if you don’t want to do rule-based identification, you should convert the peptide scores into probabilities with Percolator or IDPosteriorErrorProbability so that you can run an inference on them with FidoAdapter.
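
To give a feeling for why inference works on probabilities rather than raw scores, here is a deliberately naive Python sketch. It assumes each peptide already has a posterior error probability (PEP) and combines them per protein as 1 minus the product of the PEPs, treating peptides as independent evidence. This is only a toy model and not what FidoAdapter computes; the function names and all numbers are made up for illustration.

```python
from math import prod


def naive_protein_probability(peptide_peps):
    """P(protein present) = 1 - product of peptide PEPs (independence assumed)."""
    return 1.0 - prod(peptide_peps)


def score_proteins(evidence):
    """evidence: protein accession -> list of PEPs of its mapped peptides."""
    return {acc: naive_protein_probability(peps) for acc, peps in evidence.items()}


# Hypothetical PEPs: P1 has two peptides, P2 a single weak one.
evidence = {"P1": [0.01, 0.2], "P2": [0.9]}
for accession, probability in score_proteins(evidence).items():
    print(f"{accession}: {probability:.3f}")
```

Even this toy version only makes sense once the scores are on a common probability scale, which is the job of the conversion step before inference.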

Did you have a look at our tutorials and example workflows? You can skip quantification if you are not
interested in that:

https://github.com/OpenMS/Tutorials/blob/master/Workflows/labelfree_with_protein_quantification.knwf
