PepArML: A Meta‐Search Peptide Identification Platform for Tandem Mass Spectra

Internet, 2013-12-31

Abstract

The PepArML meta‐search peptide identification platform for tandem mass spectra provides a unified search interface to seven search engines; a robust cluster, grid, and cloud computing scheduler for large‐scale searches; and an unsupervised, model‐free, machine‐learning‐based result combiner, which selects the best peptide identification for each spectrum, estimates false‐discovery rates, and outputs pepXML format identifications. The meta‐search platform supports Mascot; Tandem with native, k‐score, and s‐score scoring; OMSSA; MyriMatch; and InsPecT with MS‐GF spectral probability scores, reformatting spectral data and constructing search configurations for each search engine on the fly. The combiner selects the best peptide identification for each spectrum based on search engine results and features that model enzymatic digestion, retention time, precursor isotope clusters, mass accuracy, and proteotypic peptide properties, requiring no prior knowledge of feature utility or weighting. The PepArML meta‐search peptide identification platform often identifies two to three times more spectra than individual search engines at 10% FDR. Curr. Protoc. Bioinform. 44:13.23.1‐13.23.23. © 2013 by John Wiley & Sons, Inc.

Keywords: proteomics; tandem mass spectra; machine learning; cloud computing
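The combiner chooses one identification per spectrum and reports how many spectra pass a 10% FDR. That step can be illustrated with a conventional target‐decoy calculation (Elias and Gygi, 2007). The Python sketch below is an illustrative assumption and not PepArML's own implementation. The PSM tuple layout and all function names are hypothetical, and higher scores are assumed to be better. The sketch estimates the FDR at a score threshold as the number of decoy hits divided by the number of target hits at or above that threshold. A PSM's q‐value is the smallest such FDR at which the PSM would still be accepted.

```python
from typing import Dict, List, Tuple

# Hypothetical PSM record: (spectrum_id, peptide, score, is_decoy).
# Assumption: a higher score means a better match.
PSM = Tuple[str, str, float, bool]


def best_per_spectrum(psms: List[PSM]) -> List[PSM]:
    """Keep only the highest-scoring identification for each spectrum."""
    best: Dict[str, PSM] = {}
    for psm in psms:
        current = best.get(psm[0])
        if current is None or psm[2] > current[2]:
            best[psm[0]] = psm
    return list(best.values())


def qvalues(psms: List[PSM]) -> List[Tuple[PSM, float]]:
    """Return (psm, q-value) pairs, best score first, estimating FDR as decoys/targets."""
    ranked = sorted(psms, key=lambda p: p[2], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for psm in ranked:
        if psm[3]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this PSM
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR of any threshold that still accepts this PSM,
    # i.e. a running minimum taken from the least stringent end of the list
    result: List[float] = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        result[i] = running_min
    return list(zip(ranked, result))


def count_identified(psms: List[PSM], max_q: float = 0.10) -> int:
    """Count target spectra whose best identification has q-value <= max_q."""
    return sum(1 for psm, q in qvalues(best_per_spectrum(psms))
               if not psm[3] and q <= max_q)
```

In this sketch, tied scores are broken by input order. A production tool would normally treat a block of tied scores as a single threshold.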

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Basic Protocol 1: Upload Tandem Mass Spectra
Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles
Support Protocol 1: Registration and Login
Basic Protocol 2: Configure and Initiate the Search
Basic Protocol 3: Monitor and Manage the Search Jobs
Alternate Protocol 2: Run Search Jobs in the Cloud
Basic Protocol 4: Combine Search Results using PepArML Combiner
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Materials

Basic Protocol 1: Upload Tandem Mass Spectra

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).
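Before uploading, it can help to confirm that the downloaded datafile is intact and contains MS/MS scans. The sketch below uses only the Python standard library. It streams a gzip‐compressed mzXML file and counts scan elements by their msLevel attribute. It assumes the usual mzXML layout, in which scan elements (possibly nested) carry an msLevel attribute. It ignores XML namespaces so that it works across mzXML schema revisions. The function name count_scans is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_scans(source) -> Counter:
    """Count scans by msLevel in gzip-compressed mzXML (a path or binary file object)."""
    levels: Counter = Counter()
    with gzip.open(source, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            # Strip any namespace, e.g. "{http://...mzXML_3.2}scan" -> "scan"
            if elem.tag.rsplit("}", 1)[-1] == "scan":
                levels[elem.get("msLevel", "?")] += 1
                elem.clear()  # free peak data once the scan has been counted
    return levels
```

For example, count_scans("17mix-test2.mzXML.gz")["2"] gives the number of MS/MS scans in the example datafile, assuming it was saved under that name.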

Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles

Necessary Resources

The PepArML batch uploader must be downloaded from the Edwards lab (see Table 13.23.1) and installed. If vendor‐format conversion and peak picking (peak detection or centroiding) with the ProteoWizard tools (Kessner et al., 2008) are required, the uploader must be run on a Windows computer and may require the instrument vendor's software to be installed. Users must register for PepArML (Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).

Support Protocol 1: Registration and Login

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. A valid e‐mail address is required for registration.

Basic Protocol 2: Configure and Initiate the Search

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1).

Basic Protocol 3: Monitor and Manage the Search Jobs

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1), and a peptide identification search must have been configured and submitted (Basic Protocol 2).

Alternate Protocol 2: Run Search Jobs in the Cloud

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register for PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1), and a peptide identification analysis must have been configured and submitted (Basic Protocol 2). Users should have verified that the search jobs are being scheduled and are completing successfully (Basic Protocol 3). Finally, users must have signed up for an EC2‐capable account with Amazon Web Services at http://aws.amazon.com.
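Figures 13.23.9 to 13.23.11 show the key settings of the Amazon spot request: the PepArML Worker Amazon Machine Image, the instance type and bid price, and the PepArML username and password in the User Data field. Instead of entering these by hand, one could assemble them for boto3, the AWS SDK for Python, whose EC2 client provides request_spot_instances. This is a sketch under several assumptions:

- The PepArML Worker AMI ID is not given here and must be supplied.
- The region default is arbitrary.
- The exact format of the username and password in the User Data field should be taken from the full protocol.
- The function names are hypothetical.

```python
import base64
from typing import Any, Dict, List


def build_spot_request(ami_id: str, instance_type: str, bid_price: str,
                       user_data: str, count: int = 1) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's EC2 request_spot_instances call."""
    return {
        "SpotPrice": bid_price,  # maximum price per instance-hour (USD), as a string
        "InstanceCount": count,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": ami_id,  # PepArML Worker AMI ID: placeholder, supply your own
            "InstanceType": instance_type,
            # boto3 documents this field as base64-encoded user data
            "UserData": base64.b64encode(user_data.encode("utf-8")).decode("ascii"),
        },
    }


def submit_spot_request(request: Dict[str, Any], region: str = "us-east-1") -> List[str]:
    """Send the request to EC2 and return spot request IDs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_spot_request works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.request_spot_instances(**request)
    return [r["SpotInstanceRequestId"] for r in response["SpotInstanceRequests"]]
```

Keeping the request builder separate from the network call lets the settings be checked before any AWS charges are incurred.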

Basic Protocol 4: Combine Search Results using PepArML Combiner

Necessary Resources

A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1), and a peptide identification search must have been configured and submitted (Basic Protocol 2). Finally, search jobs must have completed and the corresponding result files must be populated (Basic Protocol 3 and, optionally, Basic Protocol 4 as described here).
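The combiner writes its identifications in pepXML format. This sketch assumes the standard pepXML layout: each spectrum_query element contains a search_result with ranked search_hit elements carrying hit_rank and peptide attributes. Under that assumption, the top‐ranked peptide for each spectrum can be extracted with the Python standard library as shown below. PepArML‐specific score or FDR attributes are not assumed, and the function name top_peptides is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _local(tag: str) -> str:
    """Drop the XML namespace, e.g. '{http://...pepXML}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def top_peptides(source) -> Dict[str, str]:
    """Map each spectrum_query's 'spectrum' attribute to its rank-1 peptide."""
    result: Dict[str, str] = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum_query":
            continue
        for hit in elem.iter():
            if _local(hit.tag) == "search_hit" and hit.get("hit_rank") == "1":
                result[elem.get("spectrum")] = hit.get("peptide")
                break
        elem.clear()  # release the finished spectrum_query subtree
    return result
```

The source argument may be a file path or an open binary file, since ElementTree.iterparse accepts either.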

Figures

Figure 13.23.1 PepArML homepage.

Figure 13.23.2 Uploading 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.

Figure 13.23.3 Completed upload of datafile 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.

Figure 13.23.4 Tutorial folder of spectra repository populated with spectra 17mix‐test2 and selection of Search from the popup menu.

Figure 13.23.5 Batch upload of 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.

Figure 13.23.6 Search parameters for the example analysis of 17mix‐test2.

Figure 13.23.7 Tutorial folder of results repository showing progress of the example analysis.

Figure 13.23.8 Example analysis search jobs running on the Edwards lab cluster (http://edwardslab.bmcb.georgetown.edu), Amazon Web Services (http://amazonaws.com), and Georgetown HPC computing resources (http://matrix.georgetown.edu).

Figure 13.23.9 Selection of PepArML Worker Amazon Machine Image for spot request.

Figure 13.23.10 Setting the Amazon spot request instance type and bid price.

Figure 13.23.11 PepArML username and password in the Amazon spot request User Data field.

Figure 13.23.12 Completed PepArML analysis for the Tutorial folder.

Figure 13.23.13 Evaluation of combiner methods by spectrum and peptide q‐values (fdrcurves.png).

Figure 13.23.14 Information gain of PepArML PSM features for the example analysis (infogain.png).

Figure 13.23.15 Schema for unsupervised PepArML training heuristic (Edwards et al., 2009; used with permission).

Literature Cited
	Breiman, L. 2001. Random forests. Mach. Learn. 45:5‐32.
	Craig, R. and Beavis, R.C. 2004. TANDEM: Matching proteins with tandem mass spectra. Bioinformatics 20:1466‐1467.
	Edwards, N., Wu, X., and Tseng, C.‐W. 2009. An unsupervised, model‐free, machine‐learning combiner for peptide identifications from tandem mass spectra. Clin. Proteomics 5(1).
	Elias, J.E. and Gygi, S.P. 2007. Target‐decoy search strategy for increased confidence in large‐scale protein identifications by mass spectrometry. Nat. Methods 4:207‐214.
	Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner, L., Xu, M., Maynard, D.M., Yang, X., Shi, W., and Bryant, S.H. 2004. Open mass spectrometry search algorithm. J. Proteome Res. 3:958‐964.
	Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383‐5392.
	Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. 2008. ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24:2534‐2536.
	Kim, S., Gupta, N., and Pevzner, P.A. 2008. Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. J. Proteome Res. 7:3354‐3363.
	MacLean, B., Eng, J.K., Beavis, R.C., and McIntosh, M. 2006. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 22:2830‐2832.
	Mallick, P., Schirle, M., Chen, S.S., Flory, M.R., Lee, H., Martin, D., Ranish, J., Raught, B., Schmitt, R., Werner, T., Kuster, B., and Aebersold, R. 2006. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25:125‐131.
	Nesvizhskii, A.I., Keller, A., Kolker, E., and Aebersold, R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646‐4658.
	Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J., and Gygi, S.P. 2003. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC−MS/MS) for large‐scale protein analysis: The yeast proteome. J. Proteome Res. 2:43‐50.
	Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. 1999. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551‐3567.
	Tabb, D.L., Fernando, C.G., and Chambers, M.C. 2007. MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6:654‐661.
	Tanner, S., Shu, H., Frank, A., Wang, L.C., Zandi, E., Mumby, M., Pevzner, P.A., and Bafna, V. 2005. InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 77:4626‐4639.
