PepArML: A Meta‐Search Peptide Identification Platform for Tandem Mass Spectra

Abstract

 

The PepArML meta‐search peptide identification platform for tandem mass spectra provides a unified search interface to seven search engines; a robust cluster, grid, and cloud computing scheduler for large‐scale searches; and an unsupervised, model‐free, machine‐learning‐based result combiner, which selects the best peptide identification for each spectrum, estimates false‐discovery rates, and outputs pepXML format identifications. The meta‐search platform supports Mascot; Tandem with native, k‐score, and s‐score scoring; OMSSA; MyriMatch; and InsPecT with MS‐GF spectral probability scores, reformatting spectral data and constructing search configurations for each search engine on the fly. The combiner selects the best peptide identification for each spectrum based on search engine results and features that model enzymatic digestion, retention time, precursor isotope clusters, mass accuracy, and proteotypic peptide properties, requiring no prior knowledge of feature utility or weighting. The PepArML meta‐search peptide identification platform often identifies two to three times more spectra than individual search engines at 10% FDR. Curr. Protoc. Bioinform. 44:13.23.1‐13.23.23. © 2013 by John Wiley & Sons, Inc.
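The false‐discovery rates mentioned above are commonly estimated with a target‐decoy strategy (Elias and Gygi, 2007). The following is a minimal, generic sketch of that idea, not PepArML's actual implementation: it assumes each peptide‐spectrum match (PSM) carries a score where higher is better and a flag marking decoy hits. The function name `target_decoy_qvalues` and the input layout are illustrative assumptions.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) for each PSM, best score first.

    psms: (score, is_decoy) pairs; a higher score is assumed to be better.
    The FDR at each rank is estimated as decoys / targets accepted so far.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each score threshold.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in target_decoy_qvalues(example):
        print(f"score={score:5.1f} decoy={is_decoy!s:5} q={q:.2f}")
```

Under this sketch, accepting identifications "at 10% FDR" would mean keeping the target PSMs whose q‐value is at most 0.10.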

Keywords: proteomics; tandem mass spectra; machine learning; cloud computing

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Basic Protocol 1: Upload Tandem Mass Spectra
  • Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles
  • Support Protocol 1: Registration and Login
  • Basic Protocol 2: Configure and Initiate the Search
  • Basic Protocol 3: Monitor and Manage the Search Jobs
  • Alternate Protocol 2: Run Search Jobs in the Cloud
  • Basic Protocol 4: Combine Search Results using PepArML Combiner
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 

Materials

Basic Protocol 1: Upload Tandem Mass Spectra

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).
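Before uploading, it can help to confirm that the downloaded datafile is a readable gzip archive containing tandem (MS/MS) scans. The sketch below is a local sanity check only and is not part of PepArML; it assumes the file follows the usual mzXML layout in which each `scan` element carries an `msLevel` attribute. The function name `count_scans_by_ms_level` is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_scans_by_ms_level(path: str) -> Counter:
    """Count <scan> elements per msLevel in a gzip-compressed mzXML file."""
    counts: Counter = Counter()
    with gzip.open(path, "rb") as handle:
        # Stream the XML so large files are not loaded into memory at once.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
            if tag == "scan":
                counts[int(elem.get("msLevel", "0"))] += 1
            elem.clear()  # release the finished element's children and text
    return counts


if __name__ == "__main__":
    levels = count_scans_by_ms_level("17mix-test2.mzXML.gz")
    print("MS1 scans:", levels[1])
    print("MS/MS scans:", levels[2])
```

A file that reports zero MS/MS scans, or that fails to decompress or parse, is worth re‐downloading before upload.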

Alternate Protocol 1: Batch Upload of Many, Large, or Vendor‐Format Spectra Datafiles

  Necessary Resources
  • The PepArML batch uploader must be downloaded from the Edwards lab (see Table 13.23.1) and installed. If vendor‐format conversion and peak‐picking/peak‐detection/centroiding using the ProteoWizard tools (Kessner et al., 2008) is required, then the uploader must be run on a Windows computer and may require instrument vendor software to be installed. Users must register for PepArML (Support Protocol 1). To follow the example analysis, download the example spectra datafile 17mix‐test2.mzXML.gz (see Table 13.23.1).

Support Protocol 1: Registration and Login

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. A valid e‐mail address is required for registration.

Basic Protocol 2: Configure and Initiate the Search

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1).

Basic Protocol 3: Monitor and Manage the Search Jobs

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1) and a peptide identification search configured and submitted (Basic Protocol 2).

Alternate Protocol 2: Run Search Jobs in the Cloud

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register for PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1) and a peptide identification analysis configured and submitted (Basic Protocol 2). Users should have verified that the search jobs are being scheduled and are completing successfully (Basic Protocol 3). Finally, users must have signed up for an EC2‐capable account with Amazon Web Services at http://aws.amazon.com.

Basic Protocol 4: Combine Search Results using PepArML Combiner

  Necessary Resources
  • A modern Web browser, such as MS Internet Explorer, Mozilla Firefox, Google Chrome, or Apple Safari, is required. Users must register and log in to PepArML (Support Protocol 1). Spectra must already have been uploaded to the PepArML server (Basic Protocol 1 or Alternate Protocol 1) and a peptide identification search configured and submitted (Basic Protocol 2). Finally, search jobs must have completed and the corresponding result files populated (Basic Protocol 3 and, optionally, protocol 7 as described here).

Figures

  •   Figure 13.23.1 PepArML homepage.
  •   Figure 13.23.2 Uploading 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.
  •   Figure 13.23.3 Completed upload of datafile 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.
  •   Figure 13.23.4 Tutorial folder of spectra repository populated with spectra 17mix‐test2 and selection of Search from the popup menu.
  •   Figure 13.23.5 Batch upload of 17mix‐test2.mzXML.gz to the Tutorial folder of the spectra repository.
  •   Figure 13.23.6 Search parameters for the example analysis of 17mix‐test2.
  •   Figure 13.23.7 Tutorial folder of results repository showing progress of the example analysis.
  •   Figure 13.23.8 Example analysis search jobs running on the Edwards lab cluster (http://edwardslab.bmcb.georgetown.edu), Amazon Web Services (http://amazonaws.com), and Georgetown HPC computing resources (http://matrix.georgetown.edu).
  •   Figure 13.23.9 Selection of PepArML Worker Amazon Machine Image for spot request.
  •   Figure 13.23.10 Setting the Amazon spot request instance type and bid price.
  •   Figure 13.23.11 PepArML username and password in the Amazon spot request User Data field.
  •   Figure 13.23.12 Completed PepArML analysis for the Tutorial folder.
  •   Figure 13.23.13 Evaluation of combiner methods by spectrum and peptide q‐values (fdrcurves.png).
  •   Figure 13.23.14 Information gain of PepArML PSM features for the example analysis (infogain.png).
  •   Figure 13.23.15 Schema for unsupervised PepArML training heuristic (Edwards et al., 2009, used with permission).

Literature Cited
   Breiman, L. 2001. Random forests. Mach. Learn. 45:5‐32.
   Craig, R. and Beavis, R.C. 2004. TANDEM: Matching proteins with tandem mass spectra. Bioinformatics 20:1466‐1467.
   Edwards, N., Wu, X., and Tseng, C.‐W. 2009. An unsupervised, model‐free, machine‐learning combiner for peptide identifications from tandem mass spectra. Clin. Proteomics 5(1).
   Elias, J.E. and Gygi, S.P. 2007. Target‐decoy search strategy for increased confidence in large‐scale protein identifications by mass spectrometry. Nat. Methods 4:207‐214.
   Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner, L., Xu, M., Maynard, D.M., Yang, X., Shi, W., and Bryant, S.H. 2004. Open mass spectrometry search algorithm. J. Proteome Res. 3:958‐964.
   Keller, A., Nesvizhskii, A.I., Kolker, E., and Aebersold, R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383‐5392.
   Kessner, D., Chambers, M., Burke, R., Agus, D., and Mallick, P. 2008. ProteoWizard: Open source software for rapid proteomics tools development. Bioinformatics 24:2534‐2536.
   Kim, S., Gupta, N., and Pevzner, P.A. 2008. Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. J. Proteome Res. 7:3354‐3363.
   MacLean, B., Eng, J.K., Beavis, R.C., and McIntosh, M. 2006. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 22:2830‐2832.
   Mallick, P., Schirle, M., Chen, S.S., Flory, M.R., Lee, H., Martin, D., Ranish, J., Raught, B., Schmitt, R., Werner, T., Kuster, B., and Aebersold, R. 2006. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25:125‐131.
   Nesvizhskii, A.I., Keller, A., Kolker, E., and Aebersold, R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646‐4658.
   Peng, J., Elias, J.E., Thoreen, C.C., Licklider, L.J., and Gygi, S.P. 2003. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC‐MS/MS) for large‐scale protein analysis: The yeast proteome. J. Proteome Res. 2:43‐50.
   Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. 1999. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551‐3567.
   Tabb, D.L., Fernando, C.G., and Chambers, M.C. 2007. MyriMatch: Highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome Res. 6:654‐661.
   Tanner, S., Shu, H., Frank, A., Wang, L.C., Zandi, E., Mumby, M., Pevzner, P.A., and Bafna, V. 2005. InsPecT: Identification of post translationally modified peptides from tandem mass spectra. Anal. Chem. 77:4626‐4639.