
Using OrthoMCL to Assign Proteins to OrthoMCL‐DB Groups or to Cluster Proteomes Into New Ortholog Groups


Abstract

 

OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity. OrthoMCL‐DB is a public database that allows users to browse and view ortholog groups that were pre‐computed using the OrthoMCL algorithm. Version 4 of this database contained 116,536 ortholog groups clustered from 1,270,853 proteins obtained from 88 eukaryotic genomes, 16 archaean genomes, and 34 bacterial genomes. Future versions of OrthoMCL‐DB will include more proteomes as more genomes are sequenced. Here, we describe how you can group your proteins of interest into ortholog clusters using two different means provided by the OrthoMCL system. The OrthoMCL‐DB Web site has a tool for uploading and grouping a set of protein sequences, typically representing a proteome. This method maps the uploaded proteins to existing groups in OrthoMCL‐DB. Alternatively, if you have proteins from a set of genomes that need to be grouped, you can download, install, and run the stand‐alone OrthoMCL software. Curr. Protoc. Bioinform. 35:6.12.1‐6.12.19. © 2011 by John Wiley & Sons, Inc.

Keywords: OrthoMCL; ortholog groups; paralog; proteome; Markov clustering; reciprocal best hits; MCL

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Assign a Proteome to OrthoMCL‐DB Groups
  • Basic Protocol 2: Create Ortholog Groups from Your Proteomes Using the OrthoMCL Software
  • Support Protocol 1: Downloading, Installing, and Configuring the OrthoMCL Programs
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
     
 


Figures

  •   Figure 6.12.1 Overview of the OrthoMCL algorithm. (1) Proteomes must each be in FASTA format where the file name and definition lines comply with simple requirements. (2) The proteome files are filtered to remove low‐quality sequences based on length and percent stop codons. (3) The proteomes are all compared to each other using BLASTP. They are masked with seg and an e‐value cutoff of 1e‐5 is applied. (4) For each pair of sequences that match, compute the “percent match length” score: count the number of amino acids in the shorter sequence that participate in any HSP, divide that by the length of the shorter sequence, and multiply by 100. Filter away matches with percent match length < 50%. (5) For all pairs of proteomes, find all pairs of proteins across them that have hits as good as or better than any other hits between these proteins and other proteins in those species. (6) Find all pairs of proteins within a species that have mutual e‐values that are better than or equal to all of those proteins' hits to proteins in other species. (7) Find all pairs of proteins across two species that are connected through orthology and in‐paralogy. (8) Normalize in‐paralog e‐values by averaging over all qualifying in‐paralog pairs in a genome and dividing each pair's value by that average. Within a genome, in‐paralog pairs qualify if either of the proteins in the pair has an ortholog in any genome. If no in‐paralogs within a genome have any orthologs, all in‐paralogs in that genome qualify. Normalize ortholog and co‐ortholog pairs for any two species by averaging the e‐values across them and dividing each pair's value by that average. (9) Pass on all ortholog, in‐paralog, and co‐ortholog pairs, with their normalized e‐values, to the MCL program for clustering.
  •   Figure 6.12.2 OrthoMCL‐DB home page with the Tools link circled.
  •   Figure 6.12.3 A proteome mapped to OrthoMCL‐DB. The results are downloaded as a .zip file that contains five files. Shown here is the orthologGroups file obtained after submitting the Erwinia carotovora proteome (Bell et al., 2004).
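
Step (4) of Figure 6.12.1 can be illustrated with a short Python sketch. This is not the OrthoMCL source code. The function names are hypothetical, and the sketch assumes each HSP is given as a 1‐based inclusive (start, end) interval on the shorter sequence, with overlapping HSPs counted only once.

```python
from typing import Iterable, Optional, Tuple

# Assumption: an HSP is a 1-based inclusive (start, end) range on the shorter sequence.
Interval = Tuple[int, int]


def covered_residues(hsps: Iterable[Interval]) -> int:
    """Count residues that take part in at least one HSP, merging overlaps."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(hsps):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the running interval: bank it and start a new one.
            if cur_start is not None and cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            cur_end = max(cur_end, end)
    if cur_start is not None and cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def percent_match_length(hsps: Iterable[Interval], shorter_len: int) -> float:
    """Covered residues divided by the shorter sequence length, times 100."""
    if shorter_len <= 0:
        raise ValueError("shorter_len must be positive")
    return 100.0 * covered_residues(hsps) / shorter_len


def passes_filter(hsps: Iterable[Interval], shorter_len: int,
                  cutoff: float = 50.0) -> bool:
    """Keep a match unless its percent match length is below the cutoff."""
    return percent_match_length(hsps, shorter_len) >= cutoff
```

For example, HSPs at (1, 40), (30, 60), and (100, 120) cover 81 residues. On a shorter sequence of 200 residues that gives a percent match length of 40.5, so the match would be filtered out under the 50% cutoff named in the caption.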

Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths‐Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C., and Eddy, S.R. 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138‐D141.
   Bell, K.S., Sebaihia, M., Pritchard, L., Holden, M.T., Hyman, L.J., Holeva, M.C., Thomson, N.R., Bentley, S.D., Churcher, L.J., Mungall, K., Atkin, R., Bason, N., Brooks, K., Chillingworth, T., Clark, K., Doggett, J., Fraser, A., Hance, Z., Hauser, H., Jagels, K., Moule, S., Norbertczak, H., Ormond, D., Price, C., Quail, M.A., Sanders, M., Walker, D., Whitehead, S., Salmond, G.P., Birch, P.R., Parkhill, J., and Toth, I.K. 2004. Genome sequence of the enterobacterial phytopathogen Erwinia carotovora subsp. atroseptica and characterization of virulence factors. Proc. Natl. Acad. Sci. U.S.A. 101:11105‐11110.
   Chen, F., Mackey, A.J., Stoeckert, C.J. Jr., and Roos, D.S. 2006. OrthoMCL‐DB: Querying a comprehensive multi‐species collection of ortholog groups. Nucleic Acids Res. 34:D363‐D368.
   Chen, F., Mackey, A.J., Vermunt, J.K., and Roos, D.S. 2007. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS One. 2:e383.
   Enright, A.J., Van Dongen, S., and Ouzounis, C.A. 2002. An efficient algorithm for large‐scale detection of protein families. Nucleic Acids Res. 30:1575‐1584.
   The Gene Ontology Consortium. 2000. Gene ontology: Tool for the unification of biology. Nat. Genet. 25:25‐29.
   Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
   Webb, E., and International Union of Biochemistry and Molecular Biology. Enzyme nomenclature 1992: Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology on the nomenclature and classification of enzymes. 1984th ed. Academic Press, New York.
Key References
   Li et al., 2003. See above.
   The original paper describing the OrthoMCL algorithm.
   Chen et al., 2006. See above.
   A paper describing the OrthoMCL‐DB.
   Chen et al., 2007. See above.
   A paper comparing OrthoMCL to other approaches.
Internet Resources
   http://orthomcl.org
   The OrthoMCL‐DB site
   http://pfam.sanger.ac.uk/search#tabview=tab1
   Submit a set of proteins to find Pfam domains
   http://www.ebi.ac.uk/Tools/msa/clustalw2/
   Submit a set of proteins for multiple sequence alignment
   http://www.biolayout.org/
   Download software to visualize groups using Biolayout.