Inferring Protein Function from Homology Using the Princeton Protein Orthology Database (P‐POD)
Abstract
Inferring a protein's function by homology is a powerful tool for biologists. The Princeton Protein Orthology Database (P‐POD) offers a simple way to visualize and analyze the relationships between homologous proteins in order to infer function. P‐POD combines computationally generated analyses that distinguish orthologs from paralogs with curated published information on functional complementation and on human diseases. P‐POD also features an applet, Notung, with which users can explore and modify phylogenetic trees and generate their own ortholog/paralog calls. This unit describes how to search P‐POD for precomputed data, how to find and use the associated curated information from the literature, and how to use Notung to analyze and refine the results. Curr. Protoc. Bioinform. 33:6.11.1‐6.11.12. © 2011 by John Wiley & Sons, Inc.
Keywords: functional complementation; disease; conservation; phylogenetic analysis; trees; paralogs; Notung
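Notung distinguishes orthologs from paralogs by reconciling a protein family's gene tree with a species tree (Durand et al., 2006). The Python sketch below illustrates only the core idea, assuming a rooted, binary gene tree and a small hypothetical species tree: each internal gene‐tree node is mapped to the lowest common ancestor (LCA) of its children's species, it is labeled a duplication if it maps to the same species node as one of its children, and two proteins are called orthologs when the node joining them is a speciation. This is not Notung's implementation (Notung also accounts for gene losses and can rearrange weakly supported branches); the species codes, protein names, and the functions `reconcile` and `classify_pairs` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical species tree, stored as child -> parent links.
SPECIES_PARENT = {
    "Sc": "Fungi", "Sp": "Fungi",
    "Hs": "Vertebrata", "Mm": "Vertebrata",
    "Fungi": "Opisthokonta", "Vertebrata": "Opisthokonta",
}


@dataclass
class Node:
    name: Optional[str] = None      # protein name (leaves only)
    species: Optional[str] = None   # two-letter species code (leaves only)
    children: list = field(default_factory=list)


def leaf(name, species):
    return Node(name=name, species=species)


def join(left, right):
    return Node(children=[left, right])


def species_lca(a, b):
    """Lowest common ancestor of two taxa in SPECIES_PARENT."""
    path_a = [a]
    while path_a[-1] in SPECIES_PARENT:
        path_a.append(SPECIES_PARENT[path_a[-1]])
    node = b
    while node not in path_a:
        node = SPECIES_PARENT[node]  # KeyError if there is no common ancestor
    return node


def reconcile(node, events):
    """LCA mapping: return the species node this gene-tree node maps to and
    record 'S' (speciation) or 'D' (duplication) for every internal node."""
    if not node.children:
        return node.species
    left, right = (reconcile(child, events) for child in node.children)
    mapped = species_lca(left, right)
    # Duplication: the node maps to the same species node as one of its children.
    events[id(node)] = "D" if mapped in (left, right) else "S"
    return mapped


def leaves(node):
    if not node.children:
        return [node]
    return [lf for child in node.children for lf in leaves(child)]


def classify_pairs(root):
    """Call each protein pair from the event at its gene-tree LCA."""
    events = {}
    reconcile(root, events)
    calls = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        left, right = node.children
        call = "ortholog" if events[id(node)] == "S" else "paralog"
        for a in leaves(left):
            for b in leaves(right):
                calls[frozenset((a.name, b.name))] = call
        stack.extend(node.children)
    return calls


# An ancestral duplication followed by a yeast/human speciation in each copy.
tree = join(join(leaf("Sc_A", "Sc"), leaf("Hs_A", "Hs")),
            join(leaf("Sc_B", "Sc"), leaf("Hs_B", "Hs")))
calls = classify_pairs(tree)
print(calls[frozenset(("Sc_A", "Hs_A"))])  # ortholog
print(calls[frozenset(("Sc_A", "Hs_B"))])  # paralog
```

In this toy family, Sc_A and Hs_B descend from different copies of the duplicated ancestral gene, so they are paralogs even though they come from different species.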
Table of Contents
- Introduction
- Basic Protocol 1: Searching for Homologs
- Basic Protocol 2: Investigating the Conserved Function and Significance of a Protein
- Basic Protocol 3: Using the Notung Applet to Examine Homology Relationships in Greater Detail
- Guidelines for Understanding Results
- Commentary
- Literature Cited
- Figures
- Tables
Figures
- Figure 6.11.1 The P‐POD search interface at http://ppod.princeton.edu/.
- Figure 6.11.2 Sample P‐POD search results. In this example, a search for S. cerevisiae proteins matching “2678” returned the SGD database identifier S000002678, the UniProt identifier P26784, and seven other proteins (not shown). Matching strings are highlighted in pink. The right half of the table shows the compositions of the four types of families: OrthoMCL, MultiParanoid, Jaccard, and Naïve Ensemble. Each family is represented as a 4 × 3 array of circles, each labeled with a two‐letter organism code and the number of proteins from that organism; e.g., “At 2” means that the family contains two proteins from Arabidopsis thaliana. If a protein is not assigned to a family in a particular analysis, the word “orphan” appears. If the icon for a family does not load properly, the family name (e.g., “OrthoMCL884”) still appears in the relevant table entry.
- Figure 6.11.3 The Protein Family Page contains a phylogenetic tree and a table of the members of the family. Blue text and symbols in the table represent linkouts. The four gray tabs near the top of the page allow the user to switch between types of data. The Notung Tree Analysis link activates the Notung applet.
- Figure 6.11.4 Click the Functional Conservation tab to display a curated list of complementation and exogenous expression experiments. The right‐hand column contains curator notes describing each experiment and its results. Linkouts in the middle column connect to the PubMed entry for the paper. This list has been truncated due to space considerations.
- Figure 6.11.5 Click the Disease References tab to display linkouts to OMIM diseases associated with human members of the protein family (top panel) and a list of papers from SGD with information about yeast genes whose human homologs are involved in disease. This list has been truncated due to space considerations.
- Figure 6.11.6 The Notung applet window. The top pane shows the protein family phylogenetic tree and legend. The bottom pane has several tabs, each with its own set of buttons and checkboxes, giving access to functions that modify the tree. Additional functions are accessible through the drop‐down menus at the top. The Edit Values button in the lower right corner allows the user to change tree parameters.
- Figure 6.11.7 The P‐POD pipeline. Protein sequences were assigned to families using several different techniques, and curated information from several sources is displayed with the computational results.
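The family composition shown in Figure 6.11.2 (a count of proteins per two‐letter organism code, or “orphan” when a protein has no family) can be derived from a table of family assignments. The following Python sketch is hypothetical: the protein IDs, the family name `FAM1`, the organism codes other than “At”, and the input layout are invented for illustration, and it produces a text summary rather than P‐POD's 4 × 3 icon.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical input: protein ID -> (two-letter organism code, family name or None).
Assignments = Dict[str, Tuple[str, Optional[str]]]


def family_summary(query: str, assignments: Assignments) -> str:
    """Summarize the family containing `query` as 'Code count' entries, or 'orphan'."""
    family = assignments[query][1]
    if family is None:
        return "orphan"
    # Count every protein assigned to the same family, grouped by organism code.
    counts = Counter(code for code, fam in assignments.values() if fam == family)
    cells = ", ".join(f"{code} {n}" for code, n in sorted(counts.items()))
    return f"{family}: {cells}"


assignments = {
    "P1": ("Sc", "FAM1"),
    "P2": ("At", "FAM1"),
    "P3": ("At", "FAM1"),
    "P4": ("Hs", "FAM1"),
    "P5": ("Dm", None),
}
print(family_summary("P1", assignments))  # FAM1: At 2, Hs 1, Sc 1
print(family_summary("P5", assignments))  # orphan
```

Because each clustering method (OrthoMCL, MultiParanoid, Jaccard, Naïve Ensemble) yields its own assignments, a summary like this would be computed once per method, which is why a protein can be an orphan in one analysis but not in another.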
Literature Cited
Alexeyenko, A., Tamas, I., Liu, G., and Sonnhammer, E.L.L. 2006. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22:e9‐e15.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
Durand, D., Halldórsson, B.V., and Vernot, B. 2006. A hybrid micro‐macroevolutionary approach to gene tree reconstruction. J. Comput. Biol. 13:320‐335.
Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696‐704.
Hamosh, A., Scott, A.F., Amberger, J., Bocchini, C., Valle, D., and McKusick, V.A. 2002. Online Mendelian Inheritance in Man (OMIM): A knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30:52‐55.
Heinicke, S., Livstone, M.S., Lu, C., Oughtred, R., Kang, F., Angiuoli, S.V., White, O., Botstein, D., and Dolinski, K. 2007. The Princeton Protein Orthology Database (P‐POD): A comparative genomics analysis tool for biologists. PLoS One 2:e766.
Katoh, K. and Toh, H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9:286‐298.
Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
Mi, H., Dong, Q., Muruganujan, A., Gaudet, P., Lewis, S., and Thomas, P.D. 2010. PANTHER version 7: Improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 38:D204‐D210.
Reference Genome Group of the Gene Ontology Consortium. 2009. The Gene Ontology's Reference Genome Project: A unified framework for functional annotation across species. PLoS Comput. Biol. 5:e1000431.
Key References
Heinicke et al., 2007. See above.
The original 2007 P‐POD paper, with discussion of the reasons for building P‐POD and testing of the literature curation. The pipeline and user interface have changed since 2007; refer to the P‐POD help page (below) for a current technical description of P‐POD.
Durand et al., 2006. See above.
Technical description of Notung.
Internet Resources
http://ppod.princeton.edu/
The main P‐POD page and search interface.
http://ppod.princeton.edu/help/
The P‐POD help page contains an overview of the P‐POD pipeline, a brief tutorial, and links to additional information.
http://ppod.princeton.edu/help/help_identifiers.html
Valid identifiers for P‐POD and sample searches.
http://ppod.princeton.edu/help/help_tech.html
P‐POD technical information, including version numbers and settings for all software in the P‐POD pipeline.
http://ppod.princeton.edu/help/help_notung_ortho_para.html
A more extensive, illustrated explanation of how Notung infers orthologs and paralogs in P‐POD.
ftp://gen-ftp.princeton.edu/ppod/
The P‐POD ftp site, containing all families, support files, and the 48‐species PANTHER 7.0 dataset. The current release is in the “version4” folder. More detail is available in the README files.
http://ppod.princeton.edu/help/help_data_archive.html
Archival technical information for the original 2007 P‐POD release only.
http://www.cs.cmu.edu/~durand/Notung/
The Notung application and documentation.
http://www.ncbi.nlm.nih.gov/omim
Online Mendelian Inheritance in Man (OMIM).
http://www.pantherdb.org/
The PANTHER 7.0 database.
http://www.yeastgenome.org/
The Saccharomyces Genome Database.
http://www.geneontology.org/GO.refgenome.shtml
Homepage of the Gene Ontology Consortium's Reference Genome project.
http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
The Gene Ontology Consortium's AmiGO database.
http://evolution.genetics.washington.edu/phylip/newicktree.html
Description of the Newick tree format.
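The Newick format described above is a common way to store phylogenetic trees such as those displayed in Notung. As a hedged sketch, assuming unquoted labels and no bracketed comments, a minimal recursive‐descent parser in Python could look like the following; it is not P‐POD or Notung code, and the functions `parse_newick` and `leaf_names` are hypothetical.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with 'name', 'length', 'children'.
    Assumes unquoted labels and no [bracketed] comments."""
    # Blanks are not allowed inside unquoted labels or branch lengths, so drop them all.
    s = "".join(text.split())
    pos = 0

    def read_token():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        name = read_token() or None
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ':'
            length = float(read_token())
        return {"name": name, "length": length, "children": children}

    tree = parse_node()
    if s[pos:] != ";":
        raise ValueError("tree must end with a single ';'")
    return tree


def leaf_names(node):
    """Leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]


tree = parse_newick("((Sc_A:0.1,Hs_A:0.2):0.05,(Sc_B,Hs_B));")
print(leaf_names(tree))  # ['Sc_A', 'Hs_A', 'Sc_B', 'Hs_B']
```

A tree parsed this way could then be converted into a gene‐tree structure for reconciliation; production work should rely on an established phylogenetics library that also handles quoted labels and comments.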