Inferring Protein Function from Homology Using the Princeton Protein Orthology Database (P‐POD)

Internet, 2013-12-31

Abstract

Inferring a protein's function by homology is a powerful tool for biologists. The Princeton Protein Orthology Database (P‐POD) offers a simple way to visualize and analyze the relationships between homologous proteins in order to infer function. P‐POD contains computationally generated analyses that distinguish orthologs from paralogs, combined with curated published information on functional complementation and on human diseases. P‐POD also features an applet, Notung, that lets users explore and modify phylogenetic trees and generate their own ortholog/paralog calls. This unit describes how to search P‐POD for precomputed data, how to find and use the associated curated information from the literature, and how to use Notung to analyze and refine the results.

Curr. Protoc. Bioinform. 33:6.11.1‐6.11.12. © 2011 by John Wiley & Sons, Inc.
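The ortholog/paralog distinction mentioned in the abstract follows the standard definition: two homologous genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication event. The Python sketch below illustrates only that definition. The tree, the gene names, and the hand-assigned event labels are invented for illustration; this is not P‐POD data and not Notung's interface, which infers such events itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node: leaves carry a gene name, internal nodes an event."""
    name: Optional[str] = None    # hypothetical leaf label, e.g. "yeast_A"
    event: Optional[str] = None   # "speciation" or "duplication" (assigned by hand here)
    children: List["Node"] = field(default_factory=list)


def leaf_names(node):
    """Collect the gene names at the leaves below a node."""
    if not node.children:
        return {node.name}
    names = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def classify(root, gene_a, gene_b):
    """Return 'orthologs' or 'paralogs' from the event at the last common ancestor."""
    present = leaf_names(root)
    if gene_a == gene_b or gene_a not in present or gene_b not in present:
        raise ValueError("need two distinct genes that are both in the tree")
    node = root
    while True:
        # Descend as long as a single child still contains both genes.
        below = [c for c in node.children if {gene_a, gene_b} <= leaf_names(c)]
        if not below:
            break
        node = below[0]
    if node.event == "speciation":
        return "orthologs"
    if node.event == "duplication":
        return "paralogs"
    raise ValueError("last common ancestor has no event label")


# Toy family: one ancient duplication followed by a yeast/human speciation
# in each of the two resulting lineages.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node(name="yeast_A"), Node(name="human_A")]),
    Node(event="speciation", children=[Node(name="yeast_B"), Node(name="human_B")]),
])

print(classify(tree, "yeast_A", "human_A"))  # orthologs
print(classify(tree, "yeast_A", "yeast_B"))  # paralogs
print(classify(tree, "yeast_A", "human_B"))  # paralogs
```

The last call shows why the distinction matters for inferring function: yeast_A and human_B come from different species but are still paralogs, because the duplication predates the speciation.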

Keywords: functional complementation; disease; conservation; phylogenetic analysis; trees; paralogs; Notung

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Table of Contents

Introduction
Basic Protocol 1: Searching for Homologs
Basic Protocol 2: Investigating the Conserved Function and Significance of a Protein
Basic Protocol 3: Using the Notung Applet to Examine Homology Relationships in Greater Detail
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

Figures

Figure 6.11.1 The P‐POD search interface at http://ppod.princeton.edu/.

Figure 6.11.2 Sample P‐POD search results. In this example, a search for S. cerevisiae proteins matching “2678” returned the SGD database identifier S000002678, the UniProt identifier P26784, and seven other proteins (not shown). Matching strings are highlighted in pink. The right half of the table shows the compositions of the four types of families: OrthoMCL, MultiParanoid, Jaccard, and Naïve Ensemble. Each family is represented as a 4 × 3 array of circles that use a two‐letter code to show the number of proteins from each of the organisms; e.g., “At 2” means that the family contains two proteins from Arabidopsis thaliana. If a protein is not assigned to a family in a particular analysis, the word “orphan” appears. If the icon for a family does not load properly, the family name (e.g., “OrthoMCL884”) still appears in the relevant table entry.

Figure 6.11.3 The Protein Family Page contains a phylogenetic tree and a table of the members of the family. Blue text and symbols in the table represent linkouts. The four gray tabs near the top of the page allow the user to switch among the other types of data. The Notung Tree Analysis link activates the Notung applet.

Figure 6.11.4 Click the Functional Conservation tab to display a curated list of complementation and exogenous expression experiments. The right‐hand column contains curator notes describing each experiment and its results. Linkouts in the middle column connect to the PubMed entry for the paper. This list has been truncated due to space considerations.

Figure 6.11.5 Click the Disease References tab to display linkouts to OMIM diseases associated with human members of the protein family (top panel) and a list of papers from SGD containing information about yeast genes with human homologs involved in disease. This list has been truncated due to space considerations.

Figure 6.11.6 The Notung applet window. The top pane shows the protein family phylogenetic tree and legend. The bottom pane has several tabs, each with its own set of buttons and checkboxes, allowing access to functions to modify the tree. Additional functions are accessible through the drop‐down menus at the top. The Edit Values button in the lower right corner allows the user to change tree parameters.

Figure 6.11.7 The P‐POD pipeline. Protein sequences were assigned to families using several different techniques, and curated information from several sources is displayed with the computational results.

Literature Cited

	Alexeyenko, A., Tamas, I., Liu, G., and Sonnhammer, E.L.L. 2006. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22:e9‐e15.
	Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
	Durand, D., Halldórsson, B.V., and Vernot, B. 2006. A hybrid micro‐macroevolutionary approach to gene tree reconstruction. J. Comput. Biol. 13:320‐335.
	Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696‐704.
	Hamosh, A., Scott, A.F., Amberger, J., Bocchini, C., Valle, D., and McKusick, V.A. 2002. Online Mendelian Inheritance in Man (OMIM): A knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30:52‐55.
	Heinicke, S., Livstone, M.S., Lu, C., Oughtred, R., Kang, F., Angiuoli, S.V., White, O., Botstein, D., and Dolinski, K. 2007. The Princeton Protein Orthology Database (P‐POD): A comparative genomics analysis tool for biologists. PLoS One 2:e766.
	Katoh, K. and Toh, H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9:286‐298.
	Li, L., Stoeckert, C.J. Jr., and Roos, D.S. 2003. OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178‐2189.
	Mi, H., Dong, Q., Muruganujan, A., Gaudet, P., Lewis, S., and Thomas, P.D. 2010. PANTHER version 7: Improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 38:D204‐D210.
	Reference Genome Group of the Gene Ontology Consortium. 2009. The Gene Ontology's Reference Genome Project: A unified framework for functional annotation across species. PLoS Comput. Biol. 5:e1000431.
Key References
	Heinicke et al., 2007. See above.
	The original 2007 P‐POD paper, with discussion of the reasons for building P‐POD and testing of the literature curation. The pipeline and user interface have changed since 2007; refer to the P‐POD help page (below) for a current technical description of P‐POD.
	Durand et al., 2006. See above.
	Technical description of Notung.
Internet Resources
	http://ppod.princeton.edu/
	The main P‐POD page and search interface.
	http://ppod.princeton.edu/help/
	The P‐POD help page contains an overview of the P‐POD pipeline, a brief tutorial, and links to additional information.
	http://ppod.princeton.edu/help/help_identifiers.html
	Valid identifiers for P‐POD and sample searches.
	http://ppod.princeton.edu/help/help_tech.html
	P‐POD technical information, including version numbers and settings for all software in the P‐POD pipeline.
	http://ppod.princeton.edu/help/help_notung_ortho_para.html
	A more extensive and illustrated explanation of how Notung infers orthologs and paralogs in P‐POD.
	ftp://gen-ftp.princeton.edu/ppod/
	The P‐POD ftp site containing all families, support files, and the 48‐species PANTHER 7.0 dataset. The current release is in the “version4” folder. More detail is available in the README files.
	http://ppod.princeton.edu/help/help_data_archive.html
	Archival technical information for the original 2007 P‐POD release only.
	http://www.cs.cmu.edu/~durand/Notung/
	The Notung application and documentation.
	http://www.ncbi.nlm.nih.gov/omim
	Online Mendelian Inheritance in Man (OMIM).
	http://www.pantherdb.org/
	The PANTHER 7.0 database.
	http://www.yeastgenome.org/
	The Saccharomyces Genome Database.
	http://www.geneontology.org/GO.refgenome.shtml
	Homepage of the Gene Ontology Consortium's Reference Genome project.
	http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
	The Gene Ontology Consortium's AmiGO database.
	http://evolution.genetics.washington.edu/phylip/newicktree.html
	Description of the Newick tree format.
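
The Newick format listed above encodes a tree as nested parentheses, with optional node names and branch lengths and a terminating semicolon. As a rough illustration of that structure, here is a minimal Python parser sketch. It handles only unquoted names and numeric branch lengths (no quoted labels or bracketed comments), and the example tree and gene names are hypothetical rather than taken from P‐POD.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Quoted labels and bracketed comments are not supported.
    """
    # Unquoted Newick labels cannot contain blanks, so whitespace can be dropped.
    text = "".join(text.split())
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node name: everything up to the next structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaves(node):
    """Return leaf names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaves(child))
    return names


# Hypothetical three-gene family with branch lengths.
example = parse_newick("((yeast_A:0.1,human_A:0.2):0.05,fly_A:0.3);")
print(leaves(example))  # ['yeast_A', 'human_A', 'fly_A']
```

For real analyses an established phylogenetics library is a safer choice than a hand-written parser; the sketch is only meant to make the nesting of the format concrete.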
