Prediction of Protein‐Protein Interaction Networks

Internet, 2013-12-31

Abstract

This unit offers a general overview of several techniques developed for inferring functional and/or physical protein‐protein interaction networks. Most of these use whole‐genome sequences as their primary source of input data. In addition, a few recently developed methods use both protein features and experimental protein‐protein interaction data directly to predict new interactions. Although the list of approaches is not exhaustive, the reader should gain a sense of how these approaches are implemented, an idea of their relative strengths and weaknesses, and a broader perspective on the work being conducted in this highly active area of research. Curr. Protoc. Bioinform. 22:8.2.1‐8.2.14. © 2008 by John Wiley & Sons, Inc.

Keywords: protein interactions; bioinformatics; interaction networks

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
Approaches
Observations and Conclusions
Literature Cited
Figures

Figures

Figure 8.2.1 Diagram of the conserved gene cluster approach used by Overbeek et al. (1999).
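The core idea of the conserved gene cluster approach can be sketched in a few lines of Python. This is a minimal illustration, not Overbeek et al.'s implementation: it assumes each genome has already been reduced to an ordered list of ortholog-family labels (a hypothetical, precomputed input), treats two families as clustered when they lie within `window` genes of each other, and counts the genomes in which that adjacency recurs. The function names and the gene-count window are assumptions; the published method used its own closeness criteria.

```python
from collections import Counter


def neighbor_pairs(genome, window=1):
    """Return the unordered family pairs lying within `window` genes of each other.

    `genome` is a hypothetical precomputed input: ortholog-family labels
    listed in chromosomal order.
    """
    pairs = set()
    for i, fam_a in enumerate(genome):
        for j in range(i + 1, min(i + 1 + window, len(genome))):
            if fam_a != genome[j]:
                pairs.add(frozenset((fam_a, genome[j])))
    return pairs


def conserved_clusters(genomes, window=1, min_genomes=2):
    """Map each family pair to the number of genomes in which it is clustered,
    keeping only pairs seen in at least `min_genomes` genomes."""
    counts = Counter()
    for genome in genomes:
        counts.update(neighbor_pairs(genome, window))  # each pair counted once per genome
    return {tuple(sorted(p)): n for p, n in counts.items() if n >= min_genomes}


genomes = [
    ["A", "B", "C", "D"],
    ["X", "A", "B", "Y"],
    ["B", "A", "Z"],
]
print(conserved_clusters(genomes))  # {('A', 'B'): 3}
```

Only the A/B pair stays adjacent across all three toy genomes; raising `min_genomes` or narrowing `window` makes the criterion stricter.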

Figure 8.2.2 Commonly used descriptors of prediction accuracy. In this example, a true positive (TP) is one in which the interaction is both known to exist and predicted to exist. A false positive (FP) is one in which the interaction is known not to exist but is predicted to exist. A true negative (TN) is one that is known not to exist and is predicted not to exist, and a false negative (FN) is one that is known to exist but is predicted not to exist. Based on this table, the success rate or total accuracy is equal to (TP + TN)/(TP + FP + TN + FN); the sensitivity, TP rate, or recall is equal to TP/(TP + FN); the precision (sometimes called specificity in the prediction literature) is equal to TP/(TP + FP); and the FP rate is equal to FP/(FP + TN), which is 1 minus the specificity when specificity is defined as TN/(TN + FP).
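The formulas in this caption translate directly into code. The sketch below is illustrative only: the `Confusion` class and the `confusion_from_sets` and `pair` helpers are hypothetical names, and it assumes interactions are stored as unordered protein pairs. Precision and specificity are reported separately so the two are not conflated, and pairs whose status is unknown are left out of all four counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from the table in Figure 8.2.2."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:  # success rate
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def recall(self) -> float:  # sensitivity, TP rate
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:  # TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:  # TN rate, TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def fp_rate(self) -> float:  # FP / (FP + TN) = 1 - specificity
        return self.fp / (self.fp + self.tn)


def confusion_from_sets(predicted, positives, negatives):
    """Score predicted pairs against known positive and negative pairs.

    All arguments are sets of frozenset({protein_a, protein_b}). Pairs in
    neither reference set have unknown status and are ignored.
    """
    return Confusion(
        tp=len(predicted & positives),
        fp=len(predicted & negatives),
        tn=len(negatives - predicted),
        fn=len(positives - predicted),
    )


def pair(a, b):
    return frozenset((a, b))


c = confusion_from_sets(
    predicted={pair("A", "B"), pair("A", "C"), pair("B", "D")},
    positives={pair("A", "B"), pair("C", "D")},
    negatives={pair("A", "C"), pair("B", "C")},
)
print(c, c.accuracy())  # Confusion(tp=1, fp=1, tn=1, fn=1) 0.5
```

The predicted pair B/D is in neither reference set, so it affects none of the metrics; this matters in practice because most protein pairs have never been tested.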

Figure 8.2.3 One form of the gene fusion method. Individual proteins A and B from one genome can often be found as a single fused protein, C, in another genome. Finding such a fused protein suggests that proteins A and B interact either physically or functionally.
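A minimal sketch of how fusion candidates might be flagged from precomputed similarity hits. It assumes a hypothetical hit table of `(query, subject, start, end)` tuples, in which proteins from one genome are aligned against proteins of another and `start`/`end` are inclusive positions on the subject. Two queries that match largely non-overlapping regions of the same subject C are reported as a candidate A/B link. The optional `homologous` set excludes query pairs that resemble each other, a filter of the kind commonly applied because two homologous proteins matching the same subject would otherwise mimic a fusion. The function name and the `max_overlap` threshold are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def find_fusions(hits, max_overlap=10, homologous=frozenset()):
    """Return (A, B, C) triples where queries A and B hit separate regions of C.

    `hits` holds hypothetical (query, subject, start, end) tuples with
    inclusive 1-based coordinates on the subject protein.
    """
    regions_by_subject = defaultdict(list)
    for query, subject, start, end in hits:
        regions_by_subject[subject].append((query, start, end))

    links = set()
    for subject, regions in regions_by_subject.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(regions, 2):
            if qa == qb or frozenset((qa, qb)) in homologous:
                continue  # same protein twice, or A and B similar to each other
            overlap = min(ea, eb) - max(sa, sb) + 1  # negative when disjoint
            if overlap <= max_overlap:
                links.add((min(qa, qb), max(qa, qb), subject))
    return links


hits = [
    ("A", "C", 1, 200),
    ("B", "C", 210, 400),
    ("D", "C", 50, 150),  # overlaps A's region heavily
]
print(sorted(find_fusions(hits)))  # [('A', 'B', 'C'), ('B', 'D', 'C')]
```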

Figure 8.2.4 The phylogenetic profile method. Genomes (G1 to G6) are searched for the absence (0) or presence (1) of proteins (P1 to P6). Proteins with identical profiles, or with profiles differing at only a single position, can be linked into functionally related groups.
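The linking rule in this caption can be sketched as follows, assuming the 0/1 profile matrix has already been built from sequence searches (that step is not shown). The function names and the toy profiles are made up for illustration. Links are drawn between proteins whose profiles differ at no more than `max_mismatch` genomes, and linked proteins are then merged into groups as connected components. Note that transitive merging can chain together profiles that differ by more than one position; it is one simple grouping choice among several.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def profile_links(profiles, max_mismatch=1):
    """Link protein pairs whose profiles differ at <= max_mismatch positions."""
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(sorted(profiles.items()), 2)
        if hamming(pa, pb) <= max_mismatch
    ]


def linked_groups(proteins, links):
    """Merge linked proteins into groups (connected components, via union-find)."""
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Columns are genomes G1..G6; 1 = present, 0 = absent (made-up example data).
profiles = {
    "P1": (1, 0, 1, 1, 0, 1),
    "P2": (1, 0, 1, 1, 0, 1),
    "P3": (1, 0, 1, 1, 0, 0),
    "P4": (0, 1, 0, 0, 1, 0),
    "P5": (0, 1, 0, 0, 1, 1),
    "P6": (1, 1, 1, 1, 1, 1),
}
links = profile_links(profiles)
print(linked_groups(profiles, links))  # [['P1', 'P2', 'P3'], ['P4', 'P5'], ['P6']]
```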

Figure 8.2.5 Coevolution and correlation of phylogenetic distances. (A) Trees or sequence alignments of two possibly interacting protein families are first generated, along with 16S ribosomal RNA sequence alignments for the same taxa. (B) Distance matrices are generated from the alignments (in the tol‐mirrortree approach, tree‐of‐life distances are also subtracted from these matrices), and (C) the correlation between the matrices is determined, typically using the Pearson correlation coefficient.
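Steps (B) and (C) amount to correlating the off-diagonal entries of two distance matrices. The sketch below assumes the matrices are already computed and that their rows and columns follow the same taxon order; the function names are hypothetical. The optional `tol` argument applies a plain element-wise subtraction, as the caption describes; the published tol‐mirrortree procedure may rescale the 16S distances before subtracting, which is not reproduced here.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for constant input


def mirrortree_score(dist_a, dist_b, tol=None):
    """Correlate two protein-family distance matrices over shared taxa.

    If `tol` (e.g., 16S rRNA distances) is given, it is subtracted
    element-wise from both matrices first.
    """
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    if tol is not None:
        t = upper_triangle(tol)
        x = [a - s for a, s in zip(x, t)]
        y = [b - s for b, s in zip(y, t)]
    return pearson(x, y)


# Toy 3-taxon matrices (made-up values).
dist_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dist_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
tol = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(round(mirrortree_score(dist_a, dist_b), 6))       # 1.0
print(round(mirrortree_score(dist_a, dist_b, tol), 6))  # 1.0
```

With real data the number of taxa is much larger, and a high correlation is taken as evidence of coevolution rather than proof of interaction.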

Figure 8.2.6 Extraction of domain data for the prediction of protein interactions. Given a set of protein interactions, all individual domain‐domain interactions are extracted and counted. After training, counts are converted into probabilities of domain‐domain interaction as well as protein‐protein interaction. In the second stage, network topology is incorporated to improve predictions. See text for details.
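The first stage of this scheme (counting domain pairs, then converting counts into probabilities) can be sketched as below. This is an assumption-laden illustration, not the unit's own model: domain-pair probabilities are estimated by simple frequencies, and a protein pair's interaction probability is combined as 1 − ∏(1 − p) over its domain pairs, treating them as independent. Deng et al. (2002) use this combination rule but estimate the domain-level probabilities by maximum likelihood (EM) rather than raw frequencies. The network-topology stage is not shown, and all names and toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def domain_pairs(a, b, domains):
    """Unordered domain pairs that could mediate an interaction between a and b."""
    return {tuple(sorted((x, y))) for x, y in product(domains[a], domains[b])}


def train_domain_probs(domains, pairs, interacting):
    """Estimate P(domain pair interacts) as the fraction of training protein
    pairs containing that domain pair that are known to interact.

    domains: protein -> set of domain IDs.
    pairs: (protein_a, protein_b) tuples with known status.
    interacting: set of frozensets for the pairs known to interact.
    """
    seen, hit = Counter(), Counter()
    for a, b in pairs:
        keys = domain_pairs(a, b, domains)
        seen.update(keys)
        if frozenset((a, b)) in interacting:
            hit.update(keys)
    return {k: hit[k] / seen[k] for k in seen}


def interaction_prob(a, b, domains, probs):
    """P(a, b interact) = 1 - prod(1 - p_dd), treating domain pairs as
    independent; domain pairs never seen in training contribute 0."""
    p_none = 1.0
    for key in domain_pairs(a, b, domains):
        p_none *= 1.0 - probs.get(key, 0.0)
    return 1.0 - p_none


domains = {"A": {"d1"}, "B": {"d2"}, "C": {"d1"}, "D": {"d2"},
           "E": {"d3"}, "F": {"d2", "d3"}}
pairs = [("A", "B"), ("C", "D"), ("A", "D"), ("C", "E")]
interacting = {frozenset(("A", "B")), frozenset(("C", "D"))}
probs = train_domain_probs(domains, pairs, interacting)
print(probs)
print(interaction_prob("A", "F", domains, probs))  # about 0.667
```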

Figure 8.2.7 A sample connectivity distribution for a yeast protein network extracted from the DIP database (see Internet Resources). Most proteins have few interactions (left end of the x axis), whereas a few are highly connected (right end).
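A distribution like this one can be tabulated from any list of interacting pairs. This is a minimal sketch with a made-up edge list; parsing an actual DIP download is not shown because its file format is not described here, and the function name is hypothetical. For a scale-free-like network (Barabasi and Albert, 1999), plotting the log of the count against log k gives a roughly straight, downward-sloping line.

```python
from collections import Counter


def degree_distribution(edges):
    """Map k -> number of proteins with exactly k distinct interaction partners.

    Duplicate and reversed edges are counted once; self-interactions are
    skipped, so a protein seen only in a self-interaction is not counted.
    """
    partners = {}
    for a, b in edges:
        if a == b:
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return dict(sorted(Counter(len(p) for p in partners.values()).items()))


# A toy network with one hub (H); ("B", "A") repeats ("A", "B").
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("B", "A")]
print(degree_distribution(edges))  # {1: 2, 2: 2, 4: 1}
```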

Literature Cited

	Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI‐BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389‐3402.
	Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., Gelpke, M.D., Roach, J., Oh, T., Ho, I.Y., Wong, M., Detter, C., Verhoef, F., Predki, P., Tay, A., Lucas, S., Richardson, P., Smith, S.F., Clark, M.S., Edwards, Y.J., Doggett, N., Zharkikh, A., Tavtigian, S.V., Pruss, D., Barnstead, M., Evans, C., Baden, H., Powell, J., Glusman, G., Rowen, L., Hood, L., Tan, Y.H., Elgar, G., Hawkins, T., Venkatesh, B., Rokhsar, D., and Brenner, S. 2002. Whole‐genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301‐1310.
	Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N.J., Oinn, T.M., Pagni, M., Servant, F., Sigrist, C.J., and Zdobnov, E.M. 2001. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29:37‐40.
	Bader, G.D., Donaldson, I., Wolting, C., Ouellette, B.F., Pawson, T., and Hogue, C.W. 2001. BIND—The Biomolecular Interaction Network Database. Nucleic Acids Res. 29:242‐245.
	Barabasi, A.L. and Albert, R. 1999. Emergence of scaling in random networks. Science 286:509‐512.
	Barker, D. and Pagel, M. 2005. Predicting functional gene links from phylogenetic‐statistical analyses of whole genomes. PLoS Comput. Biol. 1:e3.
	Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Finn, R.D., and Sonnhammer, E.L. 1999. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Res. 27:260‐262.
	Ben‐Hur, A. and Noble, W.S. 2005. Kernel methods for predicting protein‐protein interactions. Bioinformatics 21:i38‐i46.
	Berger, J.M., Gamblin, S.J., Harrison, S.C., and Wang, J.C. 1996. Structure and mechanism of DNA topoisomerase II. Nature 379:225‐232.
	Bock, J.R. and Gough, D.A. 2001. Predicting protein‐protein interactions from primary structure. Bioinformatics 17:455‐460.
	Botstein, D. 1999. Of genes and genomes. Ann. N.Y. Acad. Sci. 882:32‐41.
	Corpet, F., Gouzy, J., and Kahn, D. 1998. The ProDom database of protein domain families. Nucleic Acids Res. 26:323‐326.
	Craig, R.A. and Liao, L. 2007. Phylogenetic tree information aids supervised learning for predicting protein‐protein interaction based on distance matrices. BMC Bioinformatics 8:6.
	Dandekar, T., Snel, B., Huynen, M., and Bork, P. 1998. Conservation of gene order: A fingerprint of proteins that physically interact. Trends Biochem. Sci. 23:324‐328.
	Demerec, M. and Hartman, P.E. 1959. Complex loci in microorganisms. Annu. Rev. Microbiol. 13:377‐406.
	Deng, M., Mehta, S., Sun, F., and Chen, T. 2002. Inferring domain‐domain interactions from protein‐protein interactions. Genome Res. 12:1540‐1548.
	Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. 1998. Cluster analysis and display of genome‐wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95:14863‐14868.
	Eisenberg, D., Marcotte, E.M., Xenarios, I., and Yeates, T.O. 2000. Protein function in the post‐genomic era. Nature 405:823‐826.
	Enright, A.J., Iliopoulos, I., Kyrpides, N.C., and Ouzounis, C.A. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86‐90.
	Fryxell, K.J. 1996. The coevolution of gene family trees. Trends Genet. 12:364‐369.
	Goh, C.‐S., Bogan, A.A., Joachimiak, M., Walther, D., and Cohen, F.E. 2000. Co‐evolution of proteins with their interaction partners. J. Mol. Biol. 299:283‐293.
	Gomez, S.M. and Rzhetsky, A. 2002. Towards the prediction of complete protein‐protein interaction networks. Pac. Symp. Biocomput. 2002:413‐424.
	Gomez, S.M., Lo, S.H., and Rzhetsky, A. 2001. Probabilistic prediction of unknown metabolic and signal‐transduction networks. Genetics 159:1291‐1298.
	Gomez, S.M., Noble, W.S., and Rzhetsky, A. 2003. Learning to predict protein‐protein interactions from protein sequences. Bioinformatics 19:1875‐1881.
	Hallas, C., Pekarsky, Y., Itoyama, T., Varnum, J., Bichi, R., Rothstein, J.L., and Croce, C.M. 1999. Genomic analysis of human and mouse TCL1 loci reveals a complex of tightly clustered genes. Proc. Natl. Acad. Sci. U.S.A. 96:14418‐14423.
	Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., and Sakaki, Y. 2000. Toward a protein‐protein interaction map of the budding yeast: A comprehensive system to examine two‐hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl. Acad. Sci. U.S.A. 97:1143‐1147.
	Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J., Chung, S., Emili, A., Snyder, M., Greenblatt, J.F., and Gerstein, M. 2003. A Bayesian networks approach for predicting protein‐protein interactions from genomic data. Science 302:449‐453.
	Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., and Barabasi, A.L. 2000. The large‐scale organization of metabolic networks. Nature 407:651‐654.
	Jeong, H., Mason, S.P., Barabasi, A.L., and Oltvai, Z.N. 2001. Lethality and centrality in protein networks. Nature 411:41‐42.
	Jothi, R., Kann, M.G., and Przytycka, T.M. 2005. Predicting protein‐protein interaction by searching evolutionary tree automorphism space. Bioinformatics 21:i241‐i250.
	Jothi, R., Cherukuri, P.F., Tasneem, A., and Przytycka, T.M. 2006. Co‐evolutionary analysis of domains in interacting proteins reveals insights into domain‐domain interactions mediating protein‐protein interactions. J. Mol. Biol. 362:861‐875.
	Lawrence, J.G. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110:407‐413.
	Lin, N., Wu, B., Jansen, R., Gerstein, M., and Zhao, H. 2005. Information assessment on predicting protein‐protein interactions. BMC Bioinformatics 5:154.
	Lu, L.J., Xia, Y., Paccanaro, A., Yu, H., and Gerstein, M. 2005. Assessing the limits of genomic data integration for predicting protein networks. Genome Res. 15:945‐953.
	Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., and Eisenberg, D. 1999. Detecting protein function and protein‐protein interactions from genome sequences. Science 285:751‐753.
	Moyle, W.R., Campbell, R.K., Myers, R.V., Bernard, M.P., Han, Y., and Wang, X. 1994. Co‐evolution of ligand‐receptor pairs. Nature 368:251‐255.
	Overbeek, R., Fonstein, M., D'Souza, M., Pusch, G.D., and Maltsev, N. 1999. The use of gene clusters to infer functional coupling. Proc. Natl. Acad. Sci. U.S.A. 96:2896‐2901.
	Pazos, F. and Valencia, A. 2001. Similarity of phylogenetic trees as an indicator of protein‐protein interaction. Protein Eng. 14:609‐614.
	Pazos, F., Ranea, J.A., Juan, D., and Sternberg, M.J. 2005. Assessing protein co‐evolution in the context of the tree of life assists in the prediction of the interactome. J. Mol. Biol. 352:1002‐1015.
	Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., and Yeates, T.O. 1999. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. U.S.A. 96:4285‐4288.
	Ramani, A.K. and Marcotte, E.M. 2003. Exploiting the co‐evolution of interacting proteins to discover interaction specificity. J. Mol. Biol. 327:273‐284.
	Rhodes, D.R., Tomlins, S.A., Varambally, S., Mahavisno, V., Barrette, T., Kalyana‐Sundaram, S., Ghosh, D., Pandey, A., and Chinnaiyan, A.M. 2005. Probabilistic model of the human protein‐protein interaction network. Nat. Biotechnol. 23:951‐959.
	Riley, R., Lee, C., Sabatti, C., and Eisenberg, D. 2005. Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 6:R89.
	Sato, T., Yamanishi, Y., Kanehisa, M., and Toh, H. 2005. The inference of protein‐protein interactions by co‐evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics 21:3482‐3489.
	Scott, M.S. and Barton, G.J. 2007. Probabilistic prediction and ranking of human protein‐protein interactions. BMC Bioinformatics 8:239.
	Sprinzak, E. and Margalit, H. 2001. Correlated sequence‐signatures as markers of protein‐protein interaction. J. Mol. Biol. 311:681‐692.
	Sprinzak, E., Sattath, S., and Margalit, H. 2003. How reliable are experimental protein‐protein interaction data? J. Mol. Biol. 327:919‐923.
	Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi‐Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., and Rothberg, J.M. 2000. A comprehensive analysis of protein‐protein interactions in Saccharomyces cerevisiae. Nature 403:623‐627.
	von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. 2002. Comparative assessment of large‐scale data sets of protein‐protein interactions. Nature 417:399‐403.
	Witten, I.H. and Frank, E. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, Calif.
	Wojcik, J. and Schachter, V. 2001. Protein‐protein interaction map inference using interacting domain profile pairs. Bioinformatics 17:S296‐S305.
	Wu, Q. and Maniatis, T. 1999. A striking organization of a large family of human neural cadherin‐like cell adhesion genes. Cell 97:779‐790.
	Xenarios, I., Rice, D.W., Salwinski, L., Baron, M.K., Marcotte, E.M., and Eisenberg, D. 2000. DIP: The database of interacting proteins. Nucleic Acids Res. 28:289‐291.
	Yanai, I., Derti, A., and DeLisi, C. 2001. Genes linked by fusion events are generally of the same functional category: A systematic analysis of 30 microbial genomes. Proc. Natl. Acad. Sci. U.S.A. 98:7940‐7945.
	Yeang, C.‐H. and Haussler, D. 2007. Detecting coevolution in and among protein domains. PLoS Comput. Biol. 3:e211.
	Zhang, L.V., Wong, S.L., King, O.D., and Roth, F.P. 2004. Predicting co‐complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 5:38.
Internet Resources
	http://dip.doe‐mbi.ucla.edu
	The Database of Interacting Proteins (DIP). A database of both manually and automatically curated experimental protein‐protein interactions.
	http://string.embl.de
	STRING is a database of known and predicted protein‐protein interactions. The interactions include direct (physical) and indirect (functional) associations taken from high‐throughput experiments, genomic context, coexpression, and literature.
	http://www.bind.ca
	The Biomolecular Interaction Network Database (BIND). Database of interactions, molecular complexes, and pathways. Includes interactions other than protein‐protein (e.g., protein‐DNA).
	http://cbm.bio.uniroma2.it/mint
	The Molecular Interactions Database (MINT). A manually curated database designed to store functional interactions between biological molecules (e.g., proteins, RNA, and DNA).
	http://portal.curagen.com/extpc/com.curagen.portal.servlet.Yeast
	PathCalling Yeast Interaction Database. Database of results from Uetz et al. (2000).
	http://wit.mcs.anl.gov/WIT2
	The WIT homepage. A Web site of reconstructed metabolic pathways for a number of genomes.
	http://mips.gsf.de
	The Munich Information Center for Protein Sequences (MIPS) homepage. Maintains a curated database designed to store functional interactions between biological molecules (e.g., proteins, RNA, DNA).
	http://www.genome.ad.jp/kegg
	KEGG: Kyoto Encyclopedia of Genes and Genomes. In addition to other material, this site provides a database of molecular interactions as well as metabolic and signal transduction pathways.
	http://www.ecocyc.org
	The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) Web site.
	http://pim.hybrigenics.com
	Web site for Hybrigenics’ Protein Interaction Map (PIM) functional proteomics software platform.
