Protein Structural Domains: Definition and Prediction

Internet 2013-12-31

Abstract

Recognition and prediction of structural domains in proteins is an important part of structure and function prediction. This unit lists the range of tools available for domain prediction, and describes sequence and structural analysis tools that complement domain prediction methods. Also detailed are the basic domain prediction steps, along with suggested strategies for different protein sequences and potential pitfalls in domain boundary prediction. The difficult problem of domain orientation prediction is also discussed. All the resources necessary for domain boundary prediction are accessible via publicly available Web servers and databases and do not require computational expertise. Curr. Protoc. Protein Sci. 66:2.14.1‐2.14.16. © 2011 by John Wiley & Sons, Inc.

Keywords: structural domains; domain parsing; homology modeling; ab initio predictions; functional domains

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Introduction
What are Structural Domains?
How Structural Domains are Defined
Predicting Structural Domains
Initial Steps in Identifying Protein Domains
Methods for Domain Prediction
Evaluating Domain Predictors
Domain‐Domain Interactions
Potential Problems
Literature Cited
Figures
Tables

Figures

Figure 2.14.1 Domain parsing for two targets from the CASP structure prediction experiments. Target T0457 (A) has two clearly defined domains without insertions, joined by a short linker, with few contacts between the domains. Target T0487 (B), the Argonaute silencing complex, is very complex to parse. In the CASP8 experiment it was parsed into five domains (shown here in different shades); the domains were not linear in sequence and there were a number of domain extensions (or decorations). The orientation of the domains is complicated by the fact that the Argonaute silencing complex also binds DNA.

Figure 2.14.2 Four different ways to parse the same structure, target T0424 in CASP8. The beta‐sheet sub‐structure on the left in all the figures is a duplication, and although the duplication would be regarded as a single domain based on the strength of residue‐residue contacts between the sub‐units (B and C), it could also be regarded as two domains from an evolutionary point of view (A and D).
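
The contact-based view of a parse mentioned in this caption can be illustrated with a small sketch. It is not the procedure used by the CASP assessors or by any tool cited in this unit: the function name count_contacts, the use of C-alpha coordinates, the 8 Å cutoff, and the minimum sequence separation are all assumptions chosen for illustration. Given one domain label per residue, the sketch counts residue pairs in contact within a domain and across domains, so that alternative parses of the same structure can be compared.

```python
from itertools import combinations
from math import dist


def count_contacts(coords, labels, cutoff=8.0, min_separation=3):
    """Count residue pairs in contact, split into intra- and inter-domain.

    coords: list of (x, y, z) C-alpha positions, one per residue.
    labels: list of domain labels, one per residue.
    cutoff and min_separation are illustrative assumptions, not values
    taken from the unit.
    """
    if len(coords) != len(labels):
        raise ValueError("coords and labels must have the same length")
    intra = inter = 0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_separation:
            continue  # ignore residues that are close along the chain
        if dist(coords[i], coords[j]) <= cutoff:
            if labels[i] == labels[j]:
                intra += 1
            else:
                inter += 1
    return intra, inter


# Toy data (hypothetical): two compact four-residue units, 30 Å apart.
unit = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
toy_coords = unit + [(x + 30.0, y, z) for x, y, z in unit]

# Boundary placed between the two units: no contacts cross it.
print(count_contacts(toy_coords, list("AAAABBBB")))  # (2, 0)
# Boundary placed inside the first unit: one contact now crosses it.
print(count_contacts(toy_coords, list("AABBBBBB")))  # (1, 1)
```

A parse with many inter-domain contacts relative to intra-domain contacts suggests that the sub-units behave as one structural domain, which is the situation described for panels B and C. Contact counts alone cannot capture the evolutionary argument behind panels A and D.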

Figure 2.14.3 Flow chart summarizing the key steps involved in domain boundary prediction.

Figure 2.14.4 Two domains with a short linker where docking might be used with constraints to predict domain orientation. In fact, CASP target T0323 has an inserted domain (on the right) and therefore has two separate constraints that could have been used to limit the docking possibilities.

Figure 2.14.5 CASP target T0427, two domains joined by a long linker. The two domains are shown in different shades and the linker in black. Although the linking residues that join the two domains interact with the surface of domain 1 (shown on the left), there were no similar structures with these linking residues, so they could not have been modeled. The long linker even allows domain 2 to interact with the opposite face of domain 1, so here it would not have been possible to use docking constraints to limit the possibilities of interaction.

Figure 2.14.6 A difficult‐to‐predict case. CASP target T0547 has four domains shown in different shades. Although there are remotely similar structural templates for the two larger domains, the domain boundaries of the two smaller domains would have to have been predicted by ab initio methods.

Figure 2.14.7 Domain decorations. For each of these CASP targets there was a structurally similar template that could have been used for modeling for the regions shown in darker shades, but there were no templates for the regions shown in lighter shades. The regions without a template are domain decorations and are difficult to predict. Target T0510 (A) had two structural domains (shown in darker shades) and a small C‐terminal extension that more or less folded on its own. Target T0395 (B) formed a single domain with a C‐terminal decoration that interacts with the domain surface and even forms a knot. Target T0409 (C) is shown as a dimer. The N‐terminal extension interacts with the other chain in the dimer.
