Basic Protein Sequence Analysis

Internet, 2013-12-31

Abstract

Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.

Keywords: protein sequence analysis; domain detection; subcellular localization; motif detection

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Table of Contents

Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers
Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers
Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search
Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search
Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM
Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server
Basic Protocol 2: Predicting Helical Transmembrane Regions and Subcellular Localization
Support Protocol 4: Guidelines for Understanding Results of Predictions of Helical Transmembrane Regions and Subcellular Localization
Alternate Protocol 3: Predicting the Subcellular Localization of a Protein Using TargetP
Support Protocol 5: Guidelines for Understanding Results Predicting the Subcellular Localization of a Protein Using TargetP
Basic Protocol 3: Predicting Key Functional Residues and Motifs Using the PROSITE Web Server
Support Protocol 6: Guidelines for Understanding Results of Searches Done Using the PROSITE Web Server
Support Protocol 7: Homolog Identification
Commentary
Literature Cited
Figures
Tables

Materials

Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers

Materials

Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers

Materials

See protocol 1.

Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search

Materials

See protocol 1.

Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search

Materials

Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM

Materials

See protocol 7.

Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server

Materials

Figures

Figure 2.11.1 FASTA format of AVR2_HUMAN protein sequence. The Swiss‐Prot accession number is P27037 and the Swiss‐Prot ID is AVR2_HUMAN.
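
For readers who want to handle such a record programmatically, the following is a minimal Python sketch of reading FASTA-formatted text: a header line beginning with ">" followed by one or more sequence lines. The pipe-delimited header layout and the 20-residue sequence are placeholders assumed for illustration, not the actual AVR2_HUMAN record; only the accession number and ID are taken from the caption above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record: the header layout is an assumed pipe-delimited style and
# the sequence is NOT the real protein sequence.
example = """>sp|P27037|AVR2_HUMAN
ACDEFGHIKL
MNPQRSTVWY
"""

records = parse_fasta(example)
header, sequence = records[0]
# Take the first whitespace-delimited token, then split on "|".
database, accession, entry_id = header.split()[0].split("|")[:3]
print(accession, entry_id, len(sequence))
```

The same parsed record can then be pasted into, or checked before submission to, any of the Web servers used in the protocols.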

Figure 2.11.2 SMART sequence submission form. (A) Submission of sequence in FASTA format. (B) Submission using the Swiss‐Prot accession number in the box marked with an arrow.

Figure 2.11.3 Sequence submission options in SMART.

Figure 2.11.4 SMART results for AVR2_HUMAN. Integrated results from several servers show an N‐terminal signal peptide, two PFAM domains, a transmembrane domain, and a low‐complexity region.

Figure 2.11.5 CD‐Search sequence submission form.

Figure 2.11.6 Results from CD‐Search. (A) The output gives a graphic display and E‐values of hits. (B) Pairwise alignment of the query with the top hit.

Figure 2.11.7 Multiple sequence alignment of AVR2_HUMAN kinase domain with a conserved kinase domain hit. The columns marked with # have been identified as critical for kinase function.

Figure 2.11.8 Submission form and results page from 3D‐PSSM server. (A) The submission form; (B) the results page.

Figure 2.11.9 Sequence alignments produced by 3D‐PSSM. (A) Alignment of query to its homologs. (B) Pairwise alignment and comparison of secondary structure and solvent accessibility predictions for the query with those of the hit (structural domain).

Figure 2.11.10 TMHMM output for AVR2_HUMAN. Shown are results in the “extensive with graphics” format. A summary of results is followed by a graphic display. The x axis of the graph represents amino acid positions in the query sequence, and the y axis represents the probability that a residue lies in a transmembrane (TM) helix. Peaks indicate positions with a higher probability of belonging to a TM helix. The predicted TM segments and topology are drawn above a y‐axis value of 1. The region predicted to be inside (cytoplasmic) is shown as a blue line (marked INSIDE in this figure), and the region predicted to be outside (extracellular) is shown in pink (labeled OUTSIDE in this figure).
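
To make the reading of such a plot concrete, the Python sketch below takes a list of per‐residue probabilities and reports the contiguous runs of positions that rise above a cutoff, which is roughly what the eye does when picking out peaks. It is only an illustration of interpreting the graph: it is not how TMHMM derives its predictions (the server uses a hidden Markov model), and the function name, the cutoff of 0.5, the minimum run length, and the toy probabilities are all assumptions made for this example.

```python
def segments_above(probabilities, cutoff=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) runs where probability >= cutoff.

    cutoff and min_length are illustrative assumptions, not TMHMM settings.
    """
    segments = []
    start = None
    for position, p in enumerate(probabilities, start=1):
        if p >= cutoff:
            if start is None:
                start = position  # a new run begins here
        elif start is not None:
            # The run ended at the previous position.
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(probabilities) - start + 1 >= min_length:
        segments.append((start, len(probabilities)))
    return segments


# Toy probabilities for a 10-residue sequence (hypothetical values).
toy = [0.0, 0.1, 0.8, 0.9, 0.95, 0.7, 0.2, 0.0, 0.6, 0.1]
all_runs = segments_above(toy)
long_runs = segments_above(toy, min_length=3)
print(all_runs)
print(long_runs)
```

Raising `min_length` discards isolated spikes; real TM helices span many more residues than this toy example, so the value would need to be chosen accordingly.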

Figure 2.11.11 TargetP prediction of the subcellular localization of AVR2_HUMAN.

Figure 2.11.12 Prediction of patterns by PROSITE for AVR2_HUMAN. (A) Results of profile search; (B) results of pattern search.
