Basic Protein Sequence Analysis
Abstract
Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.
Keywords: protein sequence analysis; domain detection; subcellular localization; motif detection
Table of Contents
- Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers
- Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers
- Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search
- Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search
- Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM
- Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server
- Basic Protocol 2: Predicting Helical Transmembrane Regions and Subcellular Localization
- Support Protocol 4: Guidelines for Understanding Results of Predictions of Helical Transmembrane Regions and Subcellular Localization
- Alternate Protocol 3: Predicting the Subcellular Localization of a Protein Using TargetP
- Support Protocol 5: Guidelines for Understanding Results Predicting the Subcellular Localization of a Protein Using TargetP
- Basic Protocol 3: Predicting Key Functional Residues and Motifs Using the Prosite Web Server
- Support Protocol 6: Guidelines for Understanding Results of Searches Done Using the Prosite Web Server
- Support Protocol 7: Homolog Identification
- Commentary
- Literature Cited
- Figures
- Tables
Figures
- Figure 2.11.1 FASTA format of AVR2_HUMAN protein sequence. The Swiss‐Prot accession number is P27037 and the Swiss‐Prot ID is AVR2_HUMAN.
- Figure 2.11.2 SMART sequence submission form. (A) Submission of sequence in FASTA format. (B) Submission using the Swiss‐Prot accession number in the box marked with an arrow.
- Figure 2.11.3 Sequence submission options in SMART.
- Figure 2.11.4 SMART results for AVR2_HUMAN. Integrated results from several servers show an N‐terminal signal peptide, two PFAM domains, a transmembrane domain, and a low‐complexity region.
- Figure 2.11.5 CD‐Search sequence submission form.
- Figure 2.11.6 Results from CD‐Search. (A) The output gives a graphic display and E‐values of hits. (B) Pairwise alignment of the query with the top hit.
- Figure 2.11.7 Multiple sequence alignment of AVR2_HUMAN kinase domain with a conserved kinase domain hit. The columns marked with # have been identified as critical for kinase function.
- Figure 2.11.8 Submission form and results page from 3D‐PSSM server. (A) The submission form; (B) the results page.
- Figure 2.11.9 Sequence alignments produced by 3D‐PSSM. (A) Alignment of query to its homologs. (B) Pairwise alignment and comparison of secondary structure and solvent accessibility predictions for the query with that of the hit (structural domain).
- Figure 2.11.10 TMHMM output for AVR2_HUMAN. Shown are results in the “extensive with graphics” format. A summary of results is followed by a graphic display. The x axis of the graph represents amino acid positions in the query sequence and the y axis represents the probability that a residue lies in a transmembrane (TM) region. Peaks indicate positions with a higher probability of being part of a TM domain. The predicted TM segments and topology are drawn above the y axis value of 1. The region predicted to be inside (cytoplasmic) is represented with a blue line (marked as INSIDE in this figure) and the region on the outside (extracellular) is represented in pink (labeled OUTSIDE in this figure).
- Figure 2.11.11 TargetP prediction of the subcellular localization of AVR2_HUMAN.
- Figure 2.11.12 Prediction of patterns by PROSITE for AVR2_HUMAN. (A) Results of profile search; (B) results of pattern search.
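The pattern search summarized in Figure 2.11.12 (B) can be illustrated offline with a minimal Python sketch. It assumes the common subset of PROSITE pattern syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats, and "<" or ">" as terminal anchors) and translates it into a regular expression. The function names `prosite_to_regex` and `scan` are hypothetical helpers, the test sequence is an invented toy string rather than AVR2_HUMAN, and the sketch is not the code run by the PROSITE server.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles only the common subset of the syntax; anchors placed
    inside square brackets are not supported in this sketch.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("empty pattern element in %r" % pattern)
        core, low, high = match.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        # "[ST]" and single letters are already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + suffix)
    return "".join(pieces)


def scan(sequence, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets hits that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE PS00001) on an invented toy sequence.
    for start, hit in scan("MKNATLNPSANGSW", "N-{P}-[ST]-{P}"):
        print(start, hit)
```

Short patterns such as this one match frequently by chance, so hits from a sketch like this are best treated as candidates to weigh against the domain and localization predictions, not as evidence of function on their own.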