
Basic Protein Sequence Analysis


Abstract

 

Prediction of molecular function of proteins has become an important task in the genomics era. A wide variety of sequence analysis tools are available to biologists for this task. We have selected one or two primary protocols for tasks such as domain detection, subcellular localization, and motif detection. We also present a strategy for integration of results from different protocols. All the resources needed for these protocols are accessible via publicly available Web servers and databases and require little or no computational expertise.

Keywords: protein sequence analysis; domain detection; subcellular localization; motif detection

     
 
GO TO THE FULL PROTOCOL:
PDF or HTML at Wiley Online Library

Table of Contents

  • Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers
  • Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers
  • Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search
  • Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search
  • Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM
  • Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server
  • Basic Protocol 2: Predicting Helical Transmembrane Regions and Subcellular Localization
  • Support Protocol 4: Guidelines for Understanding Results of Predictions of Helical Transmembrane Regions and Subcellular Localization
  • Alternate Protocol 3: Predicting the Subcellular Localization of a Protein Using TargetP
  • Support Protocol 5: Guidelines for Understanding Results Predicting the Subcellular Localization of a Protein Using TargetP
  • Basic Protocol 3: Predicting Key Functional Residues and Motifs Using the Prosite Web Server
  • Support Protocol 6: Guidelines for Understanding Results of Searches Done Using the Prosite Web Server
  • Support Protocol 7: Homolog Identification
  • Commentary
  • Literature Cited
  • Figures
  • Tables
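
The protocols listed above start from a protein sequence, submitted to a Web server either in FASTA format or by accession number (see Figures 2.11.1 and 2.11.2). As a rough illustration only, the sketch below reads FASTA text and extracts the accession and entry name from the header. It assumes the pipe-delimited UniProt-style header `>sp|ACCESSION|ENTRY_NAME ...`, which may differ from the header shown in the unit's figure. The function names `parse_fasta` and `split_uniprot_header` are hypothetical, and the residues in the example are a made-up placeholder, not the real AVR2_HUMAN sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header):
    """Return (accession, entry_name) for a 'db|ACC|ID ...' header.

    Assumes the pipe-delimited UniProt style; returns (None, None) otherwise.
    """
    words = header.split(maxsplit=1)
    if not words:
        return None, None
    fields = words[0].split("|")
    if len(fields) == 3:
        return fields[1], fields[2]
    return None, None


# Placeholder record: the identifiers come from Figure 2.11.1, but the
# residues below are invented purely for illustration.
example = """>sp|P27037|AVR2_HUMAN placeholder description
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for header, sequence in records:
    accession, entry_name = split_uniprot_header(header)
    print(accession, entry_name, len(sequence))
```

Either the accession or the full FASTA text recovered this way can then be pasted into a submission form such as the one in Figure 2.11.2.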
     
 

Materials

Basic Protocol 1: Identifying Structural and Functional Domains Using Integrated Meta‐Servers

Materials

Support Protocol 1: Guidelines for Understanding Results of Analyses from Integrated Meta‐Servers

  Materials
  • See protocol 1.

Alternate Protocol 1: Identifying Structural and Functional Domains Using the NCBI CD‐Search

  Materials
  • See protocol 1.

Support Protocol 2: Guidelines for Understanding Results of Analyses from the NCBI CD‐Search

Materials

Alternate Protocol 2: Predicting Structural Domains and Secondary Structure Using 3D‐PSSM

  Materials
  • See protocol 7.

Support Protocol 3: Guidelines for Understanding Results of Analyses from the 3D‐PSSM Server

Materials

Figures

  • Figure 2.11.1 FASTA format of AVR2_HUMAN protein sequence. The Swiss‐Prot accession number is P27037 and the Swiss‐Prot ID is AVR2_HUMAN.
  • Figure 2.11.2 SMART sequence submission form. (A) Submission of sequence in FASTA format. (B) Submission using the Swiss‐Prot accession number in the box marked with an arrow.
  • Figure 2.11.3 Sequence submission options in SMART.
  • Figure 2.11.4 SMART results for AVR2_HUMAN. Integrated results from several servers show an N‐terminal signal peptide, two PFAM domains, a transmembrane domain, and a low‐complexity region.
  • Figure 2.11.5 CD‐Search sequence submission form.
  • Figure 2.11.6 Results from CD‐Search. (A) The output gives a graphic display and E‐values of hits. (B) Pairwise alignment of the query with the top hit.
  • Figure 2.11.7 Multiple sequence alignment of AVR2_HUMAN kinase domain with a conserved kinase domain hit. The columns marked with # have been identified as critical for kinase function.
  • Figure 2.11.8 Submission form and results page from 3D‐PSSM server. (A) The submission form; (B) the results page.
  • Figure 2.11.9 Sequence alignments produced by 3D‐PSSM. (A) Alignment of query to its homologs. (B) Pairwise alignment and comparison of secondary structure and solvent accessibility predictions for the query with those of the hit (structural domain).
  • Figure 2.11.10 TMHMM output for AVR2_HUMAN. Shown are results in the “extensive with graphics” format. A summary of results is followed by a graphic display. The x axis on the graph represents amino acid positions in the query sequence and the y axis represents the probability that a residue lies in a transmembrane (TM) region. The peaks indicate positions with higher probability of being in a TM domain. The predictions of TM and topology are indicated above a y axis value of 1. The region predicted to be inside (cytoplasmic) is represented with a blue line (marked as INSIDE in this figure) and the region on the outside (extracellular) is represented in pink (labeled OUTSIDE in this figure).
  • Figure 2.11.11 TargetP prediction of the subcellular localization of AVR2_HUMAN.
  • Figure 2.11.12 Prediction of patterns by PROSITE for AVR2_HUMAN. (A) Results of profile search; (B) results of pattern search.
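
The pattern search in Figure 2.11.12 matches short sequence motifs written in PROSITE pattern syntax. As a hedged illustration of how such a pattern relates to a sequence, the sketch below translates a subset of that syntax into a Python regular expression and reports 1-based match positions. It is not the PROSITE server's implementation: the helper names `prosite_to_regex` and `find_motifs` are hypothetical, the test sequence is invented, and only the common elements are handled (`x`, single residues, `[...]`, `{...}`, repeat counts such as `x(2,4)`, and the `<` and `>` terminal anchors). The example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern (PROSITE entry PS00001).

```python
import re

# One pattern element: optional '<', a core (x, one residue, [set], {excluded set}),
# an optional repeat count "(n)" or "(n,m)", and an optional '>'.
ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular-expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (1-based start, matched text) for every hit, including overlaps."""
    # The lookahead lets overlapping matches be reported separately.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Invented test sequence, used only to exercise the functions.
hits = find_motifs("MANGSAPNKTQ", "N-{P}-[ST]-{P}.")
print(hits)
```

Short patterns like this one match many sequences by chance, so a hit from a sketch of this kind is only a candidate site and says nothing about whether the motif is functional in the protein.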

Literature Cited

   Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403‐410.
   Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths‐Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276‐280.
   Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbout, S., and Schneider, M. 2003. The SWISS‐PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365‐370.
   Chen, C.P., Kernytsky, A., and Rost, B. 2002. Transmembrane helix predictions revisited. Protein Sci. 11:2774‐2791.
   Eisen, J.A. 1998. Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163‐167.
   Emanuelsson, O. and von Heijne, G. 2001. Prediction of organellar targeting signals. Biochim. Biophys. Acta 1541:114‐119.
   Geer, L.Y., Domrachev, M., Lipman, D.J., and Bryant, S.H. 2002. CDART: Protein homology by domain architecture. Genome Res. 12:1619‐1623.
   Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta‐Vidal, A., Vastrik, I., and Clamp, M. 2002. The Ensembl genome database project. Nucleic Acids Res. 30:38‐41.
   Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk‐Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., and Bairoch, A. 2004. Recent improvements to the PROSITE database. Nucleic Acids Res. 32:D134‐D137.
   Jones, D.T. 1999. Protein secondary structure prediction based on position‐specific scoring matrices. J. Mol. Biol. 292:195‐202.
   Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D‐PSSM. J. Mol. Biol. 299:499‐520.
   Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305:567‐580.
   Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., and Bork, P. 2004. SMART 4.0: Towards genomic data integration. Nucleic Acids Res. 32:D142‐D144.
   Marchler‐Bauer, A., Anderson, J.B., DeWeese‐Scott, C., Fedorova, N.D., Geer, L.Y., He, S., Hurwitz, D.I., Jackson, J.D., Jacobs, A.R., Lanczycki, C.J., Liebert, C.A., Liu, C., Madej, T., Marchler, G.H., Mazumder, R., Nikolskaya, A.N., Panchenko, A.R., Rao, B.S., Shoemaker, B.A., Simonyan, V., Song, J.S., Thiessen, P.A., Vasudevan, S., Wang, Y., Yamashita, R.A., Yin, J.J., and Bryant, S.H. 2003. CDD: A curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31:383‐387.
   Marchler‐Bauer, A. and Bryant, S.H. 2004. CD‐Search: Protein domain annotations on the fly. Nucleic Acids Res. 32:W327‐W331.
   Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536‐540.
   Schatz, G. and Dobberstein, B. 1996. Common principles of protein translocation across membranes. Science 271:1519‐1526.
   Schultz, J., Milpetz, F., Bork, P., and Ponting, C.P. 1998. SMART, a simple modular architecture research tool: Identification of signaling domains. Proc. Natl. Acad. Sci. U.S.A. 95:5857‐5864.
   Sigrist, C.J., Cerutti, L., Hulo, N., Gattiker, A., Falquet, L., Pagni, M., Bairoch, A., and Bucher, P. 2002. PROSITE: A documented database using patterns and profiles as motif descriptors. Brief. Bioinform. 3:265‐274.
   Sjölander, K. 2004. Phylogenomic inference of protein molecular function: Advances and challenges. Bioinformatics 20:170‐179.
   Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., and Koonin, E.V. 2001. The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29:22‐28.
 